Wednesday 8 August 2012

Fun with SQL analytical functions

I had an interesting SQL problem at work recently, and I came up with a solution that I'm not sure is completely ideal - so I thought I'd attempt to replicate it here.

Some of the complexity is lost while I obscure and simplify the problem (it involved a hierarchical query), but I think the key elements remain.

Consider a table of codes with an order sequence.
create table my_values (code varchar2(10), order_seq number(5));
insert into my_values values ('A', 10);
insert into my_values values ('A', 20);
insert into my_values values ('A', 30);
insert into my_values values ('B', 40);
insert into my_values values ('B', 50);
insert into my_values values ('C', 60);
insert into my_values values ('C', 70);
insert into my_values values ('C', 80);
insert into my_values values ('D', 90);
My requirement was to see the next and previous code. For instance, when listing B records, I wanted to see A and C in the same row - the solution screamed analytical functions so I started with my favourite:
select code, order_seq
      ,row_number() over (partition by code order by order_seq) rn
from my_values
order by order_seq;

CODE       ORDER_SEQ RN
---------- --------- --
A                 10  1 
A                 20  2 
A                 30  3 
B                 40  1 
B                 50  2 
C                 60  1 
C                 70  2 
C                 80  3 
D                 90  1
I struck the results down to just the first row for each code, incorporating lag/lead to get the info I needed.
select s.code
      ,lag(s.code) over (order by s.order_seq) my_lag
      ,lead(s.code) over (order by s.order_seq) my_lead
from 
(
  select code, order_seq
        ,row_number() over (partition by code 
                            order by order_seq) rn
  from my_values
  order by order_seq
) s
where s.rn = 1
order by order_seq;

CODE       MY_LAG     MY_LEAD  
---------- ---------- ----------
A                     B          
B          A          C          
C          B          D          
D          C                    
Finally, to combine my results I created an in-line view with a subquery factoring statement.
with sub as (
  select s.code
        ,lag(s.code) over (order by s.order_seq) my_lag
        ,lead(s.code) over (order by s.order_seq) my_lead
  from 
  (
    select code, order_seq
          ,row_number() over (partition by code 
                              order by order_seq) rn
    from my_values
    order by order_seq
  ) s
  where s.rn = 1)
select m.*, sub.my_lag, sub.my_lead
from my_values m, sub
where m.code = sub.code;

CODE       ORDER_SEQ MY_LAG     MY_LEAD  
---------- --------- ---------- ----------
A                 10            B          
A                 20            B          
A                 30            B          
B                 40 A          C          
B                 50 A          C          
C                 60 B          D          
C                 70 B          D          
C                 80 B          D          
D                 90 C                     
So my question is - any ideas for a simpler solution?

2 comments:

Maxim said...

Not really simpler, just another way

select
code
,order_seq
,last_value(preceding ignore nulls) over(order by order_seq) preceding
,last_value(following ignore nulls) over(order by order_seq desc) following
from (
select t.*
, nullif(lead(code) over(order by order_seq),code) following
, nullif(lag(code) over(order by order_seq),code) preceding
from my_values t
) t
order by order_seq

Best regards

Maxim

Timo Raitalaakso said...

The Maxims solution may be improved to do only two window sorts:

select
code
,order_seq
,last_value(preceding ignore nulls) over(order by order_seq) preceding
,first_value(following ignore nulls) over(order by order_seq rows between current row and unbounded following) following
from (
select t.*
, nullif(lead(code) over(order by order_seq),code) following
, nullif(lag(code) over(order by order_seq),code) preceding
from my_values t
) t
order by order_seq
;

And with a dirty interpretation of the test data it is possible with only one window sort:

select t.*
, nullif(first_value(code) over(order by ascii(code) range 1 preceding),code) pr
, nullif(last_value(code) over(order by ascii(code) range between current row and 1 following),code) fo
from my_values t
;