Pages

Friday 29 March 2019

Visualising SQL Analytics Rolling Count with OracleJET in APEX

Back in around 2005, before the time of smart phones, I had some data.

I can't remember what the data was, but I was told that for it to be valid, it should roughly form a bell curve.

Sure, I'm sure I could have aggregated it, exported it to Excel, and plotted it to a graph, but this was 2005. There was no SQL Developer where I could copy & paste the results directly into Excel. No panel in Excel with a button that plots that data in one click. Back in my day, thing like that were a little bit more work ;p

This was a time when I enjoyed SQL*Plus, and I ended up writing a query to nest STDDEV within RPAD so it would plot asterisks down the page, in a vertical bell. In was a work of art. I'm sure I've still got the test case labelled in bowels of Gmail.

Fast forward to today, where I have data I know starts as a daily bell curve, but I'm applying a rolling 24 hour total over that daily resonance. How does that affect the wave?

I use today's fancy tools to plot the said data, and it doesn't turn out quite how I pictured.

This is my query below plotting real data using OracleJET charts in Oracle APEX.

The underlying data still formed a bell, more detail in a future post, but smoothing this out to a rolling 24 hour total - what should that look like? Why is there a flat line during no activity, instead of a dip?
No doubt there is a math-geek corner of the internet that explores this pattern, but I still wasn't confident with the query.

Real data is awesome. The trouble is, it can be real noisy. If you're trying to reconcile your results, it's often worth building a simple test case. As a regular forum participant, I'm such an advocate for this. If the person asking the question took the time to write the DDL for the test case, I reckon half the questions wouldn't need asking. The other half will have an executable test case ready for any responders.
How's this - in a response to a question I asked (with a test case, of course), Andrew Sayer responded with a LiveSQL demonstration! AskTom questions that have a livesql.oracle.com example are generally prioritised.

So, what I have here is a real simple table, with some "logs" at specific times, and a little variance. It's quite similar to the setup you might see for SQL exercises in the Oracle Dev Gym.
-- create a simple table of dates
create table range_eg  (rd date);
-- ready for adjustment
truncate table range_eg;
-- insert sample hits at a regular interval, with more frequency around noon.
insert into range_eg values (timestamp '2019-03-26 09:30:00');
insert into range_eg values (timestamp '2019-03-26 10:30:00');
insert into range_eg values (timestamp '2019-03-26 10:45:00');
insert into range_eg values (timestamp '2019-03-26 11:30:00');
insert into range_eg values (timestamp '2019-03-26 11:45:00');
insert into range_eg values (timestamp '2019-03-26 12:30:00');
insert into range_eg values (timestamp '2019-03-26 12:45:00');
insert into range_eg values (timestamp '2019-03-26 13:30:00');
insert into range_eg values (timestamp '2019-03-26 13:45:00');
insert into range_eg values (timestamp '2019-03-26 14:30:00');
insert into range_eg values (timestamp '2019-03-26 15:30:00');
insert into range_eg values (timestamp '2019-03-26 16:30:00');

-- I took a few hits out the middle, so the bell would not be quite the same shape
insert into range_eg values (timestamp '2019-03-27 09:30:00');
insert into range_eg values (timestamp '2019-03-27 10:30:00');
insert into range_eg values (timestamp '2019-03-27 10:45:00');
insert into range_eg values (timestamp '2019-03-27 11:30:00');
--insert into range_eg values (timestamp '2019-03-27 11:45:00');
--insert into range_eg values (timestamp '2019-03-27 12:30:00');
--insert into range_eg values (timestamp '2019-03-27 12:45:00');
insert into range_eg values (timestamp '2019-03-27 13:30:00');
insert into range_eg values (timestamp '2019-03-27 13:45:00');
insert into range_eg values (timestamp '2019-03-27 14:30:00');
insert into range_eg values (timestamp '2019-03-27 15:30:00');
insert into range_eg values (timestamp '2019-03-27 16:30:00');

-- I added a small day at the end, to see how it may affect rolling total drift to zero
insert into range_eg values (timestamp '2019-03-28 11:30:00');
insert into range_eg values (timestamp '2019-03-28 12:30:00');
insert into range_eg values (timestamp '2019-03-28 13:30:00');
Then used an analytical function to cumulate the running count over a 1 day interval.
The second column, with accompanying union, was an attempt to see what happened if I filled out each hour with a record, to see if that influenced the overnight dip in my real data. It also let me see the cumulative count taper off as the data ended.
select * from (
    select rd 
     ,sum(sum(c)) over ( order by rd  
         range between NUMTODSINTERVAL(1,'day') preceding 
                   and current row)  last_24hr_running
     ,sum(count(*)) over ( order by rd  
         range between NUMTODSINTERVAL(1,'day') preceding 
                   and current row)  last_24hr_cnt
     from 
      (select 1 c,rd from range_eg 
      union all
      -- I even tried to pad out records to see 
      -- if the line would vary during the lulls
      select null c, trunc(sysdate,'hh24')+1-rownum/24
      from dual
      connect by level< 24*3
      )
    group by (rd) 
-- must ignore first day in inline query
-- as cumulative range will be skewed
) --where request_date > sysdate-2 
order by 2;

Data checks out, SQL checks out - does the graph check out?

Test case plotted

I think so, but given the context of the data, I don't think this line graph is appropriate, particularly around the period of no activity. More on that in the next post.

2 comments:

  1. Hey Scott, this isn't directly related to this post but I did see a separate tweet you posted re missing help text for the 'Connect Null Data Points' attribute. What's interesting is that we've had item-level help text for that attribute since it was introduced - however, I'm only seeing now that it isn't being displayed in Page Designer, for some strange reason. So thank you for bringing this to our attention. We'll endeavour to resolve this for 19.2. So you're aware, the item-level help text should read as follows:

    "Specify whether null data points on a multi-series line chart should render as a continuous or broken line.

    When set to "Yes", null data points are handled as zero, to render a continuous line. When set to "No", the series will be rendered as a broken line, with gaps representing null data points.

    This setting will be applied to multi-series chart types that support rendering as a line, such as Line, Line with Area, Combination, Area, Polar, Radar and Scatter. The default is "Yes"."

    Regards,
    Hilary
    APEX Development Team

    ReplyDelete
  2. Thanks for the information, Hilary.
    I figured it would be something like that, but I don't think I experienced that behaviour - though I didn't spend much time experimenting. No doubt you'll here from me on the forums if I have questions ;p

    ReplyDelete