## Monday 10 August 2009

### Subquery Factoring Clause - WITH

Through general experience I've found the `WITH` statement either underutilised or people haven't been aware of it's existence, although that seems to be changing - it has been around since 9i.

For me, two typical examples come to mind.
1) I was demonstrating recently an example of using `SUBSTR` & `INSTR` together to extract certain parts of a string. I built up the expressions step by step, then to test the example with a slightly different string, I would normally be forced to do a search and replace. Depending on the example, this can be annoying.
`select 'halls head 6065'      ,substr('halls head 6065',4,5) guess      ,instr('halls head 6065',' ',-1,1) pos_of_space      ,substr('halls head 6065',1,instr('halls head 6065',' ',-1,1)-1 )||'*' suburb      ,'*'||substr('halls head 6065',instr('halls head 6065',' ',-1,1)+1 )||'*' postcodefrom dual;`
Instead I can put the string I want to play with in a simple `WITH` statement, and change the "column name" at will.
`with test as (select 'halls head 6065' suburb from dual)select suburb      ,substr(suburb,4,5) guess      ,instr(suburb,' ',-1,1) pos_of_space      ,substr(suburb,1,instr(suburb,' ',-1,1)-1 )||'*' suburb      ,'*'||substr(suburb,instr(suburb,' ',-1,1)+1 )||'*' postcodefrom test;`
Other times recently I have used the `WITH` statement to essentially encapsulate a unit of work. I had extracted a SQL statement from some PL/SQL and wanted to run it with my own restrictions, but leave the original untouched - this allowed me to test without worrying if I've disrupted the original join behaviour.
`WITH original as ({copy of embedded sql})select *from originalwhere {my own restrictions}`
These are ad hoc reasons, there is also a potential performance benefit from using the `WITH` clause.

I'm sure I've seen an example where using the `WITH` clause provided a performance benefit, but memory is fallible so I thought I'd see what the documentation had to say.
It says it will give the optimiser more choice about what to do, and that we can improve the query by using the `WITH` syntax.
After comparing explain plans from the example straight out of the documentation, I wasn't convinced. So I ran timings over different versions of the database - 1000 iterations each. I was surprised at the results (sounds like tabloid journalism, doesn't it?):

Without the clause
`SELECT dname, SUM(sal) AS dept_totalFROM emp, deptWHERE emp.deptno = dept.deptnoGROUP BY dname HAVINGSUM(sal) >  (    SELECT SUM(sal) * 1/3    FROM emp, dept    WHERE emp.deptno = dept.deptno  )ORDER BY SUM(sal) DESC;`

With the clause
`WITH summary AS( SELECT dname, SUM(sal) AS dept_total FROM emp, dept WHERE emp.deptno = dept.deptno GROUP BY dname)SELECT dname, dept_totalFROM summaryWHERE dept_total >  (    SELECT SUM(dept_total) * 1/3    FROM summary  )ORDER BY dept_total DESC;`
`--9i.3 secs.0003 secs per iteration2.9 secs.0029 secs per iteration--10g1.1 secs.0011 secs per iteration4.06 secs.00406 secs per iteration--11g1.17 secs.00117 secs per iteration7.46 secs.00746 secs per iteration`
Not only has it not performed better, its comparative performance has deteriorated as versions go up. So while we have removed duplicate code and enhanced the readability of our code, this example shows it's not necessarily more efficient.

It may just be a poor example to illustrate the performance benefit of this feature. It shows while a documented effect may be true in some circumstances, you must always test your particular scenario on your framework to get accurate comparisons. For further comment on this paradigm, I highly recommend reading this. Twice.

oraclenerd said...

Scott,

I like and use the WITH clause frequently.

I remember trying to search for it in the beginning (on 9i, ~2004)...I learned quickly that you can't search on "with"

Scott Wesley said...

Yeah, there are a few features with keywords that are difficult to search for! It's partially what compelled me to write a presentation on Documentation.

I know in a few past jobs I've used it to good efficiency, I just couldn't find a good test scenario.

oraclenerd said...

I just remembered a couple of things.

1. I used this in an ETL process to great benefit. The SELECT in the WITH clause was against a small table (maybe 30 or 40 rows) and the bulk was against a table with 40 to 50 million rows.

2. Sometimes I'll add the WITH statement just to show off (not really...kind of) to the non SQL types. I did it recently and performance was not improved what-so-ever. In fact, I saw numbers like you mentioned. In that particular instance there was a bigger gain in another part of the code (removed the cursor loop DELETE).

Scott Wesley said...

It's good when you can find a solution for something that while in itself isn't necessarily the best, its "side effects" improve the bigger picture.

You also reminded me of when the first time I saw WITH, it reminded me of the Pascal WITH syntax where it just saved you typing the record/array details over and over.

Oh... do I miss Pascal?