Monday 10 August 2009

Subquery Factoring Clause - WITH

Through general experience I've found the WITH statement either underutilised or people haven't been aware of it's existence, although that seems to be changing - it has been around since 9i.

For me, two typical examples come to mind.
1) I was demonstrating recently an example of using SUBSTR & INSTR together to extract certain parts of a string. I built up the expressions step by step, then to test the example with a slightly different string, I would normally be forced to do a search and replace. Depending on the example, this can be annoying.
select 'halls head 6065'
,substr('halls head 6065',4,5) guess
,instr('halls head 6065',' ',-1,1) pos_of_space
,substr('halls head 6065',1,instr('halls head 6065',' ',-1,1)-1 )||'*' suburb
,'*'||substr('halls head 6065',instr('halls head 6065',' ',-1,1)+1 )||'*' postcode
from dual;
Instead I can put the string I want to play with in a simple WITH statement, and change the "column name" at will.
with test as (select 'halls head 6065' suburb from dual)
select suburb
,substr(suburb,4,5) guess
,instr(suburb,' ',-1,1) pos_of_space
,substr(suburb,1,instr(suburb,' ',-1,1)-1 )||'*' suburb
,'*'||substr(suburb,instr(suburb,' ',-1,1)+1 )||'*' postcode
from test;
Other times recently I have used the WITH statement to essentially encapsulate a unit of work. I had extracted a SQL statement from some PL/SQL and wanted to run it with my own restrictions, but leave the original untouched - this allowed me to test without worrying if I've disrupted the original join behaviour.
WITH original as ({copy of embedded sql})
select *
from original
where {my own restrictions}
These are ad hoc reasons, there is also a potential performance benefit from using the WITH clause.

I'm sure I've seen an example where using the WITH clause provided a performance benefit, but memory is fallible so I thought I'd see what the documentation had to say.
It says it will give the optimiser more choice about what to do, and that we can improve the query by using the WITH syntax.
After comparing explain plans from the example straight out of the documentation, I wasn't convinced. So I ran timings over different versions of the database - 1000 iterations each. I was surprised at the results (sounds like tabloid journalism, doesn't it?):

Without the clause
SELECT dname, SUM(sal) AS dept_total
FROM emp, dept
WHERE emp.deptno = dept.deptno
GROUP BY dname HAVING
SUM(sal) >
(
SELECT SUM(sal) * 1/3
FROM emp, dept
WHERE emp.deptno = dept.deptno
)
ORDER BY SUM(sal) DESC;

With the clause
WITH summary AS
(
SELECT dname, SUM(sal) AS dept_total
FROM emp, dept
WHERE emp.deptno = dept.deptno
GROUP BY dname
)
SELECT dname, dept_total
FROM summary
WHERE dept_total >
(
SELECT SUM(dept_total) * 1/3
FROM summary
)
ORDER BY dept_total DESC;
--9i
.3 secs
.0003 secs per iteration
2.9 secs
.0029 secs per iteration
--10g
1.1 secs
.0011 secs per iteration
4.06 secs
.00406 secs per iteration
--11g
1.17 secs
.00117 secs per iteration
7.46 secs
.00746 secs per iteration
Not only has it not performed better, its comparative performance has deteriorated as versions go up. So while we have removed duplicate code and enhanced the readability of our code, this example shows it's not necessarily more efficient.

It may just be a poor example to illustrate the performance benefit of this feature. It shows while a documented effect may be true in some circumstances, you must always test your particular scenario on your framework to get accurate comparisons. For further comment on this paradigm, I highly recommend reading this. Twice.

4 comments:

oraclenerd said...

Scott,

I like and use the WITH clause frequently.

I remember trying to search for it in the beginning (on 9i, ~2004)...I learned quickly that you can't search on "with"

Scott Wesley said...

Yeah, there are a few features with keywords that are difficult to search for! It's partially what compelled me to write a presentation on Documentation.

I know in a few past jobs I've used it to good efficiency, I just couldn't find a good test scenario.

oraclenerd said...

I just remembered a couple of things.

1. I used this in an ETL process to great benefit. The SELECT in the WITH clause was against a small table (maybe 30 or 40 rows) and the bulk was against a table with 40 to 50 million rows.

2. Sometimes I'll add the WITH statement just to show off (not really...kind of) to the non SQL types. I did it recently and performance was not improved what-so-ever. In fact, I saw numbers like you mentioned. In that particular instance there was a bigger gain in another part of the code (removed the cursor loop DELETE).

Scott Wesley said...

It's good when you can find a solution for something that while in itself isn't necessarily the best, its "side effects" improve the bigger picture.

You also reminded me of when the first time I saw WITH, it reminded me of the Pascal WITH syntax where it just saved you typing the record/array details over and over.

Oh... do I miss Pascal?