Tuesday 29 September 2009

Thinking differently about application design

The theme for the 2008 AUSOUG Conference series was "Oracle Software from a User's Perspective". While you could infer focus on the end user from some of the presentations, it's sometimes a difficult theme to focus on, especially when you're writing a technically heavy paper. It's more like relating yourself to your second cousin once removed (unless of course I've missed the point and we're talking about the developer/DBA as the user of Oracle development tools.)

This doesn't mean we can't maintain our awareness for our ultimate customer - the end user.

Tom Kyte recently linked to a cartoon that painted 1000 words about the way we write applications. (Thanks Eric Burke.) We've all been on projects where screens look like this, and considering we write GUI applications for databases that contain structures like this, it's an interface that's inevitable. The same reason the 747 cockpit contains 1000+ knobs and switches.

However, we can still look for opportunities to simplify. The interface to Apple's iPod - iTunes isn't exactly a dummy user's delight. There are plenty of options and interface difficulties. They've just transferred the complexity away from the entry point for 90% of the music player's use.

So what about your application. Are there points of entry that result in a large percentage of use? Is there a way to simplify the entry of key units of data? To go even more granular, are there little smarts you can add to certain types of fields, such as dates? Are there ways to minimise use of the mouse at all times? Is the first key field of your entry form user/time/context aware?

Many considerations to make. I read a lot of blogs, and I admit many of them are certainly from a technical perspective on the database, but there are a few I subscribe to that aren't even software related - but still pertain to my job. In the same way that Microsoft employ psychologists to assist with the design of the menu structure and interfaces to their ubiquitous applications such as Word & Excel, I read some blogs that encourage thinking outside the square.

One such article I came across recently was from Seth Godin - The End of Dumb Software. I can't remember where I first stumbled across his blog, but he always has interesting perspectives as a marketer that I can either apply within my job as a database developer, or even utilise in day to day life.

He was having a gripe in particular about simple calendar applications. If 2 o'clock was selected, you'd have to presume 999 times out of 1000 that 2pm would be requested. So why is that not the default? Why can't a calendar application presume most dates used are within working hours, with the option to change if required. He goes on to suggest he shouldn't need to "edit" an event to view further details, though I understand the architect's potential thinking in that scenario.

There are plenty of other quick examples he runs through that I think all developers, architects, designers should read through and consider. How many LOVs, search windows, entry points of various kinds have you created that could have been written slightly differently to save the end user some time & frustration?

A simple example I can think of in a forms application I once worked on was the ability to enter the letter "t" in a date field to represent (and be automatically translated into on when-item-validated) today's date. How many times in your application may the user want to enter today's date? Turning it into 1 keystroke has saved between 5 and 10 keystrokes, depending on date format settings (310809 to 31-AUG-2009). One keystroke vs eleven? I know what I'd prefer.

Food for thought.

If you have other blogs in your RSS feed like Seth Godin, I'd be interested to hear about them.

Monday 28 September 2009

Advert: On-line Book Reviews

While doing some research I was directed to the online bookstore Packt Publishing. Some other online book stores have a chapter listing, or a short excerpt. At Packt you seem to be able to download a sample chapter for every book you're interested in - at least all that I had a look at.

If you're a blogger you've also got the opportunity to review books. I've chosen one to review to see if it's worthwhile and let developers like you know if it's worth it. Stay tuned over the coming months, I hope to have it reviewed by the time the 2009 AUSOUG Conference series commences.

Online reference manuals are great, but sometimes you need more than a reference. I'm not a huge fan of books that regurgitate the documentation, but a best practices book or a guide to a new concept; case study or transformation process never goes astray.

Friday 25 September 2009

SQL Developer 2.1 EAP

The latest major release of SQL Developer is now in the Early Adopter stage, starting at 2.1 (not to be confused with the much older & somewhat flaky 1.2 version).

I gave it a go and here are some quick observations:
  1. Unzipped footprint including JRE from 166mb to 200mb (Update: I've just noticed the memory usage on my laptop has almost doubled to 237mb!)
  2. On opening I still have to tell it where java.exe is, not the nicest thing to have to do for those more basic users.
  3. Snappier to open, even considering I was migrating preferences from 1.5.4
  4. All icons are sharper and clearer
  5. Jobs included in navigator list
  6. Free data model viewer, which makes sense to me and seems fair.
  7. I can compile packages/procedures using F9 in worksheet - yay!
  8. ... and I get syntax errors more appropriately.
  9. Although saying that when I open the program from the navigator, I get erroneous errors on valid compile.
  10. Syntax errors for queries aren't in popups, just in the results window. The number of rows fetched and timing just above results too, which is more friendly to the eye.
  11. I like the fact the tabs above the result set can be closed if you don't want them.
  12. I still can't figure out how to open to table views at the same time though.
  13. There is a unit testing component that will be interesting to explore, giving the nice people at Quest some more competition in the testing market.
  14. I like the preferences changes. One of which includes more options for SQL formatting, including a preview.
  15. The coding templates are also a good inclusion. Let's ensure good standards are more widely adopted!
  16. It didn't auto prompt me for a connection when I opened a new SQL file and attempted to execute.
  17. F6 for explain plan has moved to F10
  18. ...best I continue with work now and see how it performs with day-to-day tasks.
Official new feature list can be found here.

Some people think all it needs to do is run SQL and PL/SQL, but considering it's a GUI tool we want it to be brisk and easily customisable - otherwise we'll stick with SQL*Plus. Overall, I like the new improvements. I'm sure the kinks will be ironed out quickly and I look forward to the ongoing improvements to a good free product.

Tip o' the hat to Dimitri Gielis for the tip-off.

Monday 21 September 2009

Data Densification - filling dates example

Many years ago when I was first learning Oracle (in the deep end) I came by the problem of data densification. I can't remember how I solved the problem back then, but I've encountered it a few times since. When writing my presentation for the model clause, I explored the difference between two options. Recently whilst investigating syntax definitions for the ANSI standard outer joins, I encountered another solution called a partitioned outer join - so I thought I'd compare all three for performance and readability.

The scenario is thus - I have an amount in a table for various dates throughout the year, for a particular person.
NAME          AMT DT
------ ---------- -----------
Scott         117 11-jan-2009
Scott         250 17-feb-2009
Scott         300 14-apr-2009
Scott          50 06-jun-2009
Wade         1231 17-mar-2009
Wade         2321 22-apr-2009
Wade         3122 30-sep-2009
Wade           59 31-oct-2009
However in my report I wish all months of the year displayed for the person, regardless of whether amounts exist. Let's add a cumulative figure in there for good measure. Included below this is the SQL for my test case.
NAME   MTH              AMT    CUM_AMT
------ --------- ---------- ----------
Scott  January          117        117
Scott  February         250        367
Scott  March              0        367
Scott  April            300        667
Scott  May                0        667
Scott  June              50        717
Scott  July               0        717
Scott  August             0        717
Scott  September          0        717
Scott  October            0        717
Scott  November           0        717
Scott  December           0        717
Wade   January            0          0
Wade   February           0          0
Wade   March           1231       1231
Wade   April           2321       3552
Wade   May                0       3552
Wade   June               0       3552
Wade   July               0       3552
Wade   August             0       3552
Wade   September       3122       6674
Wade   October           59       6733
Wade   November           0       6733
Wade   December           0       6733

DROP TABLE customer;
CREATE TABLE customer (
   name  VARCHAR2(10)
  ,amt   NUMBER
  ,dt    DATE
);

insert into customer values('Scott',117,  to_date('11-01-2009','DD-MM-YYYY'));
insert into customer values('Scott',250,  to_date('17-02-2009','DD-MM-YYYY'));
insert into customer values('Scott',300,  to_date('14-04-2009','DD-MM-YYYY'));
insert into customer values('Scott',50,   to_date('06-06-2009','DD-MM-YYYY'));
insert into customer values('Wade',1231,  to_date('17-03-2009','DD-MM-YYYY'));
insert into customer values('Wade',2321,  to_date('22-04-2009','DD-MM-YYYY'));
insert into customer values('Wade',3122,  to_date('30-09-2009','DD-MM-YYYY'));
insert into customer values('Wade',59,    to_date('31-10-2009','DD-MM-YYYY'));
This is a pre-10g solution, using an efficient method on the DUAL table to conjure records. I also use analytics to determine the cumulative amount. There is however a cartesion join and two full table scans on the customer table.
SELECT date_fill.name
      ,TO_CHAR(real_dt,'Month') mth
      ,NVL(amt,0) amt
      ,NVL(SUM(amt) OVER (PARTITION BY date_fill.name
                          ORDER BY real_dt ),0) cum_amt
FROM
  (SELECT name, TRUNC(dt,'mm') dt, SUM(amt) amt
   FROM   customer
   GROUP BY name, TRUNC(dt,'mm')
   ) actual_data -- Actual data
  ,(SELECT name, real_dt
    FROM  (SELECT DISTINCT name
           FROM   customer)
         ,(WITH mths AS (SELECT TRUNC(SYSDATE,'YYYY') real_dt
                         FROM DUAL CONNECT BY LEVEL <= 12)
           SELECT ADD_MONTHS(real_dt,ROWNUM-1) real_dt FROM mths)
   ) date_fill -- Distinct list with conjured dates
-- Outer join actual data with full date list
WHERE date_fill.real_dt = actual_data.dt(+)
AND   date_fill.name    = actual_data.name(+)
ORDER BY date_fill.name, date_fill.real_dt;

-------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     2 |    74 |    11  (28)|
|   1 |  WINDOW SORT                        |          |     2 |    74 |    11  (28)|
|*  2 |   HASH JOIN OUTER                   |          |     2 |    74 |    11  (28)|
|   3 |    VIEW                             |          |     2 |    24 |     6  (17)|
|   4 |     MERGE JOIN CARTESIAN            |          |     2 |    24 |     6  (17)|
|   5 |      VIEW                           |          |     1 |     6 |     2   (0)|
|   6 |       COUNT                         |          |       |       |            |
|   7 |        VIEW                         |          |     1 |     6 |     2   (0)|
|   8 |         CONNECT BY WITHOUT FILTERING|          |       |       |            |
|   9 |          FAST DUAL                  |          |     1 |       |     2   (0)|
|  10 |      BUFFER SORT                    |          |     2 |    12 |     6  (17)|
|  11 |       VIEW                          |          |     2 |    12 |     4  (25)|
|  12 |        HASH UNIQUE                  |          |     2 |    12 |     4  (25)|
|  13 |         TABLE ACCESS FULL           | CUSTOMER |     8 |    48 |     3   (0)|
|  14 |    VIEW                             |          |     8 |   200 |     4  (25)|
|  15 |     HASH GROUP BY                   |          |     8 |   136 |     4  (25)|
|  16 |      TABLE ACCESS FULL              | CUSTOMER |     8 |   136 |     3   (0)|
-------------------------------------------------------------------------------------
Here is what some would consider a more elegant solution using the model clause. The explain plan is far neater, and the performance was also enhanced - and there was no joins & half as many logical reads. The cumulative column was calculated using a nifty rule computation.
SELECT name, TO_CHAR(dt,'DD-MM-YYYY') dt, amt, cum_amt -- Model results
FROM (
   SELECT name, TRUNC(dt, 'MM') dt, SUM(amt) amt
   FROM   customer
   GROUP BY name, TRUNC(dt, 'MM')
)
MODEL
PARTITION BY (name)
DIMENSION BY (dt)
MEASURES (amt, cast(NULL AS NUMBER) cum_amt) -- Define calculated col
IGNORE NAV
RULES SEQUENTIAL ORDER(
   -- Conjure dates
   amt[FOR dt FROM TO_DATE('01-01-2009', 'DD-MM-YYYY')
              TO   TO_DATE('01-12-2009', 'DD-MM-YYYY')
              INCREMENT NUMTOYMINTERVAL(1, 'MONTH')
       ] = amt[CV(dt)] -- Apply amt for given date, if found
  ,cum_amt[ANY] = SUM(amt)[dt <= CV(dt)] -- Calculate cumulative
)
ORDER BY name, dt;
------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     8 |   136 |     5  (40)|
|   1 |  SORT ORDER BY               |          |     8 |   136 |     5  (40)|
|   2 |   SQL MODEL ORDERED          |          |     8 |   136 |     5  (40)|
|   3 |    HASH GROUP BY             |          |     8 |   136 |     5  (40)|
|   4 |     TABLE ACCESS FULL        | CUSTOMER |     8 |   136 |     3   (0)|
|   5 |    WINDOW (IN SQL MODEL) SORT|          |       |       |            |
------------------------------------------------------------------------------
The following solution is the one I stumbled upon while playing with ANSI joins. I wanted to see how it compared with the others. We still have a join, but that's not necessarily an issue. It's certainly tidier looking than the first solution, and we don't need to fry our brain learn how to use model. I have heard some resistance to ANSI syntax however - not for the sake of going against the grain, but apparently there have been some security issues. I'm not qualified to comment on this, let's just compare the performance in this particular test case.
SELECT c.name, TO_CHAR(real_dt,'Month') mth, NVL(amt,0) amt
      ,NVL(SUM(amt) OVER (PARTITION BY c.name
                          ORDER BY real_dt ),0) cum_amt
FROM customer c
PARTITION BY (name)
RIGHT OUTER JOIN
  (WITH mths AS (SELECT TRUNC(SYSDATE,'YYYY') real_dt
                 FROM DUAL CONNECT BY LEVEL <= 12)
   SELECT ADD_MONTHS(real_dt,ROWNUM-1) real_dt FROM mths) mths
ON (real_dt = TRUNC(c.dt,'mm'))
ORDER BY c.name, real_dt;
-------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     2 |   130 |     8  (13)|
|   1 |  WINDOW SORT                        |          |     2 |   130 |     8  (13)|
|   2 |   VIEW                              |          |     2 |   130 |     8  (13)|
|   3 |    NESTED LOOPS PARTITION OUTER     |          |     2 |    46 |     8  (13)|
|   4 |     BUFFER SORT                     |          |       |       |            |
|   5 |      VIEW                           |          |     1 |     6 |     2   (0)|
|   6 |       COUNT                         |          |       |       |            |
|   7 |        VIEW                         |          |     1 |     6 |     2   (0)|
|   8 |         CONNECT BY WITHOUT FILTERING|          |       |       |            |
|   9 |          FAST DUAL                  |          |     1 |       |     2   (0)|
|* 10 |     FILTER                          |          |       |       |            |
|  11 |      SORT PARTITION JOIN            |          |     1 |    17 |     4  (25)|
|  12 |       TABLE ACCESS FULL             | CUSTOMER |     1 |    17 |     3   (0)|
-------------------------------------------------------------------------------------
I executed these 3 statements 1000 times on my laptop on 10gR2 (similar results in 11gR1) to see how throughput compares. I consistently found the following performance:
6 secs -- Original
.006 secs per iteration
4.6 secs -- Model clause
.0046 secs per iteration
.35 secs -- Partitioned outer join
.00035 secs per iteration
Personally I was a little surprised at the dramatic timing difference with the partitioned outer join over the original. I thought perhaps that would have been more comparable to the model clause solution. Something to bear in mind in future development.

If I needed to write this for a future project I'd probably take a closer look at the relevant data set using Tom Kyte's runstats package. This used to be located here - htpp://asktom.oracle.com/tkyte/runstats.html, but the link is currently broken. Possibly related to his recent move. However here is one of many of his forum entries addressing runstats. This package is better than just measuring throughput, it will let you know how your application may scale by paying attention to latches. A DBA's perspective can be found here.

Sunday 13 September 2009

Other uses for NULLIF

There are some features in the database that are understandably difficult to first come across. Unless you have a specific need to fulfil, or you aren't the type who meanders through the new features guide, or you're just relatively new to Oracle, you may not have come across useful day-to-day features & concepts such as WITH, DBMS_JOB or pipelining. (I just picked a few random ones that came to mind...)

There are also abundant single row functions that are available, and some of them are easy to use, but sometimes no apparent usage immediately stands out. I make it no secret that my favourite online reference manual is the Oracle Database SQL Reference. I have a shortcut to a local version in my browser toolbar. Personally I prefer the 10g layout to the 11g one, but beggars can't be choosers I suppose.

I thoroughly recommend beginners and even the more mature programmers to peruse this manual, in particular the chapter on SQL Functions. I believe awareness is often the key to writing a successful application - or at least not re-inventing the wheel.

So to stop beating around the bush, what about the NULL-related functions, specifically NULLIF. Have you used it?
The manual illustrates an example showing those employees whose job has changed since they were hired. If their job hasn't changed, then null is shown.
SELECT e.last_name, NULLIF(e.job_id, j.job_id) "Old Job ID"
FROM employees e, job_history j
WHERE e.employee_id = j.employee_id
ORDER BY last_name;

LAST_NAME                 Old Job ID
------------------------- ----------
De Haan                   AD_VP
Hartstein                 MK_MAN
Kaufling                  ST_MAN
Kochhar                   AD_VP
Kochhar                   AD_VP
Raphaely                  PU_MAN
Taylor                    SA_REP
Taylor
Whalen                    AD_ASST
Whalen
What about if an employee record has multiple e-mail address columns - an alternative point of contact. In some scenarios/records this could potentially be unclean, so instead of coding with these older statements:
CASE WHEN email_address != email_address_alternate
THEN email_address_alternate
ELSE null
END email_address_alternate

DECODE(email_address, email_address_alternate, null, email_address_alternate) email_address_alternate
We can use a tidier:
NULLIF(email_address_alternate, email_address)
A similar situation applies to a mailing/delivery/home address (albeit ideally on a separate table) or a work/mobile/home number, but in this case the COALESCE function is more appropriate. I plan a blog entry on this function in the near future, along with performance considerations.

Perhaps you are performing a data load and you need to ignore certain indicative data coming through:
NULLIF(surname, 'INACTIVE')
The final scenario that I've started to use regularly is when faced with the old mathematical enigma of dividing by zero. (The wiki entry is a thrilling read, almost as fun as the images returned by google search)

Instead of facing this throughout a report:
select 1/0 from dual
*
ERROR at line 1:
ORA-01476: divisor is equal to zero
We can do this to return null where price is 0 instead of an error.
SELECT (price-cost)/NULLIF(price,0) gp FROM sales;
Sweet.

Thursday 10 September 2009

A LONG view ahead

Every time a new version is release I feel compelled to have a little peak at the definition of ALL_VIEWS, to see if columns such as ALL_VIEWS.TEXT have been converted from the LONG datatype. Alas, 11gR2 still has this utilises the LONG datatype, perhaps this is just something that has to be persistent through the ages.

However we can still get around the issue. The reason I find this of interest is sometimes during an impact analysis of say - dropping a column from a table, we need to determine what objects refer to this column. We could start with ALL_DEPENDENCIES, but this isn't granular enough for some analyses.

Due to the many restrictions with the LONG datatype, we can't use the INSTR function to search this dictionary view. Here's one work-around:
> DROP TABLE my_views;

> CREATE TABLE my_views(text CLOB);

> INSERT INTO my_views (text)
SELECT TO_LOB(text) 
FROM all_views;

SAGE@sw10g> SELECT *
  2  FROM   my_views
  3  WHERE INSTR(lower(text),'hiredate') > 0;

TEXT
------------------------------------------------------------------------
select "EMPNO","ENAME","JOB","MGR","HIREDATE","SAL","COMM","DEPTNO"

1 row selected.
Got any others?

I do! 
Here is my posting from April 2012 on using XML to solve the problem.

Thursday 3 September 2009

PRECEDES follows FOLLOWS

Thanks for a tip-off from volleyball coach / Oracle guru Connor McDonald (and later by colleague Chris Muir), it seems 11gR2 was released while I was 30000ft in the sky.

I wouldn't be practising what I preach if I didn't point you to one of the best books in the online Oracle Documentation - the New Features Guide.

If you want to keep up with Oracle Technology, and learn a thing or two, every time a new version of Oracle is release, I highly recommend a peruse through this book.

Keep a lookout in the blog community because plenty of articles pop-up around these times showing off the shiny new features. One feature I'll mention today is an extension to triggers.

In a recent presentation I included some thoughts on compound triggers and a quick note on the FOLLOWS clause, allowing you to indicate that a trigger fire after a specified trigger.

This can be useful if you need to extend proprietary Oracle software.
create or replace trigger package_trigger
after update of salary
on employees
for each row
begin
dbms_output.put_line('package_trigger');
end old_way;
/

create or replace trigger custom_stuff
after update of salary
on employees
for each row
follows package_trigger
begin
dbms_output.put_line('custom_stuff');
end old_way;
/
I don't know whether it was an afterthought or this was just one of the last features installed in 11gR1 at the pleading request of a client, but it was a little odd that a PRECEDES type functionality wasn't included.

However now in 11gR2, this is now available.

There are caveats however, and in this case PRECEDES may only be applied to a reverse cross edition trigger - this is also a whole new ball game and I can't yet confidently tell you more about editions, except that Connor's been excited about the prospect of these for quite some time & it's impact throughout the database seems widespread.

Other features for the handy developer to keep an eye out for include:
  • Enhancements to the Forms->Apex conversion process
  • Analytical function improvements
  • Recursive WITH clause - ANSI compliant hierarchical queries
  • Flashback support for DDL
  • Improvements to Oracle Scheduler
Oracle keeps on Oracling.