Friday, November 6, 2009

Don't mess with LIKE

Oh boy, it looks like it's time to change the title of this blog to "A hundred ways you can screw up with Oracle". I may need a new domain name, something flashy. www.oraclewtf.com would be fine. Let me see... it is currently available. I wonder how fast cyber-squatters will jump in and snatch it now that I've mentioned it. Then they will blackmail me, demanding an outrageous ransom or else... (insert evil face here)

On second thought, no. I don't want to turn this blog into an Oracle-specific one. There are already enough Oracle blogs out there; in fact, I think there are more of them than there are Oracle professionals who would read them. I don't want to bring yet another one into the world just for the sake of it. Heck, it's my place and I'm going to write about whatever I want. Hence the title: Random Thoughts.

Ok, kids, take your places. Today's lesson is (surprise, surprise!) about Oracle. We have already discussed a few ways we can screw up with dates. Today we will talk about numbers. On the surface, numbers look like a pretty innocent data type. But once you dive a little deeper... Beware! Fearful creatures lurk beneath. And if you are not careful, they will snatch you in no time.

Take a look at this Stackoverflow.com question by James Collins.
James had a problem: the following query was slow:

SELECT a1.*
FROM   people a1
WHERE  a1.ID LIKE '119%'
       AND ROWNUM < 5

Although column A1.ID was indexed, the index wasn't used and the explain plan looked like this:

SELECT STATEMENT ALL_ROWS  Cost: 67  Bytes: 2,592  Cardinality: 4
  2 COUNT STOPKEY
    1 TABLE ACCESS FULL TABLE people  Cost: 67  Bytes: 3,240  Cardinality: 5

James was wondering why. Well, the key to the issue lies, as it often does with Oracle, in an implicit data type conversion. Because Oracle is capable of performing automatic data conversions in certain cases, it sometimes does so without you knowing. And as a result, performance may suffer or code may not behave exactly as you expect.

In our case that happened because the ID column was a NUMBER. You see, the LIKE pattern-matching condition expects to see character types as both its left-hand and right-hand operands. When it encounters a NUMBER, it implicitly converts it to a VARCHAR2. Hence, that query was basically silently rewritten to this:

SELECT a1.*
FROM   people a1
WHERE  To_char(a1.ID) LIKE '119%'
       AND ROWNUM < 5

That was bad for 2 reasons:

  1. The conversion was executed for every row, which was slow;
  2. Because of the function call (albeit an implicit one) in the WHERE predicate, Oracle was unable to use the index on the A1.ID column.
If you come across a problem like this, there are a number of ways to resolve it. Some of the possible options are:
  1. Create a function-based index on A1.ID column:
    CREATE INDEX people_idx5 ON people (To_char(ID));
  2. If you need to match records on the first 3 characters of the ID column, create another column of type NUMBER containing just these 3 characters and use a plain = operator on it.

  3. Create a separate column ID_CHAR of type VARCHAR2 and fill it with TO_CHAR(id). Index it and use it instead of ID in your WHERE condition.
  4. Or, as David Aldridge pointed out: "It might also be possible to rewrite the predicate as ID BETWEEN 1190000 and 1199999, if the values are all of the same order of magnitude. Or if they're not then ID = 119 OR ID BETWEEN 1190 and 1199 etc.."

Of course, if you choose to create an additional column based on the existing ID column, you need to keep the two synchronized. You can do that in batch as a single UPDATE, or in an ON-UPDATE trigger, or by adding that column to the appropriate INSERT and UPDATE statements in your code.
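A minimal sketch of the trigger approach, assuming the ID_CHAR column from option 3 above (the trigger name is made up; firing on INSERT as well as UPDATE keeps new rows consistent too):

```sql
-- Keep ID_CHAR in sync with ID whenever a row is inserted or ID changes.
CREATE OR REPLACE TRIGGER people_id_char_trg
   BEFORE INSERT OR UPDATE OF id ON people
   FOR EACH ROW
BEGIN
   :NEW.id_char := TO_CHAR(:NEW.id);
END;
/
```

With that in place, existing code can keep writing to ID only, and the shadow column stays consistent on its own.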

James chose to create a function-based index and it worked like a charm.

Wednesday, November 4, 2009

SYSDATE confusions

SYSDATE is one of the most commonly used Oracle functions. Indeed, whenever you need the current date or time, you just type SYSDATE and you're done. However, sometimes it's not all that simple. There are a few confusions associated with SYSDATE that are pretty common and, if not understood, can cause a lot of damage.

First of all, SYSDATE returns not just the current date, but the date and time combined; more precisely, the current date and time down to the second. If just the date is needed, the TRUNC function has to be applied, that is, TRUNC(SYSDATE). For the sake of good database design, a date should not be confused with a date/time. For example, if a column in a table is called "transaction_date", it would be natural for it to contain a date, but not a date/time. Mixing the two may lead to major confusion. Let's imagine there is a table BANK_TRANSACTIONS containing the following fields:

txn_no     INTEGER,
txn_amount NUMBER(14,2),
txn_date   DATE
The last field is of the most interest to us. Its data type is DATE, but is it a date or a date/time? We can't tell just by looking at the table definition. Nonetheless, it is a very important thing to know. A common use of DATE columns is in date range queries. For example, if we wanted to get all the bank transactions from 1 January 2009 to 31 July 2009, we could write this:
SELECT txn_no,
       txn_amount
FROM   bank_transactions
WHERE  txn_date BETWEEN To_date('01-JAN-2009','DD-MON-YYYY')
                    AND To_date('31-JUL-2009','DD-MON-YYYY')
And that would be fine if TXN_DATE were a date column. But if it is a date/time, we have just missed a whole day's worth of data. That is because, as I said, the DATE data type can hold date/time down to the second, which means that for 31 July 2009 it can hold values ranging from 0:00am to 11:59:59pm. But because TO_DATE('31-JUL-2009', 'DD-MON-YYYY') is basically equivalent to TO_DATE('31-JUL-2009 00:00:00', 'DD-MON-YYYY HH24:MI:SS'), all the transactions that happened after 0:00am on 31 July 2009 would be missed.

That kind of mistake is pretty common. Sometimes it's hard to tell just by looking at the data whether a particular DATE column can have a time portion. Even if all the values in there are rounded to 0:00 hours, that doesn't mean a different time value can't appear there in the future. The data dictionary can't help us here either: the DATE type is always the same whether it contains a time or not. (By the way, Oracle recommends using the TIMESTAMP type for new projects, but that is a whole different story.)
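If you want to inspect the data you already have, a quick check (a sketch using the BANK_TRANSACTIONS table above) is to count the rows whose time portion is not midnight:

```sql
-- Rows where TXN_DATE carries a time component other than 00:00:00.
SELECT COUNT(*) AS rows_with_time
FROM   bank_transactions
WHERE  txn_date <> TRUNC(txn_date);
```

Keep in mind that a zero count only describes the current data; nothing stops a time component from appearing later.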

If you are working with an existing table and you are not sure, you can use a fool-proof method like this:

SELECT txn_no,
       txn_amount
FROM   bank_transactions
WHERE  txn_date BETWEEN To_date('01-JAN-2009','DD-MON-YYYY')
                    AND To_date('31-JUL-2009','DD-MON-YYYY') + 1 - 1/24/3600
"+ 1 - 1/24/3600" here means "plus 1 day minus 1 second". That is because in DATE arithmetic "1" means 1 day, "1/24" means 1 hour, and there are 3600 seconds in an hour, so "1/24/3600" is 1 second.

The above expression will retrieve all the transactions from "01 January 2009 0:00am" to "31 July 2009 0:00am plus 1 day minus 1 second", i.e. to "31 July 2009 11:59:59pm".

If you are charged with designing an application and need to create a table with a DATE column, it is worth keeping yourself and others from future confusion with a simple trick: give columns that only contain a date portion the "_DATE" suffix, and add "_TIME" to the names of columns that you know will contain time components. In our case it would be prudent to call the date/time column TXN_DATE_TIME.

The second issue I'd like to discuss is much more subtle, but can do even more damage.

Imagine that you are charged with developing a report that returns all the transactions for the previous month. It looks like a job for SYSDATE! You fetch your trusty keyboard and after a few minutes of typing you come up with something like this:

SELECT txn_no,
       txn_amount
FROM   bank_transactions
WHERE  txn_date BETWEEN Last_day(Add_months(Trunc(SYSDATE),-2)) + 1
                    AND Last_day(Add_months(Trunc(SYSDATE),-1))
You insert a few rows into the BANK_TRANSACTIONS table, run a few unit tests to make sure your code works and check it into source control. Job done! You congratulate yourself on the productive work and spend the rest of the day reading your friends' blogs and dreaming about your next vacation. The next day you move on to another task and get as busy as ever.

After some time, which may be a few days or months depending on the pace of the project, the code you wrote gets migrated into the UAT environment, and a task force of a few testers and end users is assigned to test the report you wrote. And, as often happens in UAT, they are going to test it on real data extracted from the production system: that is, last year's data.

Got it? Last year's.

The final stages of testing, such as UAT, have to prove that the system does what it is expected to do in conditions that resemble the production as closely as possible. And the best way to do that is to test it on the retrospective production data – the data that is proven. That makes it possible to compare the outcome to the actual production system, and thus, prove or disprove that the new system works.

That sounds reasonable. But one of the implications for you is that the BANK_TRANSACTIONS table is not going to contain the previous month's transactions. Hence, your report will be blank. You can't rewind time, because you hard-coded SYSDATE, which has only one meaning: "right now". Test failed.

Had you known that when you wrote it, you wouldn't have used SYSDATE. You would have used a parameter, something like v_run_date, which you could set to whatever date you wanted. And that would do. Well, now you know.
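A sketch of what that parameterized version might look like (v_run_date is just an illustrative bind variable name; passing SYSDATE for it reproduces the original behaviour):

```sql
-- Same report, but driven by a parameter so any past period can be re-run.
SELECT txn_no,
       txn_amount
FROM   bank_transactions
WHERE  txn_date BETWEEN Last_day(Add_months(Trunc(:v_run_date), -2)) + 1
                    AND Last_day(Add_months(Trunc(:v_run_date), -1))
```

In UAT the testers can now set :v_run_date to a date inside the historical data set, and the report no longer depends on when it happens to be executed.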

Friday, October 2, 2009

Make it beautiful

You only need a single look at the Sydney Opera House to recognise that it is a work of art. Any masterpiece is like that: you don't need a thorough examination of Mona Lisa's smile to realise its beauty, you see it instantly. Perfection needs no explanation; it works on a subconscious level.

The same applies to software engineering. Great code always looks good. It is always carefully formatted, indented and commented. By just looking at it you can tell that it is a work of art. Such code will always work, do what it is supposed to do and have very few bugs, because whoever wrote it cared a lot about it. And you can safely assume that if someone has put a lot of effort into making the code look good, they have put at least as much effort into designing and debugging it.

What is even more important, carefully carved code is easier to maintain. In modern software projects any single procedure gets tweaked and rewritten tens of times. If you are a programmer, chances are that even in the project you are working on right now you inherited code that was written years ago, maybe by people who have long since left the company. And when you are finished with it, that will not be the end of the story: the code will be passed to QA and finally to production support. And then the cycle will start again.

Hence, whatever you program, it's not just about you. You don't know how many people will be looking into your code trying to make sense of it, and you can help them immensely by making it easy to read and understand now. Whether you will be remembered as a good programmer or cursed depends on it.

So, you made the effort to write code that works. Now take an extra step: make it beautiful.

Wednesday, September 9, 2009

Quest for the perfect reader is almost over

Those who know me know that I've been searching for the perfect ebook reader device for years. I've used mobile phones, Palm PDAs, Pocket PCs, laptops... About 7 years ago I got excited about the prospects of E-Ink technology, which promised to deliver just what I needed. Unfortunately, E-Ink has turned out to be an utter disappointment, so far delivering little but bold promises. It took years and years for the first E-Ink devices to turn up on the market. A few years ago I finally got my hands on a Sony PRS-500 E-Ink reader. I got disappointed with it after a week and sold it. A year ago I bought another E-Ink reader - a BeBook this time. I still have it, but now my wife uses it. And I've got myself something better: an iPhone.

Don't get me wrong, I still like E-Ink readers. But even after all those years, E-Ink remains merely a promising technology. Guys, it's about time to deliver on those promises. Maybe one day... For now, E-Ink has more cons than pros: the devices are expensive, the screens are slow, and the screen contrast is poor. Versatility is even poorer - every E-Ink reader is practically useless for anything but reading, making you carry yet another device in your bag. And as if all that weren't enough, many manufacturers lock their devices to just a few supported DRM-enabled book formats (this one is for you, Amazon).

In a few short years, the iPhone delivered what E-Ink failed to deliver in a decade. The truth is, the iPhone's screen is brilliant. I don't know how they did it, but it's bright, high-contrast, and it (almost) doesn't fade in sunlight. Yes, its resolution is not as high as that of some VGA PDAs and laptops, but it still looks better than any other LCD screen I've seen. Reading from that screen is a pleasure. Honestly, given the choice between E-Ink (in its current state) and the iPhone's LCD, I'd probably choose the iPhone. And unlike the dedicated readers, the iPhone is immensely versatile. I don't know the statistics, but it's got to be the most popular mobile software platform out there: thousands and thousands of software titles are available. And there are no problems with ebook formats either - software like Stanza and Calibre makes reading almost any kind of book a simple task.

All in all, for those of you who have been waiting for the perfect reader: the wait is almost over. The iPhone has delivered yet another revolution. Now it is time for the major publishing houses to wake up.

Sunday, July 26, 2009

How to get a root password

It's been a month since I started playing an involving adventure game called "Get a root password for the weekend" with a very large multinational consulting company. The quest requires carrying out a complex sequence of actions, each of which is unknown in advance. A single error leads to failure and the necessity to start all over again. Do not pass Go, do not collect $200. I am now almost through my 2nd attempt. And I almost failed again. So, to get a root password you have to:
  • No less than 2 weeks before D-day, create a change docket in the change management system.
  • Fill in a couple of 15-page documents, describing in detail what we need to do, why and how.
  • Obtain approvals from our and their management.
  • Obtain sign-offs from the downstream systems, even the ones that would not be affected.
  • Attach all the approvals to the change docket.
  • Create a task to issue a temporary root password to us.
  • Send a request to the service delivery manager, asking to approve the task and assign it to a responsible person.
  • Attend the Change Review Board and get the change approved.
  • Find out that the task was assigned to the wrong group. Reassign.
  • Find out that in order to get a root password you need to fill in a form.
  • Obtain the form from a Security group.
  • Fill in the form.
  • Get the form signed by 3 different people in 3 different buildings.
  • Submit the form.
  • In a few days, get a reply from the Security group telling us that the form was filled in incorrectly - a tick was put in the wrong box.
  • Fill in the form again.
  • Get the form signed by 3 different people in 3 different buildings.
  • Submit the form.
  • After a few days' silence, start nagging the Service Delivery Manager.
  • Find out that another Security group is responsible for granting root passwords.
  • Reassign the task to the new group and forward the form to them.
  • After a few days' silence, start nagging the Service Delivery Manager.
  • Find out that yet another User Admin Security Group is responsible for granting the root passwords.
  • Reassign the task to the new group and forward the form to them.
  • Find out that the submitted form is outdated and the User Admin Security Group no longer accepts outdated forms. (The form those guys themselves sent 3 weeks ago was outdated.)
  • Download the new form. The only difference from the old one is that the checkboxes are positioned differently.
  • Fill in the form again.
  • Get the form signed.
  • Submit the form.
  • Find out that the form hasn't changed for the last 4 years.
Now I think that those guys are actually Vogons.
"They wouldn't even lift a finger to save their own grandmothers from the Ravenous Bugblatter Beast of Traal without orders signed in triplicate, sent in, sent back, queried, lost, found, subjected to public inquiry, lost again, and finally buried in soft peat for three months and recycled as firelighters." (The Hitchhiker's Guide to the Galaxy by Douglas Adams)

Monday, May 18, 2009

Date conversion in Oracle part 2

This is a follow-up to the previous post. As it turns out, implicit date conversions may also prevent Oracle from doing partition pruning. For example, if you have a table INVOICES range-partitioned on the INVOICE_DATE column, then the query
SELECT
...
WHERE invoice_date >= '01-MAR-09'
  AND invoice_date <  '02-MAR-09'
will not perform the partition pruning, whereas
SELECT
...
WHERE invoice_date >= TO_DATE('01/03/2009', 'DD/MM/YYYY')
 AND invoice_date <  TO_DATE('02/03/2009', 'DD/MM/YYYY')
will. Because the efficiency of partition pruning is usually why partitioning is used in the first place, the choice is obvious. Personally, though, I'd use
SELECT
...
WHERE invoice_date BETWEEN TO_DATE( '01/03/2009', 'DD/MM/YYYY')
                      AND TO_DATE( '01/03/2009', 'DD/MM/YYYY') + 1 - 1/24/3600
since the BETWEEN operator is specifically tailored for such situations. "1/24/3600" here represents 1 second, and the whole statement should be read as "from 01 March 2009 0:00am to 01 March 2009 11:59:59pm".

Wednesday, May 13, 2009

Date conversions in Oracle

When I was going through PL/SQL procedures written by some of my colleagues, I noticed a few mistakes made around Oracle's date conversion functions. There are some peculiarities about those functions that I thought everyone knew about. But I reckon that if I write about them, it may help others avoid such mistakes. I have also allowed myself to outline a few rules that, if you adhere to them, will help you write better programs.

There are 2 functions in Oracle to convert strings to dates and back.

The first one is TO_DATE: it takes a string parameter and returns a date. Ok, it's actually a date/time combination enclosed in a single data type, Oracle's DATE.

The second one is TO_CHAR, which does the opposite: it takes a date/time as Oracle's DATE data type and converts it to a string.

Actually, these functions are a bit more complex than that, but for our purposes this will do. What's important to understand here is the distinction between a date as a DATE data type and its string representation.

For example, when you type '01-APR-09' in a procedure's text, that is a string representing a date. Pay attention here: although you meant a date, Oracle sees a string. For Oracle, everything enclosed in single quotation marks is a string. To make it a date, we need to convert the string to the DATE data type. Such a conversion can be carried out in 2 possible ways: explicitly or implicitly.

Explicit conversion is when we apply the TO_DATE function to the string:
v_date DATE;
v_date := TO_DATE('01/04/2009', 'DD/MM/YYYY');
Now v_date is a date, representing April 1st, 2009.

Implicit conversion is when we let Oracle perform the conversion:

v_date DATE;
v_date := '01-APR-09';

It has the same effect. Every time Oracle sees a string in a place where it expects a date, it is smart enough to perform the conversion for us. "Well," you may think, "that's great. Oracle does it all for us, so we don't have to. Life is easier; let's go for another coffee break."

Not quite.

You see, when Oracle does such an implicit conversion, it relies on some assumptions. If you read the documentation for the TO_DATE and TO_CHAR functions, you'll find that they take another, optional parameter: the date format. That format tells Oracle how the string representing the date/time should be treated. If the format parameter is not specified, it is taken from the NLS_DATE_FORMAT Oracle parameter. Here's the crux: we can't assume that this parameter will be the same on all Oracle systems. Although it is 'DD-MON-RR' by default and is left like that on most Oracle systems, we can't assume it's going to be like that always and everywhere. If you rely on implicit date conversions and some DBA changes the NLS_DATE_FORMAT parameter, then WHAM! All your programs will stop working.
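To see what your current session actually uses, you can query the NLS_SESSION_PARAMETERS data dictionary view (the commented value is just the common default, not a guarantee):

```sql
-- What date format will implicit conversions use in this session?
SELECT value
FROM   nls_session_parameters
WHERE  parameter = 'NLS_DATE_FORMAT';
-- Typically 'DD-MON-RR', but any DBA or client setting can change it.
```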

So, a good practice and rule of thumb for you should be:

Never ever rely on implicit date conversions! Whenever you need to convert a date to a string or vice versa, use the appropriate TO_DATE or TO_CHAR function and always specify a date format.

Just like this:

v_date DATE;
v_date := TO_DATE('01/04/2009', 'DD/MM/YYYY');

The danger of NLS_DATE_FORMAT being changed is the biggest threat but not the only one.

Pay attention to the default date format I mentioned just above, 'DD-MON-RR'. Do you notice anything suspicious? The year is 2 digits. Here Oracle tries to be a smartass and guess whether you mean the XX or the XXI century. Your only hope is that it figures out what you meant and doesn't make a mistake. But if it does, oops, welcome back, Millennium Bug. This brings us to the second rule:

Don’t be a curmudgeon – always specify the 4-digit year.
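To see what the RR guessing actually does, here is a small illustration. The RR century rule splits two-digit years at 49/50, so the commented results assume the query is run some time between 2000 and 2049:

```sql
-- Two input years one apart end up a whole century apart.
SELECT TO_CHAR(TO_DATE('01-APR-49', 'DD-MON-RR'), 'DD/MM/YYYY') AS y49,
       TO_CHAR(TO_DATE('01-APR-50', 'DD-MON-RR'), 'DD/MM/YYYY') AS y50
FROM   dual;
-- y49: 01/04/2049, y50: 01/04/1950
```

That silent century flip is exactly the kind of surprise the 4-digit rule protects you from.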

Another dangerous programming technique is trying to convert a date to a date where no conversion is necessary.

Let’s have a look at the following example, or should I say a puzzle?

DECLARE
   v_date     DATE := '01-APR-09';
   v_date_2   DATE := TO_DATE (v_date, 'DD/MM/YYYY');
BEGIN
   dbms_output.put_line (TO_CHAR (v_date_2, 'DD/MM/YYYY'));
END;
Try to guess what will be printed as a result.

If you think '01/04/2009', you've just screwed up your business-critical application and sent it two thousand years back in time. In fact, you'll get '01/04/0009'.

This is where it all goes bad:

v_date_2   DATE := TO_DATE (v_date, 'DD/MM/YYYY');

And here’s why:

The first thing Oracle tries to do is execute the TO_DATE function. There is only one TO_DATE function in Oracle: the one that takes a string and converts it to a date. Although we know that v_date is not a string, Oracle still proceeds with its logic. If you run this code, it won't produce an error: Oracle successfully convinces itself that it sees a string where it has a date. That happens because Oracle is able to implicitly convert that date to a string, effectively turning that line into

v_date_2   DATE := TO_DATE (TO_CHAR(v_date), 'DD/MM/YYYY');

But, as we've already learned, implicit date-to-string conversions are performed using the date format recorded in the NLS_DATE_FORMAT Oracle parameter, which by default is set to 'DD-MON-RR'. Hence, what Oracle effectively does is this:

v_date_2   DATE := TO_DATE (TO_CHAR(v_date, 'DD-MON-RR'), 'DD/MM/YYYY');

Can you spot the error now? The date formats are inconsistent! This is what you get when you don't pay attention to details.

So, here comes rule 3:

Avoid unnecessary conversions. Never convert dates to dates.

If you think that all this stuff is pretty confusing, that's because it indeed is. The good news is that you can avoid the confusion altogether by learning to program in a clearer, more concise way. That is the foundation of a good programming style.
