Friday, November 6, 2009

Don't mess with LIKE

Oh boy, It looks like it's time to change the title of this blog to "A hundred ways you can screw up with Oracle". I may need a new domain name, something flashy. www.oraclewtf.com would be fine. Let me see, it is currently available. I wonder how fast cyber-squatters will jump in and snatch it now when I pronounced it. Then they will blackmail me demanding a outrageous ransom or else... (insert evil face here)

On second thought, no. I don't want to turn this blog into an Oracle-specific. There are already enough Oracle blogs out there. In fact, I think there are more of them than Oracle professionals who would read them. I don't want to bring yet another one to the world just for the sake of it. Heck, it's my place and I'm going to write about whatever I want. Hence the title, Random Thoughts.

Ok, kids, take your places. Today's lesson is (surprise, surprise!) about Oracle. We already discussed a few ways we can screw up with dates. Today we will talk about numbers. On the surface numbers look like pretty innocent data type. But once you dive a little deeper... Beware! Fearful creatures lurk beneath. And if you are not careful, they will snatch you in no time.

Take a look at this Stackoverflow.com question by James Collins.
James had a problem, the following query was slow:

SELECT a1.*
FROM   people a1
WHERE  a1.ID LIKE '119%'
       AND ROWNUM < 5

Despite column A1.ID was indexed, the index wasn't used and the explain plan looked like this:

SELECT STATEMENT ALL_ROWS
Cost: 67 Bytes: 2,592 Cardinality: 4 2 COUNT STOPKEY 1 TABLE ACCESS FULL TABLE people
Cost: 67 Bytes: 3,240 Cardinality: 5

James was wondering why. Well, the key to the issue lies, as it often happens with Oracle, in an implicit data type conversion. Because Oracle is capable to perform automatic data conversions in certain cases, it sometimes does that without you knowing. And as a result, performance may suffer or code may behave not exactly like you expect.

In our case that happened because ID column was NUMBER. You see, LIKE pattern-matching condition expects to see character types as both left-hand and right-hand operands. When it encounters a NUMBER, it implicitly converts it to VARCHAR2. Hence, that query was basically silently rewritten to this:

SELECT a1.*
FROM   people a1
WHERE  To_char(a1.ID) LIKE '119%'
       AND ROWNUM < 5

That was bad for 2 reasons:

  1. The conversion was executed for every row, which was slow;
  2. Because of a function (though implicit) in a WHERE predicate, Oracle was unable to use the index on A1.ID column.
If you came across a problem like that, there is a number of ways to resolve it. Some of the possible options are:
  1. Create a function-based index on A1.ID column:
    CREATE INDEX people_idx5 ON people (To_char(ID));
  2. If you need to match records on first 3 characters of ID column, create another column of type NUMBER containing just these 3 characters and use a plain = operator on it.

  3. Create a separate column ID_CHAR of type VARCHAR2 and fill it with TO_CHAR(id). Index it and use instead of ID in your WHERE condition.
  4. Or, as David Aldridge pointed out: "It might also be possible to rewrite the predicate as ID BETWEEN 1190000 and 1199999, if the values are all of the same order of magnitude. Or if they're not then ID = 119 OR ID BETWEEN 1190 and 1199 etc.."

Of course if you choose to create an additional column based on existing ID column, you need to keep those 2 synchronized. You can do that in batch as a single UPDATE, or in an ON-UPDATE trigger, or add that column to the appropriate INSERT and UPDATE statements in your code.

James choose to create a function-based index and it worked like a charm.

No comments:

Popular Posts