Oracle Database Performance: Scalability, Myths, and Realities

Ensuring that today’s finely tuned statement stays that way tomorrow is the true key to scalable performance

By Stéphane Faroult

Project managers "sell" architectures, hardware, and software based on Oracle on the strength of scalability. In fact, crying "scalability" is a great way to get upper management’s approval and dodge its fears of high costs and business risks. Twice as many users in the future? No problem -- with our bright, shiny multiprocessing architecture, we can start twice as many processes, crow these project managers. And if the hardware runs out of steam, we'll add more memory and more CPUs. Volumes are going to increase several fold? No problem, they reply, we'll add more disks, more controllers, more memory, and by doing things in parallel, we'll maintain the excellent response times that so greatly impressed you in the demo. But what will really happen? The number of users and volume will increase, more powerful hardware will be needed, and performance will sink to an abysmally low level nonetheless. Crisis, commando operation, a few heads roll, and somehow, because the business must go on, project managers will keep things on track. But at what price? Indeed, Oracle has strong features in terms of scalability, but that scalability doesn’t come simply from using Oracle. It’s critical to be careful about how you write your queries.

The False Linear Hypothesis

When talking about scalability, most people implicitly assume some linearity that unfortunately happens to be more the exception than the norm; in fact, we’re biased because we have trouble appreciating differences that aren’t linear; however, things are rarely linear. Many people will think automatically that a rectangle four times smaller than another one is only two times smaller. Most often, situations get worse a lot quicker than you expect.

Let’s use an Oracle example to illustrate the nightmare of every booming business I know: billing. Let's say you have a mail order business and you have a batch program running every night to print invoices. We can imagine a very basic and classical database design in which we have two reference tables, CUSTOMERS, with CUSTOMERID as the primary key and ARTICLES, with ARTICLEID as primary key. On top of that, ORDERS will reference CUSTOMERS with some global information, such as the date and possibly some redundant information, such as the total amount to save on future processing. ORDER_DETAILS will link ORDERS, identified by ORDERID, to one or several articles that have been ordered with their quantities.

Roughly, the query that will print invoices will look like this:

SELECT	<some information from CLIENTS>
,<some information from ORDERS>
,<some information from ORDER_DETAILS>
,<some information from ARTICLES>
FROM	ARTICLES A
,ORDER_DETAILS OD
,CLIENTS C
,ORDERS O
WHERE	<probably some condition on a date or a status>
AND	O.CUSTOMERID = C.CUSTOMERID
AND	O.ORDERID = OD.ORDERID
AND	OD.ARTICLEID= A.ARTICLEID
ORDER BY O.ORDERID

If the business is booming, we can expect the number of customers to, for example, triple in the next six months. Intuitively, many people would assume that we’ll need three times the current time to print the invoices. Very optimistic indeed. If we simply consider the ORDER BY, it will imply a sort, the cost of which increases faster than the number of lines to sort. In fact, for three times as many rows returned, we may expect the sort to take between 3.3 and 4 times as long (what you get on average with the best sorting algorithms), depending on the initial table size. (The bigger the table, the smaller the multiplicative factor.)

The exact execution plan depends, of course, on a number of factors. Let’s assume that the date or status criterion uses a non-unique index scan, and that the order numbers grow proportionately to the number of customers. We can expect the index scan to cost three times as much as it does now in six months. But if we have nested loops, then we’ll have to fetch the client information for each order. Such information represents a negligible cost increase, because we’re using a primary key. The search on ORDER_DETAILS will use a non-unique index and will also be expected to take longer than it does now. How much longer is difficult to calculate -- especially if the number of articles has increased and the average order is referring to more articles than we’re currently using. With the loop, we’ll have a multiplication of all factors.

This may not be the best example, because the printer will certainly fall behind. But the example underlines the fact that relationships between volume increases and query execution times may not be as simple as they seem at first sight. Note that here we are just considering what happens at the Oracle level and totally ignoring probable increased memory requirements to keep I/Os at a sustainable level, as well as a number of elements that will require a hardware upgrade: more mass storage, more time required to back up the system, and additional applications mushrooming on the database, such as decision support systems (DSSs). End users will expect response times to remain stable, smarter users will expect something three times slower, but the reality will probably be something five times slower. For some long-running queries, such delays may be unacceptable.

The Value of Proper Queries

When it comes to SQL statement tuning, we encounter two schools of thought. The first is characterized by apprentice tuners fresh from an Oracle tuning course in a hunt for full table scans of big tables. At first glance, this is a sensible approach. But full scans, even of big tables, don't always happen to be the worst option. This is what the more pragmatic second school recognizes; it attempts to minimize cost, which is usually best measured in terms of number of Oracle block accesses required to process the query (logical reads). The question is, will today’s finely tuned statement still be finely tuned tomorrow?

Recently, someone submitted an optimized query for approval to me on which a developer tried a number of "hints" under a SQL workbench tool. The cost had indeed been divided by 100 between the default RULE mode on the database and the /*+ ALL_ROWS */ hint. Quite impressive, but the supposedly optimized access plan was using a full scan on a 12,000-row table. The problem was that this table could be expected to grow four or five times over the next year. In fact, there was a search on a fairly selective criterion with a misplaced lpad(rtrim(...)), preventing use of the second (and most selective -- another mistake) column in a concatenated index. Rewriting the query properly saw Oracle using the index (without any hint), thereby reducing the cost by an extra factor of two -- a result that was much less sensitive to future volume increases. Blinded by his already impressive results, the programmer reworking the query had totally overlooked all this. Yet this scenario is certainly not the worst you can encounter.

Tuning for the Future

Frequently, most people trying to optimize queries initially look for full scans. I certainly don't object to this, but suggest a different way to look at trace files:

  • Scrutinize your sorts. Many sorts are functional, but I’m always amazed at how frequently I encounter the number of SELECT DISTINCT that imply sorts. As a rule of thumb, you can consider that a SELECT DISTINCT with a join simply proves that the programmer was eager to get his or her program working and compensated for sloppy join conditions. (Never heard of subqueries?) More often than not, people also use ORDER BY clauses to return data in application windows that are able to sort their contents themselves. Why do you want to add more load to the poor server?
  • Keep your algorithms simple. There’s a strong tendency among programmers -- undoubtedly under the spell of the SQL language -- to write the SQL query that has everything including the kitchen sink, bringing the server to its knees. Cleverly used, pin-pointed queries and intelligent programming usually achieve much better results. If you have any doubts, learn from people who know what relational databases are about and just trace the execution of a DDL statement (a CREATE, ALTER or DROP) that does a number of updates to the data dictionary. If there’s something critical to Oracle performance, it’s data dictionary access -- simple, basic, and efficient statements. It’s not by chance that most data dictionary information is cached in the SGA. Many critical processes can have most of the work done by the client rather than server side. Oh, and I’m still waiting for integrity constraints on the Oracle data dictionary.
  • Get rid of useless indexes. Ten years ago, most Oracle performance tuning was about adding indexes at the right places. The wind changed about five years ago, and since then, much more emphasis has been placed on removing (or concatenating the other way round) indexes of dubious selectivity. Part of the blame probably belongs with foreign keys and with the well-known fact that when you don’t index a foreign key, the referenced table is locked in exclusive mode when you update the referencing table. As a result, many database design tools systematically index foreign keys (or this is included in the company’s Oracle design norms as mandatory). I’ve met a number of cases where referencing and referenced tables had to be updated concurrently, but this was certainly not the majority of cases.
  • Pay attention to nested scans. After you’ve thrown out useless indexes, address nested scans - not only table scans, but also index scans. On an enormous table, indexes are rarely light weight themselves, even if they seem small by comparison. A full index scan, and even a range scan (even if much less costly than a scan on the table itself) also has its cost that increases with the number of rows. When we have this kind of access at different nesting levels, the multiplicative factor can cost us dearly in the future.
  • Value full scans. Ordinary full scans scare most beginners; however, such scans are quite efficient, especially with the parallel query option. It’s great if you can find a faster way to do the same thing, but with full scans, at least the cost grows in line -- not faster -- than the number of rows. This is a far lesser evil.

Planning Ahead: Size Sensitivity Index

I advocate the computation of a size sensitivity index. For each table you access when processing the statement and that embodies an entity (as opposed to relationship), the approximate ratio of the number of logical reads needed to process the particular table’s statement doubles in size to the current number of logical reads.

You best measure a size sensitivity index through testing, but an experienced programmer can make a decent estimate with the execution plan. You can expect any step that refers to the table, one of its indexes, or a table that references it (or one of its indexes) and contains the word "scan" to double in cost, as well as merge. However, you can expect accesses through unique indexes (primary keys and others) or hash functions to stay the same. The cost of nested loops will be the product of the cost of the upper level multiplied by the sum of the costs of lower levels. You can derive sort cost increases from this table:

RowsRatio
102.6
1002.3
10002.2
100002.2
100000 2.1
1000000 2.1
10000000 2.1


Now comes the interesting part: How can we interpret this size sensitivity index?

  • < 1 If the sensitivity index of the statement for a given entity is <1, you missed something in your computations.
  • 1 means the query is totally insensitive to a size increase.
  • Between 1 and 2 is inclusive: The query is sensitive to a size increase, but things will be linear at worst, but under control. Here, the scalability argument has full weight.
  • > 2 Red zone! The cost increases faster than the number of rows and could (if this number of rows is due to increase) get out of hand. As a corollary, it means that the ratio is also a function of the number of rows and can't be easily extrapolated.

Knowing the relationship between queries and an increase in the number of rows lets us predict response times and user satisfaction confidently when this new system has reached cruising speed in production.

Don’t Underestimate Predictability
With performance, the key is predictability. I favor a slower (within reason) query that I can rely on for consistent response times. If we know how things will or are likely to evolve, it makes things easier to prepare for a software or hardware upgrade -- or to look for another job.

Stéphane Faroult has been administering and tuning Oracle databases for more than 12 years. He’s currently working with Oriole Corp., a provider of Oracle database administration and performance tools. You can reach him at sfaroult@oriolecorp.com.