|
Oracle Database Performance: Scalability, Myths, and RealitiesEnsuring that todays finely tuned statement stays that way tomorrow is the true key to scalable performanceBy Stéphane Faroult Project managers "sell" architectures, hardware, and software based on Oracle on the strength of scalability. In fact, crying "scalability" is a great way to get upper managements approval and dodge its fears of high costs and business risks. Twice as many users in the future? No problem -- with our bright, shiny multiprocessing architecture, we can start twice as many processes, crow these project managers. And if the hardware runs out of steam, we'll add more memory and more CPUs. Volumes are going to increase several fold? No problem, they reply, we'll add more disks, more controllers, more memory, and by doing things in parallel, we'll maintain the excellent response times that so greatly impressed you in the demo. But what will really happen? The number of users and volume will increase, more powerful hardware will be needed, and performance will sink to an abysmally low level nonetheless. Crisis, commando operation, a few heads roll, and somehow, because the business must go on, project managers will keep things on track. But at what price? Indeed, Oracle has strong features in terms of scalability, but that scalability doesnt come simply from using Oracle. Its critical to be careful about how you write your queries. The False Linear Hypothesis When talking about scalability, most people implicitly assume some linearity that unfortunately happens to be more the exception than the norm; in fact, were biased because we have trouble appreciating differences that arent linear; however, things are rarely linear. Many people will think automatically that a rectangle four times smaller than another one is only two times smaller. Most often, situations get worse a lot quicker than you expect. Lets use an Oracle example to illustrate the nightmare of every booming business I know: billing. Let's say you have a mail order business and you have a batch program running every night to print invoices. We can imagine a very basic and classical database design in which we have two reference tables, Roughly, the query that will print invoices will look like this: SELECT <some information from CLIENTS> ,<some information from ORDERS> ,<some information from ORDER_DETAILS> ,<some information from ARTICLES> FROM ARTICLES A ,ORDER_DETAILS OD ,CLIENTS C ,ORDERS O WHERE <probably some condition on a date or a status> AND O.CUSTOMERID = C.CUSTOMERID AND O.ORDERID = OD.ORDERID AND OD.ARTICLEID= A.ARTICLEID ORDER BY O.ORDERID If the business is booming, we can expect the number of customers to, for example, triple in the next six months. Intuitively, many people would assume that well need three times the current time to print the invoices. Very optimistic indeed. If we simply consider the The exact execution plan depends, of course, on a number of factors. Lets assume that the date or status criterion uses a non-unique index scan, and that the order numbers grow proportionately to the number of customers. We can expect the index scan to cost three times as much as it does now in six months. But if we have nested loops, then well have to fetch the client information for each order. Such information represents a negligible cost increase, because were using a primary key. The search on This may not be the best example, because the printer will certainly fall behind. But the example underlines the fact that relationships between volume increases and query execution times may not be as simple as they seem at first sight. Note that here we are just considering what happens at the Oracle level and totally ignoring probable increased memory requirements to keep I/Os at a sustainable level, as well as a number of elements that will require a hardware upgrade: more mass storage, more time required to back up the system, and additional applications mushrooming on the database, such as decision support systems (DSSs). End users will expect response times to remain stable, smarter users will expect something three times slower, but the reality will probably be something five times slower. For some long-running queries, such delays may be unacceptable. The Value of Proper Queries When it comes to SQL statement tuning, we encounter two schools of thought. The first is characterized by apprentice tuners fresh from an Oracle tuning course in a hunt for full table scans of big tables. At first glance, this is a sensible approach. But full scans, even of big tables, don't always happen to be the worst option. This is what the more pragmatic second school recognizes; it attempts to minimize cost, which is usually best measured in terms of number of Oracle block accesses required to process the query (logical reads). The question is, will todays finely tuned statement still be finely tuned tomorrow? Recently, someone submitted an optimized query for approval to me on which a developer tried a number of "hints" under a SQL workbench tool. The cost had indeed been divided by 100 between the default Tuning for the Future Frequently, most people trying to optimize queries initially look for full scans. I certainly don't object to this, but suggest a different way to look at trace files:
Planning Ahead: Size Sensitivity Index I advocate the computation of a size sensitivity index. For each table you access when processing the statement and that embodies an entity (as opposed to relationship), the approximate ratio of the number of logical reads needed to process the particular tables statement doubles in size to the current number of logical reads. You best measure a size sensitivity index through testing, but an experienced programmer can make a decent estimate with the execution plan. You can expect any step that refers to the table, one of its indexes, or a table that references it (or one of its indexes) and contains the word "scan" to double in cost, as well as merge. However, you can expect accesses through unique indexes (primary keys and others) or hash functions to stay the same. The cost of nested loops will be the product of the cost of the upper level multiplied by the sum of the costs of lower levels. You can derive sort cost increases from this table:
Now comes the interesting part: How can we interpret this size sensitivity index?
Knowing the relationship between queries and an increase in the number of rows lets us predict response times and user satisfaction confidently when this new system has reached cruising speed in production. Dont Underestimate Predictability Stéphane Faroult has been administering and tuning Oracle databases for more than 12 years. Hes currently working with Oriole Corp., a provider of Oracle database administration and performance tools. You can reach him at sfaroult@oriolecorp.com. |
||||||||||||||||||
|
|
||||||||||||||||||