1 / 15

Query optimization in relational DBs

Query optimization in relational DBs. Leveraging the mathematical formal underpinnings of the relational model. Pre-processing a query . Convert SQL to algebra Create a binary parse tree Assign execution fragments to each interior node. Optimizing a query: stage 1.

xannon
Télécharger la présentation

Query optimization in relational DBs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Query optimization in relational DBs Leveraging the mathematical formal underpinnings of the relational model.

  2. Pre-processing a query • Convert SQL to algebra • Create a binary parse tree • Assign execution fragments to each interior node

  3. Optimizing a query: stage 1 • Manipulate parse tree to create new, but equivalent trees that will be faster independently of the properties of the database • Select a handful that would most likely be the fastest

  4. Stage 2: collect information about the DB’s physical state • Size of tables being manipulated (in both rows and space) • Sorting order of tables • Location of indices on tables • Distribution of PKs and FKs (to estimate hit ratio on joins) • Hit ratio of attributes referenced by 1-operator commands (select and project)

  5. Stage 3: transform parse tree into a tree that is likely to be faster • Heuristics: unary operations • Push unary operations down stream • Bundle unary operations on common leaves • Goal is to perform unary operations together and before binary operations, to make binary operations faster • Heuristics: joins • Most join algorithms tree the two operands differently • Make use of sort orders and index locations

  6. Stage 4: choose implementations of binary operations • If both tables sorted by desired attributes, do a merge join • If one table has an index, to a nested join where the inner table is accessed via the index and the outer one is accessed sequentially • Order unary in-line operations so that only necessary attributes are funneled upstream

  7. Stage 4, Notes on joins • If no indices exist and the tables are not sorted the way we want, build an index on one, then do a sort merge or a nested join • If both tables have indices, use the one for the shortest table • If the hit ratio of a join will be very small, do it as early as you can

  8. Stage 5: consider multiple execution plans • Estimate the costs of various execution plans and pick the best one • Balance time spent optimizing versus time spent executing • For one time only transactions, pick a plan fast

  9. Other important issues • Use indices that are fast with respect to pulling in blocks from the disk • Wide, but shallow trees, like b-trees • An index on a sorted attribute can be interpolated • Hash indices good for key-based selections, but are not as versatile as b-trees

  10. A note on b-trees • They are dynamic and therefore do not suffer from the migrating artificial primary key problems • Since they are dynamic, the tradeoff is that they are slightly less space efficient and often lead to an extra layer in the tree

  11. Important: manipulating parse trees • There is a sort of algebra of parse tree manipulation • For example, A join B and B join A are equivalent • (A join B) join C = A join (B join C) • If a and b are used as join attributes, we can strip away all but those and the ones that will be chosen by a later projection or selection

  12. More notes on optimization • The smallest parse tree might not be the best • Sometimes we add unary operations to make the operands of binary operations smaller • If a unary operation will have a very small hit ratio, do it early • The cost of a complete execution plan is complex to calculate • Statistics of the data size and value distribution • Choice of parse tree • Choice of execution plan for each tree • !! It is not a linear process

  13. In light of modern database applications • Often the cost of optimization is not worth it • Many one of a kind transactions • Database with very small transaction load, perhaps do to multiple servers and hard drives • Tables are small and queries are simple, and so almost any execution plan is good • This is a key reason by traditional relational database servers are often far less efficient than no-SQL DBs – because they were built to efficiently run a high volume of similar transactions that involve multiple tables

  14. Making use of parallelism • SQL, compared to various non-declarative languages, is very east to parallelize • But communication costs can be very high if the servers are not co-located • Often, disk arrays are the chosen form of parallelism

  15. Optimization in the real world today • Languages and queries are often not set-based • The code in a query can be very imperative • Binary operations might be in the minority • Databases can be so huge that optimization has to be heavily biased toward certain kinds of queries

More Related