1 / 30

Query Processing in Database Systems 資料庫系統之查詢處理

Query Processing in Database Systems 資料庫系統之查詢處理. 國立屏東商業技術學院 資訊科技系所. Jun Wu ( 吳卓俊 ) Email: junwu@npic.edu.tw. Department of Information Technology. National Pingtung Institute of Commerce. February 27, 2008. Outline. Part I: Introduction to Database Systems Database Query Processing

duff
Télécharger la présentation

Query Processing in Database Systems 資料庫系統之查詢處理

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Query Processing in Database Systems資料庫系統之查詢處理 國立屏東商業技術學院 資訊科技系所 Jun Wu (吳卓俊) Email: junwu@npic.edu.tw Department of Information Technology National Pingtung Institute of Commerce February 27, 2008

  2. Outline • Part I: Introduction to Database Systems • Database • Query Processing • Part II: Multiprocessor QEP Scheduling Problem • Query Execution Plan (QEP) • Critical-Path-Based Approach

  3. Part I Introduction to Database Systems

  4. Database • Database(資料庫) • A collection of information organized in such a way that a computer program can quickly select desired pieces of data. • It is managed by a powerful software for creating and managing large amounts of data efficiently and allowing it to persist over long periods of time, safely.

  5. Database Systems • Database Systems (資料庫系統) • Allow users to create new databases and specify their structures. • Give users the ability to query the data and modify the data. • Support the storage of very large amounts of data over a long period of time, keeping it secure from accident or unauthorized use. • Control access to data from many users concurrently.

  6. Popular Database Systems • Commercial Products • Oracle Database – http://www.oracle.com • IBM DB2 – http://www.ibm.com/db2 • Microsoft – http://www.microsoft.com • SQL Server • Access • Open Source Projects • MySQL – http://mysql.com • PostgreSQL – http://www.postgreSQL.org

  7. Databases on the WWW • Google/Yahoo • Wikipedia(維基百科) • YouTube/無名小站 • SourceForge • UrMap/Google Map • Blog • Forum

  8. Data Models • Data Model(資料模型) • A theory or specification describing how a database is structured and manipulated. • Common Data Model • Hierarchical Model • Network Model • Relational Model • Entity-Relationship Model • Object-Oriented Model • Semistructured Model (Such as XML)

  9. Relational Data Model • Relational Data Model • Data are organized as a collection of relations which are two-dimensional table. • Relational Database (RDB) • A database system that based on the relational data model. • Today’s database system are almost relational databases. Edgar Frank (Ted) Codd

  10. Example 1 • Relations are tables. Their columns are headed by attributes, which describe the entries in the column. For instance, a relation named Accounts, recording bank accounts, their balance, and type might look like:

  11. Data Manipulation Language • Data Manipulation Language (DML) • A computer language used by users or computer programs to retrieve, insert, delete and update data in a database system. • The most popular DML is the structured query language (SQL). • Almost today’s database systems support SQL.

  12. Example 2 • Simple Queries in SQL • Find the balance of account 67890. • Find the savings accounts with negative balances.

  13. Query Processing • Query Processing is a sequence of procedures that a database will perform to answer the user query.

  14. Query in a high-level language (e.g., SQL) Physical Operator SCANNER, PARSER, and VALIDATOR Partial Order Intermediate form QUERY OPTIMIZER Query Execution Plan (QEP) RUNTIME DATABASE PROCESSOR Results of the Query Query Processing While most work on QEPs is on the derivation of “optimal” QEPs for queries, little work is done for the scheduling of QEPs subject to their partial order constraints.

  15. Part IIMultiprocessor QEP Scheduling Problem

  16. Index Nested Loop Join Index Scan Z Merge Join Sort Sort Table Scan X Table Scan Y Query Execution Plan (QEP) Virtual Root • Query Execution Plan (QEP) • PO : a set of physical operators • : a partial order on PO Index Nested Loop Join Nested Loop Join Index Scan Z Table Scan Y Table Scan V Sort Table Scan W • QEP Structures • Tree-Structured • DAG-Structured

  17. Query Scheduling Multiprocessor QEP Scheduling (MQEPS) Problem • MQEPS problem is NP-hard. • This problem is equivalence to the multiprocessor precedence constrained scheduling (MPCS) problem. Objective: 3 Schedule a given set of physical operators to multiprocessor subject to their partial orders to minimize the schedule length. 2 4 5 1 3 4 2 4 3 P1 P2 … PM schedule length

  18. Best Results • Homogeneous multiprocessor environments • List scheduling algorithm [1] • Approximation ratio: 2-1/M • Heterogeneous multiprocessor environments • Speed-based list scheduling algorithm[2] • Approximation ratio: O(logM) Pi Pj Pi Pj *: M is the number of processors in the system. [1] R.L. Graham, “Bounds on multiprocessing timing anomalies”, SIAM Journal on Applied Mathematics, 17:263-269, 1969. [2] Fabian A. Chudak and David B. Shmoys, “Approximation algorithms for precedence-constrained scheduling problems on parallel machines that run at different speeds”, Journal of Algorithms, 30(2):581-590, February 1999.

  19. Query Execution Plan Scheduling with I/O considerations An SMP Database System … Processor 1 Processor 2 Processor M Bus Shared Memory I/O Controller … Disk 1 Disk 2 Disk N I/O Subsystem Our Goal • Focus on the MQEPS problem with I/O considerations for symmetric multiprocessor (SMP) database systems • Propose a joint scheduling algorithm • To improve the operating parallelism of processors and I/O subsystems. • To provide approximation/competitive bounds

  20. processor w x y z I/O subsystem relevant set of data pages Read data pages into the main memory Motivation: I/O Considerations a • Joint scheduling problem • I/O activities are the bottleneck of system performance[3] • Join-based QEP scheduling[4] • Each join requires exactly two data pages • Few results with approximation bounds are known a w x y z [3]: M. Murphy and M.-C. Shan, “Execution Plan Balancing”, In Proceedings of the IEEE International Conference on Data Engineering (ICDE), pp. 698-706, 1991. [4] M. Murphy and D. Rotem, “Processor Scheduling for Multiprocessor Joins”, In Proceedings of the IEEE Data Engineering Conference (ICDE), pp. 140-148, 1989.

  21. MQEPS-I/O Problem Problem Definition • Given a QEP with its relevant data pages, the scheduling problem is to find a compatible schedule such that the schedule length is minimized. Strong NP-hard with the structure of the QEP being DAG Compatible schedule • Be consistent with the partial order of physical operators • Do not violate the environmental constraints (e.g., the • number of processors in the system)

  22. A Compatible Schedule Example a b c e f d g h i j k l m n o schedule length = 20 P1 b e a d c f P2 g I/O h i j k l m n o time 0 2 4 6 8 10 12 14 16 18 20

  23. Our Approach: Critical-Path-Based Scheduling (CPS) Algorithm INPUT: QEP PROCESSING: Assign a unique ordered number to each physical operator and data page according to the critical path rule. Ordered Number Generator (OrdGen) Algorithm Whenever a processor is available, schedule a ready physical operator with minimum ordered number to it. Whenever an I/O subsystem is available, read a unloaded data page with minimum ordered number into the main memory. Critical-Path-Based Scheduling (CPS) Algorithm OUTPUT: Processor Schedule I/O Schedule

  24. CPS Algorithm Priority-driven scheduling algorithm Priority = Ordered Number A CPS Schedule Example 15 15 OrdGen Algorithm a a 13 13 STEP 1: Locate the physical operator which has the maximum height. Assign a unique ordered number to every data page in its relevant set and itself in an increasing order. 10 10 b b c c 8 8 9 9 6 6 e e f f d d STEP 2: 2 2 g g h h i i j j k k l l m m n n o o 3 3 4 4 14 14 7 7 5 5 11 11 12 12 1 1 schedule length = 17 6 10 8 13 2 15 P1 b a g d e c 9 P2 f 1 3 4 5 7 11 12 14 I/O o h i l k m n j 0 2 4 6 8 10 12 14 16 18 20

  25. Tree-Structured DAG-Structured Properties of the CPS Algorithm Optimal Optimal M >=W Off-line 2 3 – 2/M M < W 3 – 2/M 3 – 2/M On-line M : the number of processors W : the width of a given QEP

  26. Performance Evaluation • Simulation Model of an SMP Database • Number of processors: 2,4,8,16 • I/O subsystem • QEP Scheduler (The OrdGen and the CPS algorithm) • Experimental QEP’s • A TPC-C benchmark database • Number of physical operators: 1,000-2,000 • Metrics • Performance Ratio: (CPS schedule length) (Optimal schedule length)

  27. upper bound CPS Experimental Results: Off-line Usage

  28. upper bound CPS Experimental Results: On-line Usage

  29. Query Execution Plan Scheduling with I/O considerations Conclusion • Scheduling of QEPs with I/O considerations • A physical operator can be executed only if all of its relevant data pages are read into main memory. • A more general model of physical operators is considered. • A joint scheduling of physical-operator executions over multiprocessors and activities over the I/O subsystem is needed. • Critical-Path-Based Scheduling (CPS) Algorithm • The rationale behind the design of the CPS algorithm is to overlap the I/O activities for data access and the executions of physical operators to reduce the schedule length. • The approximation ratios of our proposed algorithm are shown. • The capability of the proposed algorithm is verified by a series of simulation experiments.

  30. -The End- ~Thank You Very Much~

More Related