1 / 18

Optimization of Sequence Queries in Database Systems

Optimization of Sequence Queries in Database Systems. Reza Sadri Carlo Zaniolo reza@cs.ucla.edu zaniolo@cs.ucla.edu sadri@procom.com Amir Zarkesh Jafar Adibi azarkesh@u4cast.com jabibi@u4cast.com.

adwayne
Télécharger la présentation

Optimization of Sequence Queries in Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Optimization of Sequence Queries in Database Systems Reza Sadri Carlo Zaniolo reza@cs.ucla.edu zaniolo@cs.ucla.edu sadri@procom.com Amir Zarkesh Jafar Adibi azarkesh@u4cast.com jabibi@u4cast.com

  2. Time series Analysis • Many Applications: • Querying purchase patterns for marketing • Stock market analysis • Studying meteorological data • What’s needed: • Expressive query language for finding complex patterns in database sequences • Effcient and scalable implementation: Query Optimization

  3. State of The Art • ADT (e.g.. Informix Datablades): Not flexible enough, no Optimization • SEQ: Enhanced ADTs (e. g. sets and sequences) with their own query language • SRQL: Adding sequence algebra operators to relational model

  4. SQL-TS • A query language for finding complex patterns in sequences • Minimal extension of SQL—only the from clause affected • A new Query optimization techniqe based on extensions of the Knuth, Morris & Pratt (KMP) string-search algorithm

  5. SQL-TS: Example • Having a table quote (name, date, price) of stock prices, find all the stocks that went up by 15% or more one day and went down by 20% or more the next day and retrieve the change in the value of the stock. SELECT X.name, (Z.price -X.price) AS price_change, X.date AS date FROM quote CLUSTER BY name SEQUENCE BY date AS (X, Y, Z) WHERE Y.price > 1.15 * X.price AND Z.price < 0.8 * Y.price

  6. SQL-TS: Example Data grouped but not sorted sorted in each group

  7. Optimized string search:KMP Consider text array text and pattern array p: i 1 2 3 4 5 6 7 8 9 10 11 text[i] a b a b a b c a b c a j 1 2 3 4 5 6 pattern[j] a b a b c a ­ • After failing, use the information acquired so to: • - backtrack to shift(j), rather than i+1, and • - only check pattern values after next(j) But in SQL-TS we have general predicates & star patterns

  8. Equality predicates: KMP Find companies whose closing stock price in three consecutive days was 10, 11, and 15. SELECT X.name FROM quote CLUSTER BY name SEQUENCE BY date AS (X, Y, Z) WHERE X.price =10 AND Y.price=11 AND Z.price=15

  9. i – j + 1 i – j + shift(j) + 1 i - j + shift(j) + next(j) i Input shift(j) + 1 shift(j) + next(j) 1 Pattern j next(j) j - shift(j) Shifted Pattern 1 shift(j) shift and next • Success for first j-1 elements of pattern. Failure for jth element (when input is at i) • Any shift less than shift(j) is guaranteed to lead to failure, • Match elements in the pattern starting at next(j)

  10. Optimal Pattern Search (OPS) Search path for naive algorithm vs optimized algorithm:

  11. Matrices q and j: Input tested on pj is now tested against pk pj succeeded: pj failed: Combing values of these lower triangular matrices ( j ³ k), We derive the values of next(j) and shift (j)

  12. STAR Patterns SELECT X.NEXT.date, X.NEXT.price, S.previous.date, S.previous.price FROM quote CLUSTER BY name, SEQUENCE BY date AS (*X, Y, *Z, *T, U, *V, S) WHERE X.name='IBM‘ AND X.price > X.previous.price AND 30 < Y.price AND Y.price < 40 AND Z.price < Z.previous.price AND T.price > T.previous.price AND 35 < U.price AND U.price < 40 AND V.price < V.previous.price AND S.price < 30

  13. Handling Star Patterns Same input, Transitions on Original Pattern vs. Transitions on Pattern after the index set back j-k 21  31  32   41  42  43   Example: Elements j and k are star predicates and jk is U: U  j,k+1  j+1,k j+1,k+1

  14. Possible Transitions from jk

  15. Implication Graph

  16. *U (less than 2% change) *T *V *Y *R *Z (less than 2% change) *W (less than 2% change) FSM Simulation used to Optimize Star Patterns Relaxed Double Bottom: • Only considering increases and decreases that are more than 2%

  17. Relaxed Double Bottom: Ninty fold improvement

  18. Conclusions • Disjunctive queries, partial ordered domains, aggregates also treated in this approach • Old applications—more power & flexibility than Datablades ADTs of commercial DBMSs • Ongoing implementation, by building on the user-defined aggregates supported in AXL.

More Related