1 / 29

Advanced Database Techniques

Advanced Database Techniques. Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl Jennie Zhang @ cwi.nl Romulo Goncalves @cwi.nl. Administrative details. The website evolves as during the course Exam material is marked explicitly Lab work deadlines are strict

rahrens
Télécharger la présentation

Advanced Database Techniques

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl Jennie Zhang @ cwi.nl Romulo Goncalves @cwi.nl

  2. Administrative details • The website evolves as during the course • Exam material is marked explicitly • Lab work deadlines are strict • Email is the preferred way to communicate • Tomorrow the assistants will be available in person between 11:00-12:00, room REC-P.123

  3. Relational systems • A database system should simplify the organization, validation, sharing, and bookkeeping of information • Prerequisite knowledge • Relational data model and algebra • Data structures (B-tree, hash) • Operating system concepts • Using a SQL database system • What is your practical experience? [Ruby on Rails expertise needed]

  4. Applications • Bread-and-butter applications? • Web-shop • Banking systems • Inventory systems • Production systems • Shopping systems • Government systems • Health systems • Multimedia systems • Science systems …

  5. Advanced Applications • Bread-and-butter applications ??? • Banking systems • What happens if you install a stock trading system which should handle >100K transactions/minute • How to derive trading advice using compute intensive applications • How to warn thousands of users about their trading opportunity • …. Need for parallel, distributed main-memory database technology…

  6. Advanced application requirements • Bread-and-butter applications • Inventory applications • How to install a battlefield inventory systems • How to deliver goods just in time? • How to keep track of moving objects/persons ? • … need for sensor-based database support and RFID tags … need for a new DBMS ?…

  7. Advanced Applications • Production systems • How to interact with component suppliers • How to manage the production workflow • How to avoid bad production steps • How to maintain a database with 12000 tables (SAP) • … need for interoperability between autonomous systems… datamining and knowledge discovery…

  8. Advanced Applications • Health information systems • How to monitor your health over 30 years • How to enable quick response to a heart attack • …need for interoperable database systems …

  9. HELP The Ambient Home

  10. HELP 911 called The Ambient Home

  11. MonetDB DataCell

  12. 911 called MonetDB DataCell A Shared Tuple Space using an SQL DBMS nucleus

  13. Advanced applications • Government systems • Security • Biometric data management issues, finger/image matching • Public safety • Forensics, manipulate complex objects using proprietary algorithms • …need for extensible database technology…need to support unstructured data…

  14. Advanced Applications • Science systems • The new accelerator in CERN • how to handle >1PTByte files • The Sloan Digital Skyserver schema is 200 pages and the catalogued data 2.5Tb • How to query this efficiently • ..need for P2P and … a novel way to organize data…

  15. LOFAR central processor specs • Streaming Data • Input: 320 Gbit/s • Internally within correlator: 20 Tbit/s • Into storage: 25 Gbit/s = 250 TByte/day • Final products: 1-3 TByte/day • High Performance Computing • Correlation: 15 Tflops • Pre processing and filtering: 5 Tflops • Off-line processing (calibration, analysis): 5-10 Tflops • Visualisation, control, scheduling etc: 2 Tflops • Storage • On-line temporal storage: 500 TByte • Archive: PByte range of data stored in Grid

  16. Technological challenges • Data is often not structured as tables • XML and XQuery • Data does not always fit on one system • Distributed and parallel databases • Querying is more like world-wide searching • Continuous and streaming queries • A database tells more than facts • Datamining and knowledge discovery

  17. Code bases • Database management systems are BIG software systems • Oracle, SQL-server, DB2 >1 M lines • PostgreSQL 300K lines • MySQL 500 K lines • MonetDB 200-800 K lines • SQLite 40K lines • Programmer teams for DBMS kernels range from a few to a few hundred

  18. Performance components • Hardware platform • Data structures • Algebraic optimizer • SQL parser • Application code • What is the total cost of execution ? • How many tasks can be performed/minute ? • How good is the optimizer? • What is the overhead of the datastructures ?

  19. Not all are equal

  20. Gaining insight • Study the code base (inspection + profiling) • Often not accessible outside development lab • Study individual techniques (data structures + simulation) • Focus of most PhD research in DBMS • Detailed knowledge becomes available, but ignores the total cost of execution. • Study as a functional black box • Analyse a small application framework

  21. The Jack The Ripper Project • Study the snippet of the database technology and design an XQuery and SQL application • What is the schema? • What are the queries? • What are unorthodox solutions?

  22. Learning points • My poor knowledge on relational database? Read the chapters on SQL and relational algebra. Knowledge on data structures comes in handy. • Database systems are much more than administrative bookkeeping systems

  23. Learning points • Advanced application challenge the technology provided by a DBMS • Many techniques do not easily scale in size, complexity, functionality • Effectiveness of a DBMS is determined by many tightly interlocked components

More Related