CS145: Intro to Database Management Systems

Presentation Transcript


  1. CS145: Intro to Database Management Systems Lecture 1: Course Overview

  2. “data is the future” – my cab driver in Pittsburgh

  3. Outline • Introduction • Administrative stuff • What is a database and why do we use it? • Summary

  4. Big Data Landscape… Infrastructure is Changing. New tech. Same Principles. http://www.bigdatalandscape.com/

  5. Why should you study databases? Mercenary: make more $$$ • Startups need DB talent right away = low employee # • Massive industry… Intellectual: • Science: data poor to data rich • No idea how to handle the data! • Fundamental ideas to/from all of CS: • systems, theory, AI, logic, stats, analysis…. Many great computer systems ideas started in DB.

  6. What this course is (and is not) Discuss fundamentals of data management • How to query databases, design databases, build applications with them. • Not how to be a DBA or how to tune Oracle 12g. We won’t get to cover the principles of how to build database management systems. See 245, 345, and 346.

  7. Who we are… Instructor (me) Chris Ré (sounds like Ray) • Faculty in the InfoLab • Research: theory of data processing, statistical analytics, and machine reading. • chrismre@cs.stanford.edu • Office hours: MW 10-11 in Gates 433

  8. Course Assistants (CAs) ! Remember: CAs are people (students) too!

  9. Joy Kim Angela Gong Sam Keller Kevin McKenzie Curran Kaushik VienDinh Duong Michael Fitzpatrick Firas Abuzaid Vishnu Sundaresan Raven Jiang Gina Pai Yifei Huang Patrick Harvey

  10. Communication w/ Course Staff • Piazza, • Course mailing list, • Office hours, and • By appointment! All are (or will be soon) listed on the course page!

  11. Course Logistics cs145.stanford.edu

  12. Course Elements This class is semi-flipped: • Learn from your classmates! • Some classes are flipped, some are not… • The Red F is your guide! • Attendance (10%) Lectures or Videos per week • Videos and Slides.

  13. Lectures Lecture slides cover essential material • You can (almost) always watch Jennifer instead! • Database Systems and Locking are new this time. Try to cover same thing in many ways: Lecture, lecture notes, homework, exams (no shock) • Attendance makes your life easier… • 8 lectures mandatory…must attend GUEST LECTURES!

  14. Graded Elements Attendance (10%) – 8 Classes. Problem Sets & EdX Questions (20%) • You can retake EdX until you get a perfect score. Programming project (20%) • Auction base. Up now! midterm & final exam (20%/30% of grade) All but the final assignment are due on Monday before class.

  15. What is expected from you • Attend lectures • If you don’t it’s at your own peril • Be active • Ask questions, post comments on forums • Do programming and homework projects • Start early and be honest. • Study for tests and exams.

  16. Now to databases...

  17. What is a DBMS? A database is a large, integrated collection of data • Models a real-world enterprise • Entities (e.g., Students, Courses) • Relationships (e.g., Alice is enrolled in 145) A Database Management System (DBMS) is a piece of software designed to store and manage databases.

  18. A Motivating, Running Example • Consider building a course management system (CMS): • students • courses • professors • who takes what • who teaches what Entities Relationships

  19. Data models • A data model is a collection of concepts for describing data • A schema is a description of a particular collection of data, using the given data model • The relational model of data is the most widely used model today • Main Concept: relation: essentially, a table • Every relation has a schema describing types, etc.

  20. “Relational databases form the bedrock of western civilization” – Bruce Lindsay, IBM Research

  21. Modeling the CMS • Logical Schema • Students(sid: string, name: string, gpa: float) • Courses(cid: string, cname: string, credits: int) • Enrolled(sid: string, cid: string, grade: string) Relations Students Courses Enrolled
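As a minimal sketch, the logical schema on this slide could be declared in SQLite through Python's built-in `sqlite3` module. The relation and attribute names come from the slide; the primary keys, foreign keys, and sample rows are assumptions added for illustration.

```python
import sqlite3

# In-memory database; relation and column names follow the slide's logical schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL);
    CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (
        sid   TEXT REFERENCES Students(sid),
        cid   TEXT REFERENCES Courses(cid),
        grade TEXT,
        PRIMARY KEY (sid, cid)          -- key choice is an assumption
    );
""")

# Hypothetical sample rows, not taken from the lecture.
conn.execute("INSERT INTO Students VALUES ('s1', 'Alice', 3.9)")
conn.execute("INSERT INTO Courses  VALUES ('145', 'Intro to DBMS', 4)")
conn.execute("INSERT INTO Enrolled VALUES ('s1', '145', 'A')")

# "Who takes what": join the entity tables through the relationship table.
rows = conn.execute("""
    SELECT s.name, c.cname
    FROM Students s
    JOIN Enrolled e ON e.sid = s.sid
    JOIN Courses  c ON c.cid = e.cid
""").fetchall()
print(rows)
```

Students and Courses hold the entities, and Enrolled holds the relationship between them.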

  22. Other Schemata… • Physical Schema: describes data layout • Relations as unordered files • Some data in sorted order (index) • Logical Schema: Previous slide • External Schema: (Views) • Course_info(cid: string, enrollment: integer) • Derived from other tables for “authorized users” Administrators Applications
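The external schema `Course_info(cid, enrollment)` can be sketched as a SQL view derived from Enrolled. This is one plausible definition, assuming enrollment means the number of Enrolled rows per course; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
    -- Hypothetical sample data.
    INSERT INTO Enrolled VALUES ('s1', '145', 'A'), ('s2', '145', 'B'), ('s1', '245', 'A');

    -- External schema: a view computed from the logical schema on demand.
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment
        FROM Enrolled
        GROUP BY cid;
""")

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)
```

A user who is only given access to the view sees enrollment counts without seeing individual grades.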

  23. Data independence • Applications do not need to worry about how the data is structured and stored. Logical data independence is protection from changes in the logical structure of the data. Physical data independence is protection from changes in the physical layout. NB: One of the most important reasons to use a DBMS
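A small sketch of physical data independence, assuming SQLite as the DBMS: adding an index changes the physical layout, yet the application's query text and its result stay the same. The table contents and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL);
    INSERT INTO Students VALUES ('s1', 'Alice', 3.9), ('s2', 'Bob', 3.1);
""")

query = "SELECT name FROM Students WHERE gpa > 3.5 ORDER BY name"

before = conn.execute(query).fetchall()

# Physical change only: the logical schema and the query are untouched.
conn.execute("CREATE INDEX idx_students_gpa ON Students(gpa)")

after = conn.execute(query).fetchall()
print(before, after)
```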

  24. Challenges with Many Users CMS application serves 1000s+ of users • Security: Different users, different roles • Performance: Need to provide concurrent access • Consistency: Concurrency can lead to update problems • Disk/SSD access is slow, so DBMSs hide the latency by doing more CPU work concurrently. A DBMS allows each user to write programs as if they were the only user.

  25. Transactions • Key concept is a transaction: an atomic sequence of db actions (reads/writes) • Transactions leave the DB in a consistent state • Users may write integrity constraints, e.g., each course is assigned to exactly one room • But, DBMS does not understand the real semantics of the data – consistency burden is still on the user! Atomicity: An action either completes entirely or not at all
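To make atomicity and integrity constraints concrete, here is a minimal sketch using `sqlite3`. The `Assigned` table and room names are hypothetical; the primary key on `cid` plus `NOT NULL` on `room` is one way to state "each course is assigned to exactly one room" for courses that appear in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Constraint: at most one row per course, and that row must name a room.
conn.execute("CREATE TABLE Assigned (cid TEXT PRIMARY KEY, room TEXT NOT NULL)")
conn.commit()

try:
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("INSERT INTO Assigned VALUES ('145', 'Room A')")
        # Second room for the same course violates the PRIMARY KEY constraint.
        conn.execute("INSERT INTO Assigned VALUES ('145', 'Room B')")
except sqlite3.IntegrityError:
    pass  # the whole transaction was undone, including the first insert

count = conn.execute("SELECT COUNT(*) FROM Assigned").fetchone()[0]
print(count)
```

The first insert was valid on its own, but because the transaction as a whole failed, none of its writes remain.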

  26. Scheduling concurrent transactions • DBMS ensures that execution of {T1, … Tn} is equivalent to some serial execution • Locking: Before reading or writing, a transaction requests a lock from the DBMS and holds it until the end • Idea: If Ti writes an item x and Tj reads x, then Ti and Tj conflict • only one winner gets the lock. • loser is blocked until winner finishes What if Ti asks for X before Tj and Tj asks for Y before Ti? Deadlock! One is aborted…
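The deadlock scenario on this slide can be sketched with a toy lock table. This is an illustration only, assuming exclusive locks and a simple waits-for chain; the `LockManager` class and its methods are hypothetical and do not describe how any particular DBMS implements locking.

```python
class LockManager:
    """Toy exclusive-lock table with waits-for deadlock detection."""

    def __init__(self):
        self.owner = {}      # item -> transaction holding its lock
        self.waits_for = {}  # blocked transaction -> transaction it waits on

    def request(self, txn, item):
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return "granted"
        # Follow the waits-for chain from the holder; reaching txn means a cycle.
        seen = holder
        while seen is not None:
            if seen == txn:
                return "deadlock"
            seen = self.waits_for.get(seen)
        self.waits_for[txn] = holder
        return "blocked"

    def release_all(self, txn):
        # Called when txn commits or aborts: drop its locks and wait edges.
        self.owner = {i: t for i, t in self.owner.items() if t != txn}
        self.waits_for = {w: h for w, h in self.waits_for.items()
                          if txn not in (w, h)}


lm = LockManager()
results = [
    lm.request("Ti", "X"),  # Ti gets X
    lm.request("Tj", "Y"),  # Tj gets Y
    lm.request("Ti", "Y"),  # Ti must wait for Tj
    lm.request("Tj", "X"),  # Tj would wait for Ti, who waits for Tj
]
lm.release_all("Tj")             # abort Tj to break the cycle
retry = lm.request("Ti", "Y")    # Ti can now proceed
print(results, retry)
```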

  27. Ensuring Atomicity • DBMS ensures atomicity (the all-or-nothing property), even if a transaction crashes! Idea: Keep a log of all writes the DB does. Write-ahead log (WAL): Before a change is made, the corresponding log entry is forced to disk. Idea: After a crash, partially executed transactions are undone using the log. NB: Thanks to WAL, if a log entry is not present, then the change was not applied to the DB

  28. More details about the log • The following actions are in the log: • Ti writes an object: old value and new value • Ti commits/aborts • Log records chained by Xact ID so easy undo • Log is on “stable” storage All log maintenance and concurrency handled transparently by DBMS
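A toy sketch of write-ahead logging with undo, assuming an in-memory dictionary stands in for the database and a Python list stands in for the log on stable storage. The function names are hypothetical, and a real recovery protocol also handles redo and checkpoints, which this omits.

```python
db = {"x": 1, "y": 2}  # stands in for the database on disk
log = []               # stands in for the log on stable storage


def write(txn, key, new_value):
    # Write-ahead: record old and new values BEFORE changing the database.
    log.append((txn, "write", key, db[key], new_value))
    db[key] = new_value


def commit(txn):
    log.append((txn, "commit"))


def recover():
    # Undo, newest first, every write whose transaction never committed.
    committed = {rec[0] for rec in log if rec[1] == "commit"}
    for rec in reversed(log):
        if rec[1] == "write" and rec[0] not in committed:
            _, _, key, old_value, _ = rec
            db[key] = old_value


write("T1", "x", 10)
commit("T1")
write("T2", "y", 20)  # simulated crash: T2 never commits
recover()
print(db)
```

T1's committed write survives recovery, while T2's partial write is rolled back to the old value stored in its log record.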

  29. Friends of Databases (people made happy) • End users and DBMS vendors • Reduces cost and makes money • DB application programmers • e.g., smart webmasters • Database administrators (DBA) • Designs logical/physical schema • Handles security/authorization • Tuning, crash recovery, and more… Must understand DB internals

  30. Summary of DBMS • DBMS used to maintain, query, and manage large datasets. • Provides concurrency, recovery from crashes, quick application development, integrity, and security • Key abstractions give independence • DBMS R&D is one of the broadest, most exciting fields in CS. Fact!