
Database Systems



Presentation Transcript


  1. Database Systems

  2. What is “Database systems” research?
     • Input? Large data sets, large files, relational tables
     • How? Fast external algorithms; RAM-efficient data structures at two storage levels
     • Efficiency? Desirable O(n) I/O
     • Hardware? Small computer, single server, parallel DBMS server, parallel cluster; 1 disk, RAID
     • Infrastructure? DBMS, parallel system
     • Boring? Theory+programming
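The "fast external algorithms" and "desirable O(n) I/O" bullets can be illustrated with a minimal sketch. It assumes the data arrives as rows of numbers read sequentially, one block at a time, and that only a few running totals are kept in RAM. The function names (`read_blocks`, `summarize`, `mean_and_variance`) and the tiny in-memory CSV are hypothetical, chosen for illustration; they are not taken from the slides.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple


def read_blocks(rows: Iterable[List[str]], block_size: int) -> Iterator[List[List[float]]]:
    """Yield blocks of at most block_size numeric rows; one block is held at a time."""
    block: List[List[float]] = []
    for row in rows:
        block.append([float(v) for v in row])
        if len(block) == block_size:
            yield block
            block = []
    if block:
        yield block


def summarize(rows: Iterable[List[str]], block_size: int = 2) -> Tuple[int, List[float], List[float]]:
    """Single sequential pass: row count, per-column sums, per-column sums of squares."""
    n = 0
    sums: List[float] = []
    sq_sums: List[float] = []
    for block in read_blocks(rows, block_size):
        for row in block:
            if not sums:  # size the accumulators from the first row
                sums = [0.0] * len(row)
                sq_sums = [0.0] * len(row)
            n += 1
            for j, v in enumerate(row):
                sums[j] += v
                sq_sums[j] += v * v
    return n, sums, sq_sums


def mean_and_variance(n: int, sums: List[float], sq_sums: List[float]) -> Tuple[List[float], List[float]]:
    """Derive per-column mean and population variance from the summary alone."""
    means = [s / n for s in sums]
    variances = [q / n - m * m for q, m in zip(sq_sums, means)]
    return means, variances


# Hypothetical stand-in for a large file on disk: four rows, two columns.
data = io.StringIO("1,10\n2,20\n3,30\n4,40\n")
n, sums, sq_sums = summarize(csv.reader(data), block_size=2)
means, variances = mean_and_variance(n, sums, sq_sums)
print(n, means, variances)
```

Each row is read exactly once, so the I/O cost grows linearly with the number of rows, while memory use depends only on the block size and the number of columns. The sum-of-squares formula is used here for brevity; it can lose precision on data with a large mean and small variance.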

  3. Database systems research today
     • Transaction processing? Done
     • Efficient querying? Done
     • Fast external algorithms? Simple tasks
     • Parallel computation? Well-proven DBMS shared-nothing, but still many challenges (big data)
     • Exploiting new hardware? Difficult, low level
     • Analyzing? Most difficult: data mining, statistics
     • Future? Big data

  4. DB Systems involves Core CS research: Theory+Programming
     • Theory we use:
       • Time complexity, I/O cost models
       • Large data structures; especially external
       • Relational model is here to stay
       • Multivariate statistics, machine learning, discrete math
       • Numerical methods: linear algebra, optimization
       • Compilers: parsing/compiling/optimizing code; recursion
     • Programming (even some hacking):
       • Systems in a broad sense
       • Languages: C, C++; efficiency, pointers, legacy systems code; Java, C# mainly for portability
       • Numerical libraries like LAPACK, OS thread libraries
       • DBMS
         • SQL
         • UDFs
         • API with C, C++, C#

  5. Research topics
     • GOAL: Integrating statistical and machine learning algorithms with a DBMS (external algorithms, queries, UDFs)
     • Difference with machine learning algorithms: size, external algorithms (small RAM), queries, low-level optimization, generally simpler models
     • Main topics by students:
       • Zhibo Chen: OLAP cubes, parametric statistical tests, cube ops on flash memory
       • Mario Navas, Naveen Mohanam: Singular Value Decomposition for PCA and ML Factor Analysis, data summarization on multicore CPUs
       • Carlos Garcia-Alvarado: keyword search across docs and db, ranking, query recommendation
       • Sasi Pitchaimalai: Bayesian classification, multithreaded summarization
       • Wellington Cabrera: stochastic search variable selection on high dimensional data, SVD on high-d data
       • David Matusevich: Hybrid EM and MCMC mixture models on large data sets, database transformations for data mining
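The goal of pushing a statistical computation into the DBMS through a UDF can be sketched as follows. This is only an illustration under stated assumptions: SQLite (via Python's standard `sqlite3` module) stands in for the DBMS, since the slides do not name one, and the aggregate `regr_slope_demo`, the class `RegrSlope`, and the table `points` are hypothetical names.

```python
import sqlite3


class RegrSlope:
    """Aggregate UDF: least-squares slope of y on x, accumulated row by row."""

    def __init__(self):
        self.n = 0
        self.sx = 0.0
        self.sy = 0.0
        self.sxx = 0.0
        self.sxy = 0.0

    def step(self, x, y):
        # Called once per row by the engine; rows with NULLs are skipped.
        if x is None or y is None:
            return
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def finalize(self):
        # Called once at the end; only five running totals were ever kept.
        denom = self.n * self.sxx - self.sx * self.sx
        if self.n < 2 or denom == 0:
            return None
        return (self.n * self.sxy - self.sx * self.sy) / denom


conn = sqlite3.connect(":memory:")
conn.create_aggregate("regr_slope_demo", 2, RegrSlope)  # hypothetical UDF name
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(1, 3), (2, 5), (3, 7), (4, 9)],  # toy data lying on y = 2x + 1
)
slope = conn.execute("SELECT regr_slope_demo(x, y) FROM points").fetchone()[0]
print(slope)
conn.close()
```

The model is fitted by an ordinary SQL query, the table is scanned once, and the data never leaves the database engine. A production DBMS would typically expose a C or C++ UDF interface rather than this Python one.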

  6. Representative problems
     • Finding predictive association rules
     • OLAP cubes
     • Clustering, PCA and regression
     • Bayesian classification

  7. Why is our database systems research “cool”?
     • Theory+Programming
     • Optimization, O(f(n)), systems (external data structures, discrete math, compiler, OS)
     • Goes from hardware-level stuff (multi-core, cache memory) to high-level query optimization in SQL
     • Database systems techniques are used in search engines like Google and Yahoo (and vice versa)
     • DBMS technology used everywhere

  8. Why join the DBMS group?
     • Balance between theory (math) and programming
     • We target “DB systems” conferences (ACM SIGMOD) and “IR/DM” conferences (ACM CIKM: IR+DB+DM)
     • Mature and stable CS research area
     • Job/internship: many opportunities in DBMS and search engines; job security at any large company
     • Visit my web page, DBLP. Google “Ordonez SQL”
