
Database and Data-Intensive Systems


warrenr

Presentation Transcript


  1. Database and Data-Intensive Systems

  2. Data-Intensive Systems • From monolithic architectures to diverse systems • Dedicated/specialized systems, column stores • Data centers, web architectures, distributed architectures • From business data to all data • Streaming and sensor data, semi-structured and unstructured data • Multidimensional data, temporal data, spatio-temporal data • Examples • Clustering of high-dimensional data • Tracking and continuous queries for moving objects • Mobile service infrastructure • Location privacy • Spatio-textual search/hyper-local web search • Multimedia similarity search • This is where much of our research “lives.”

  3. Staff • Ira Assent, associate professor • Christian S. Jensen, professor • Vaida Ceikute, Ph.D. student • Xiaohui Li, visiting Ph.D. student • NN, Ph.D. student (GEOCROWD – indoor positioning and services infrastructure) • NN, Ph.D. student (GEOCROWD – spatial web objects) • NN, Ph.D. student (eData – Anomaly Detection in e-Science) • NN, Ph.D. student (Streamspin) • NN, Ph.D. student (WallViz) • NN, Ph.D. student (REDUCTION) • NN, Ph.D. student (REDUCTION)

  4. Graduate Course Portfolio: dDO • Data management for moving objects (Q3) • The course covers selected research advances in indexing, update processing, and query processing for moving objects. • Moving object tracking • Specific indexing techniques • R-tree based indexing • B-tree based indexing • Techniques for the efficient handling of frequent updates • Techniques for range and k-nearest-neighbor query processing, including one-time as well as continuous queries
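To make "frequent updates" and "range queries" on moving objects concrete, here is a minimal sketch. It uses a uniform grid rather than the R-tree or B-tree based indexes the course covers, because a grid fits in a few lines. The class name `GridIndex`, its methods, and the cell size are assumptions made for illustration and do not come from the course material.

```python
import math
from collections import defaultdict


class GridIndex:
    """Toy uniform-grid index for moving objects (illustrative only)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # (cx, cy) -> ids of objects in that cell
        self.position = {}              # object id -> last reported (x, y)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def update(self, obj_id, x, y):
        """Record a new position; touch the grid only if the cell changes."""
        new_cell = self._cell(x, y)
        old = self.position.get(obj_id)
        if old is not None:
            old_cell = self._cell(*old)
            if old_cell != new_cell:
                self.cells[old_cell].discard(obj_id)
        self.cells[new_cell].add(obj_id)
        self.position[obj_id] = (x, y)

    def range_query(self, x_min, y_min, x_max, y_max):
        """Return the sorted ids of objects inside the closed rectangle."""
        cx0, cy0 = self._cell(x_min, y_min)
        cx1, cy1 = self._cell(x_max, y_max)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj_id in self.cells.get((cx, cy), ()):
                    px, py = self.position[obj_id]
                    if x_min <= px <= x_max and y_min <= py <= y_max:
                        hits.append(obj_id)
        return sorted(hits)


index = GridIndex(cell_size=10)
index.update("car1", 5, 5)
index.update("car2", 25, 5)
print(index.range_query(0, 0, 10, 10))
index.update("car1", 27, 6)          # car1 moves into another cell
print(index.range_query(20, 0, 30, 10))
```

An update that stays inside the same cell changes only the stored coordinates, which is one simple way to keep frequent position reports cheap.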

  5. Graduate Course Portfolio: MDDB • Multidimensional databases (Q4) • Selected techniques for the management of multidimensionally represented data • Multidimensional data and applications • Data warehouses and data mining • Similarity search and query processing • Efficient handling: indexing and associated query processing • Multistep similarity search • Indexing multidimensional data • Skyline query processing • Data mining techniques • Subspace clustering • Classification • Outlier detection
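As an illustration of the skyline topic on this slide: a skyline query returns the points that no other point dominates. The sketch below is a simple in-memory nested-loop variant, assuming smaller values are better in every dimension. The function names and the sample data are made up for the example.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points):
    """Return the non-dominated points, in order of first appearance."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue                      # p is beaten by a current skyline point
        # p survives; drop skyline points that p now dominates
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


# Hypothetical hotels as (price, distance to beach); lower is better for both.
hotels = [(50, 8), (80, 2), (60, 9), (100, 1), (90, 3)]
print(skyline(hotels))
```

In this sample, (60, 9) is dominated by (50, 8) and (90, 3) by (80, 2), so only the remaining three hotels are candidates for any user who prefers cheaper and closer.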

  6. Graduate Course Portfolio: Index • Indexing of disk-based data (Q1) • Indexing techniques for different types of disk-based data, as well as their support for queries and updates • General overview of indexes and query processing • Spatial indexing structures • Space partitioning indexing structures • Indexes for high-dimensional data • Metric approaches • Special techniques for complex data types • Coming up for the first time this fall

  7. Graduate Course Portfolio: dDB2 • Database management systems (Q2) • The course aims to give the participants a solid conceptual foundation for making competent use of a database management system. • Logical and physical query optimization and query processing • Concurrency control techniques • Database tuning • Central concepts and techniques in relation to supporting temporal and multi-dimensional data • Coming up for the first time this fall

  8. Projects • Streamspin • Enable sites that are for mobile services what YouTube is for video • Easy mobile service creation and sharing • Advanced spatial and social context functionality • Be an open, extensible, and scalable service delivery infrastructure • MOVE (http://www.move-cost.info) • Knowledge extraction from massive data about moving objects • Cross-cutting activities, showcases, and evaluation • Representation of movement data and spatio-temporal databases • Analysis of movement and spatio-temporal data mining • WallViz • Collaborative analysis, joint decision making on wall-sized displays • Scale to massive data collections • Support ad-hoc queries • Automatically provide entry points for analysis

  9. Projects (2) • GEOCROWD • Creating a Geospatial Knowledge World • Advance the state of the art in collecting, storing, analyzing, processing, reconciling, and publishing user-generated geospatial information on the Web • REDUCTION • Reducing the environmental footprint of fleets of vehicles • Optimizing the behavior of drivers • Supporting eco-routing of vehicles • Enabling transparency in multi-modal transportation • eData • Robust analysis in the context of imperfect data in e-Science • Detect and correct anomalies effectively • On-line, interactive, lineage-preserving, and semi-automatic • Scalable algorithms

  10. How We Typically Work • We target some real problem that we find interesting. • We define the problem precisely. • We develop a solution that is typically a data structure or an algorithm, i.e., a concrete technique. • To evaluate, we build prototypes. • These are built for the purpose of studying the properties of our solutions. • We are often interested in performance, e.g., runtime, space usage, communication cost. • For some solutions we state formal properties that we then prove, e.g., the correctness of a particular technique. • In brief: isolate and define the problem, construct a solution, then evaluate it.

  11. Example 1: Spatial Web Querying • Setting • Google: ~90 billion queries/month, ~20 billion with local intent. • We want to integrate exact locations of websites (for shops, bars, etc.) and users into web querying. • Queries • Results must match the query text and must be near the user. • Results of continuous queries must be updated as the user moves. • Challenges • Support such queries with low computation cost on the server and with little communication between server and client. • Solution • Invent an index that supports both text and location. • Use a safe zone to reduce the communication between user and server for continuous queries.
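The two solution ideas on this slide can be sketched in a few lines. This is a toy illustration, not the index from the research: it pairs a plain inverted index with stored coordinates, and it uses a deliberately simple circular safe zone for a nearest-match query. If d1 and d2 are the distances to the nearest and second-nearest matching objects, the nearest match cannot change while the user stays within (d2 - d1) / 2 of the point where the query was issued. All names below are assumptions made for the example.

```python
import math
from collections import defaultdict


class SpatialTextIndex:
    """Toy text-plus-location index (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)   # keyword -> ids of matching objects
        self.location = {}                 # object id -> (x, y)

    def add(self, obj_id, x, y, text):
        self.location[obj_id] = (x, y)
        for word in text.lower().split():
            self.postings[word].add(obj_id)

    def query(self, x, y, text):
        """Return (nearest object matching all keywords, safe-zone radius)."""
        words = text.lower().split()
        if not words:
            return None, 0.0
        candidates = set.intersection(*(self.postings.get(w, set()) for w in words))
        ranked = sorted((math.dist((x, y), self.location[o]), o) for o in candidates)
        if not ranked:
            return None, 0.0
        if len(ranked) == 1:
            return ranked[0][1], math.inf          # no competitor can take over
        radius = (ranked[1][0] - ranked[0][0]) / 2
        return ranked[0][1], radius


def needs_refresh(anchor, radius, position):
    """Client-side check: contact the server only after leaving the safe zone."""
    return math.dist(anchor, position) > radius


index = SpatialTextIndex()
index.add("a", 0, 0, "coffee bar")
index.add("b", 10, 0, "coffee shop")
index.add("c", 1, 1, "book shop")

anchor = (1, 0)
result, radius = index.query(*anchor, "coffee")
print(result, radius)
print(needs_refresh(anchor, radius, (4, 0)))   # still inside the safe zone
print(needs_refresh(anchor, radius, (6, 0)))   # outside: ask the server again
```

The client keeps the result and the radius, so it sends a new request only after it leaves the circle. That is the communication saving the slide refers to, here in its most conservative form.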

  12. Example 2: Fraud Detection • There are billions of financial transactions per minute • How do we uncover fraud? • Scalability • In time for reaction • Manageable results • Possible solution sketch • Identify attributes of suspicious transactions • Sort incoming transactions into a tree structure of historic data • When processing time is up, output a degree of suspicion based on similarity to valid or fraudulent historic data
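One way to read the solution sketch above is as an "anytime" classifier: the deeper a transaction descends into a tree over historic data, the more specific the estimate, and the descent can stop whenever time runs out. The code below is a hedged illustration under several assumptions: a k-d-style tree, per-node fraud/valid counts, a step budget standing in for wall-clock time, and hypothetical features (amount, transactions in the last hour). It is not the method used in the research.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    fraud: int                      # fraudulent historic transactions below this node
    valid: int                      # valid historic transactions below this node
    axis: int = 0
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points, depth=0):
    """Build a k-d-style tree from (features, is_fraud) pairs."""
    fraud = sum(1 for _, is_fraud in points if is_fraud)
    node = Node(fraud=fraud, valid=len(points) - fraud)
    if len(points) <= 1:
        return node
    axis = depth % len(points[0][0])
    ordered = sorted(points, key=lambda p: p[0][axis])
    mid = len(ordered) // 2
    node.axis, node.split = axis, ordered[mid][0][axis]
    node.left = build(ordered[:mid], depth + 1)
    node.right = build(ordered[mid:], depth + 1)
    return node


def suspicion(root, features, budget):
    """Descend at most `budget` levels, then report the local fraud fraction."""
    node = root
    for _ in range(budget):
        if node.left is None:       # reached a leaf before the budget ran out
            break
        node = node.left if features[node.axis] < node.split else node.right
    total = node.fraud + node.valid
    return node.fraud / total if total else 0.0


# Hypothetical history: (amount, transactions in the last hour), is_fraud
history = [((10.0, 1.0), False), ((12.0, 1.0), False),
           ((900.0, 5.0), True), ((950.0, 6.0), True)]
root = build(history)
print(suspicion(root, (920.0, 5.0), budget=0))   # no time: global fraud rate
print(suspicion(root, (920.0, 5.0), budget=1))   # one step: a sharper estimate
```

With a budget of zero the answer is just the overall fraud rate. Each extra step narrows the comparison to more similar historic transactions, so a usable answer exists at every point in time.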

  13. Interested? • Come talk to us! • We currently have M.Sc. and Ph.D. thesis openings
