Database and Data-Intensive Systems

Data-Intensive Systems • From monolithic architectures to diverse systems • Dedicated/specialized systems, column stores • Data centers, web architectures, distributed architectures • From business data to all data • Streaming and sensor data, semi-structured and unstructured data • Multidimensional data, temporal data, spatio-temporal data • Examples • Clustering of high-dimensional data • Tracking and continuous queries for moving objects • Mobile service infrastructure • Location privacy • Spatio-textual search/hyper-local web search • Multimedia similarity search • This is where much of our research “lives.”

Staff • Ira Assent, associate professor • Christian S. Jensen, professor • Vaida Ceikute, Ph.D. student • Xiaohui Li, visiting Ph.D. student • NN, Ph.D. student • GEOCROWD – indoor positioning and services infrastructure • NN, Ph.D. student • GEOCROWD – spatial web objects • NN, Ph.D. student • eData – Anomaly Detection in e-Science • NN, Ph.D. student • Streamspin • NN, Ph.D. student • WallViz • NN, Ph.D. student • REDUCTION • NN, Ph.D. student • REDUCTION

Graduate Course Portfolio: dDO • Data management for moving objects (Q3) • The course covers selected research advances in the general area of indexing and update and query processing for moving objects. • Moving object tracking • Specific indexing techniques • R-tree based indexing • B-tree based indexing • Techniques for the efficient handling of frequent updates • Techniques for range and k nearest neighbor query processing, including one-time as well as continuous queries
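To make the interplay of frequent updates and range queries concrete, here is a minimal sketch. It deliberately swaps in a uniform grid for the R-tree and B-tree based techniques the course covers, because a grid fits in a few lines. The class name `GridIndex`, its methods, and the object identifiers are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class GridIndex:
    """Uniform grid over the plane (an assumed stand-in, not an R-tree or B-tree)."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        self.positions: Dict[str, Tuple[float, float]] = {}

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id: str, x: float, y: float) -> None:
        """Record a new position; touch the grid only if the cell changes."""
        new_cell = self._cell(x, y)
        old = self.positions.get(obj_id)
        if old is None:
            self.cells[new_cell].add(obj_id)
        else:
            old_cell = self._cell(*old)
            if old_cell != new_cell:
                self.cells[old_cell].discard(obj_id)
                self.cells[new_cell].add(obj_id)
        self.positions[obj_id] = (x, y)

    def range_query(self, xmin: float, ymin: float,
                    xmax: float, ymax: float) -> Set[str]:
        """Return the ids of all objects inside the axis-aligned rectangle."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        result: Set[str] = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj_id in self.cells.get((cx, cy), ()):
                    px, py = self.positions[obj_id]
                    if xmin <= px <= xmax and ymin <= py <= ymax:
                        result.add(obj_id)
        return result


index = GridIndex(cell_size=10.0)
index.update("bus1", 3.0, 4.0)
index.update("bus2", 25.0, 25.0)
index.update("bus1", 12.0, 4.0)  # bus1 moves into a neighboring cell
print(sorted(index.range_query(10.0, 0.0, 30.0, 30.0)))
```

An update that stays within the same cell only overwrites the stored position, which is one simple way to keep frequent updates cheap.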

Graduate Course Portfolio: MDDB • Multidimensional databases (Q4) • Selected techniques for the management of multidimensionally represented data • Multidimensional data and applications • Data warehouses and data mining • Similarity search and query processing • Efficient handling: indexing and associated query processing • Multistep similarity search • Indexing multidimensional data • Skyline query processing • Data mining techniques • Subspace clustering • Classification • Outlier detection
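As an illustration of skyline query processing, the following minimal sketch computes the skyline of a small point set with a simple nested-loop scan. It assumes that smaller values are better in every dimension. The function names `dominates` and `skyline` and the hotel data (price, distance to the beach) are made up for this example and are not taken from the course material.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: List[Point]) -> List[Point]:
    """Keep a window of points that no point seen so far dominates."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, so it cannot be in the skyline
        # p survives: drop window points that p dominates, then add p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


# (price, distance to the beach in km); smaller is better for both
hotels = [(50.0, 8.0), (80.0, 2.0), (60.0, 9.0), (120.0, 1.0), (90.0, 3.0)]
print(skyline(hotels))  # [(50.0, 8.0), (80.0, 2.0), (120.0, 1.0)]
```

The scan compares every point against the current window, so it is quadratic in the worst case; index-based skyline algorithms exist precisely to avoid that cost.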

Graduate Course Portfolio: Index • Indexing of disk-based data (Q1) • Indexing techniques for disk-based data for different types of data, as well as their support for queries and updates • General overview of indexes and query processing • Spatial indexing structures • Space partitioning indexing structures • Indexes for high-dimensional data • Metric approaches • Special techniques for complex data types • Coming up for the first time this fall

Graduate Course Portfolio: dDB2 • Database management systems (Q2) • The course aims to give the participants a solid conceptual foundation for making competent use of a database management system. • Logical and physical query optimization and query processing • Concurrency control techniques • Database tuning • Central concepts and techniques in relation to supporting temporal and multi-dimensional data • Coming up for the first time this fall

Projects • Streamspin • Enable sites that are for mobile services what YouTube is for video • Easy mobile service creation and sharing • Advanced spatial and social context functionality • Be an open, extensible, and scalable service delivery infrastructure • MOVE (http://www.move-cost.info) • Knowledge extraction from massive data about moving objects • Cross-cutting activities, showcases, and evaluation • Representation of movement data and spatio-temporal databases • Analysis of movement and spatio-temporal data mining • WallViz • Collaborative analysis, joint decision making on wall-sized displays • Scale to massive data collections • Support ad-hoc queries • Automatically provide entry points for analysis

Projects (2) • GEOCROWD • Creating a Geospatial Knowledge World: • advance the state-of-the-art in collecting, storing, analyzing, processing, reconciling, and publishing user-generated geospatial information on the Web • REDUCTION • Reducing the environmental footprint of fleets of vehicles • Optimizing the behavior of drivers • Supporting eco-routing of vehicles • Enabling transparency in multi-modal transportation • eData • Robust analysis in the context of imperfect data in e-Science • Detect and correct anomalies effectively • on-line, interactive, lineage-preserving, and semi-automatic • Scalable algorithms

How We Typically Work • We target some real problem that we find interesting. • We define the problem precisely. • We develop a solution that is typically a data structure or an algorithm, i.e., a concrete technique. • To evaluate, we build prototypes. • These are built for the purpose of studying the properties of our solutions. • We are often interested in performance, e.g., runtime, space usage, communication cost. • For some solutions we state formal properties that we then prove, e.g., the correctness of a particular technique. • In brief: isolate and define the problem, construct a solution, then evaluate it.

Example 1: Spatial Web Querying • Setting • Google: ~90 billion queries/month, ~20 billion with local intent. • We want to integrate exact locations of websites (for shops, bars, etc.) and users into web querying. • Queries • Results must match the query text and must be near the user. • Results of continuous queries must be updated as the user moves. • Challenges • Support such queries with low computation cost on the server and with little communication between server and client. • Solution • Invent an index that supports both text and location • Use a safe zone to reduce the communication between user and server for continuous queries
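The safe-zone idea can be sketched in a few lines. The slide does not say what shape the safe zone has or how the index works, so everything here is an assumption: the query is a nearest-neighbor query over static places, text matching is a plain keyword-subset test, the server scans all places instead of using an index, and the safe zone is a circle of radius (d2 - d1) / 2 around the query position, where d1 and d2 are the distances to the nearest and second-nearest matching place. Inside that circle the nearest match cannot change, by the triangle inequality. All names (`Place`, `nearest_with_safe_zone`, `ContinuousQueryClient`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Place:
    name: str
    x: float
    y: float
    keywords: FrozenSet[str]


def nearest_with_safe_zone(places: List[Place], qx: float, qy: float,
                           query: FrozenSet[str]
                           ) -> Tuple[Optional[Place], float]:
    """Server side: nearest place matching all keywords, plus a safe radius."""
    def dist(p: Place) -> float:
        return math.hypot(p.x - qx, p.y - qy)

    ranked = sorted((p for p in places if query <= p.keywords), key=dist)
    if len(ranked) < 2:
        # Zero or one match: with static places the answer never changes.
        return (ranked[0] if ranked else None), math.inf
    # Within this radius the nearest match stays nearest (assumed circular zone).
    return ranked[0], (dist(ranked[1]) - dist(ranked[0])) / 2.0


class ContinuousQueryClient:
    """Client side: contacts the server only after leaving the safe zone."""

    def __init__(self, places: List[Place], query: FrozenSet[str]) -> None:
        self.places, self.query = places, query
        self.anchor: Optional[Tuple[float, float]] = None
        self.radius = 0.0
        self.result: Optional[Place] = None
        self.server_calls = 0

    def move_to(self, x: float, y: float) -> Optional[Place]:
        inside = (self.anchor is not None and
                  math.hypot(x - self.anchor[0], y - self.anchor[1]) <= self.radius)
        if not inside:
            self.result, self.radius = nearest_with_safe_zone(
                self.places, x, y, self.query)
            self.anchor = (x, y)
            self.server_calls += 1
        return self.result


places = [
    Place("Cafe A", 0.0, 0.0, frozenset({"coffee", "wifi"})),
    Place("Cafe B", 10.0, 0.0, frozenset({"coffee"})),
    Place("Bar C", 1.0, 0.0, frozenset({"beer"})),
]
client = ContinuousQueryClient(places, frozenset({"coffee"}))
for position in [(1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]:
    print(position, client.move_to(*position).name, client.server_calls)
```

In this run the second position lies inside the first safe zone, so three moves cost only two server calls. A tighter zone, such as the exact region in which the result stays valid, would save more communication at the price of a more complex computation on the server.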

Example 2: Fraud Detection • There are billions of financial transactions per minute • How do we uncover fraud? • Scalability • In time for reaction • Manageable results • Possible solution sketch • Identify attributes of suspicious transactions • Sort incoming transactions into a tree structure of historic data • When processing time is up, output degree of suspicion based on similarity to valid or fraudulent historic data
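One way to read the solution sketch above is as an anytime classifier: descend a tree built over labeled historic transactions for as long as the budget allows, then answer from the node reached. The following minimal sketch makes several assumptions that the slide does not state: the tree is a simple k-d-style binary tree, each node stores the counts of fraudulent and valid historic transactions below it, the "processing time" is modeled as a step budget `max_steps` so that the example is deterministic, and the degree of suspicion is the fraction of fraudulent transactions at the node reached. The attributes (amount, transactions in the last hour) and the data are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Labeled = Tuple[Tuple[float, ...], bool]  # (attributes, is_fraud)


@dataclass
class Node:
    fraud: int                 # fraudulent historic transactions in this subtree
    valid: int                 # valid historic transactions in this subtree
    dim: int = 0               # attribute used for the split
    split: float = 0.0         # split value on that attribute
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(history: List[Labeled], depth: int = 0, leaf_size: int = 2) -> Node:
    """Build the tree by median splits, cycling through the attributes."""
    fraud = sum(1 for _, is_fraud in history if is_fraud)
    node = Node(fraud=fraud, valid=len(history) - fraud)
    if len(history) <= leaf_size:
        return node
    dim = depth % len(history[0][0])
    ordered = sorted(history, key=lambda item: item[0][dim])
    mid = len(ordered) // 2
    node.dim, node.split = dim, ordered[mid][0][dim]
    node.left = build(ordered[:mid], depth + 1, leaf_size)
    node.right = build(ordered[mid:], depth + 1, leaf_size)
    return node


def suspicion(root: Node, tx: Sequence[float], max_steps: int) -> float:
    """Descend at most max_steps levels, then report the local fraud fraction."""
    node = root
    for _ in range(max_steps):
        if node.left is None or node.right is None:
            break  # reached a leaf before the budget ran out
        node = node.left if tx[node.dim] < node.split else node.right
    total = node.fraud + node.valid
    return node.fraud / total if total else 0.0


history: List[Labeled] = [
    ((10.0, 1.0), False), ((12.0, 2.0), False), ((15.0, 1.0), False),
    ((20.0, 3.0), False), ((900.0, 40.0), True), ((950.0, 55.0), True),
]
root = build(history)
incoming = (920.0, 50.0)
for budget in (0, 1, 2):
    print(budget, round(suspicion(root, incoming, budget), 3))
```

With no budget the answer is just the overall fraud rate of the history; each additional step compares the transaction against a more similar subset of historic data, so the estimate sharpens as more time becomes available.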

Interested? • Come talk to us! • We currently have M.Sc. and Ph.D. thesis openings
