Predictive Modeling in Data Management

Presentation Transcript


  1. Predictive Modeling in Data Management
     Byung S. Lee, Computer Science, University of Vermont
     http://www.emba.uvm.edu/~bslee/homepage/

  2. Cost UDF Overview • Funding: US Department of Energy. • Title: Generating Cost Functions of User-Defined Functions. • Phase 1: preliminary studies. • Phase 2: core modeling techniques. • Phase 3: applications.

  3. UDF Cost Problem
     • How long would this UDF take to run?

  4. Phase 1
     • Approaches:
        • Off-line training with cost data sets generated in a single batch.
        • On-line training (a.k.a. self-tuning) with cost data sets generated in incremental batches.
     • Models:
        • Parametric or nonparametric regression.
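Parametric regression is one of the Phase 1 candidates. The sketch below is a minimal illustration, assuming a single numeric feature x derived from a UDF's arguments (for example, the number of days between start date and end date; that feature choice is our assumption, not the authors'). It keeps an ordinary-least-squares line as running sums, so the same object serves both off-line training on one batch and self-tuning on incremental batches. The class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IncrementalLinearCostModel:
    """Hypothetical parametric cost model: cost ~ a + b * x, fitted by OLS.

    Only running sums are stored, so batches can be folded in one after
    another (on-line) or all at once (off-line) with the same result.
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add_batch(self, xs, costs):
        # Fold a batch of (feature, observed cost) pairs into the running sums.
        for x, y in zip(xs, costs):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        # Closed-form least-squares intercept a and slope b.
        if self.n == 0:
            raise ValueError("no training data yet")
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            return self.sy / self.n, 0.0  # all x identical: predict the mean cost
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b

    def predict(self, x):
        a, b = self.coefficients()
        return a + b * x


model = IncrementalLinearCostModel()
model.add_batch([1, 2, 3], [3.0, 5.0, 7.0])  # first (off-line) batch
print(model.predict(4))                      # 9.0
model.add_batch([10], [21.0])                # later incremental batch
print(model.predict(5))                      # 11.0
```

A nonparametric alternative would replace the fitted line with, for example, an average over nearby observed points; the running-sum design here is chosen only because it makes incremental refitting trivial.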

  5. Phase 1
     • UDFs:
        • Financial time-series aggregate functions:
           • median(time series, start date, end date)
           • nth moving window average(time series, start date, end date, window size)
        • Keyword-based text search functions:
           • “dog AND cat”
           • “dog OR cat”
           • “dog” and “cat” within w words of each other
        • Spatial search operators:
           • range(ref_point, distance)
           • Window(lower_left_point, upper_right_point)
           • KNN(ref_point, K)
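The slide names these UDFs without defining them. As a hedged sketch of how an off-line batch of cost data might be produced, the code below assumes the moving window average is a simple moving average over integer positions start..end (positions stand in for dates, an assumption), runs it on random arguments, and records (argument features, elapsed seconds) pairs. `moving_window_average` and `generate_cost_data` are hypothetical names.

```python
import random
import time


def moving_window_average(series, start, end, window):
    """Hypothetical UDF: simple moving averages over series[start:end].

    Computed naively, so its running time grows roughly with
    (end - start) * window: an argument-dependent cost a model must learn.
    """
    out = []
    for i in range(start + window - 1, end):
        out.append(sum(series[i - window + 1 : i + 1]) / window)
    return out


def generate_cost_data(series, n_samples, seed=0):
    """Run the UDF on random arguments; return [((range_len, window), seconds)]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        start = rng.randrange(0, len(series) // 2)
        end = rng.randrange(start + 1, len(series) + 1)
        window = rng.randint(1, end - start)
        t0 = time.perf_counter()
        moving_window_average(series, start, end, window)
        elapsed = time.perf_counter() - t0
        data.append(((end - start, window), elapsed))
    return data


print(moving_window_average([1, 2, 3, 4, 5], 0, 5, 2))  # [1.5, 2.5, 3.5, 4.5]
cost_data = generate_cost_data([float(i) for i in range(1000)], n_samples=20)
```

Wall-clock timings vary from run to run, so a real study would likely repeat measurements or use a more stable cost measure; the pairs in `cost_data` are what a regression model like the one above would be trained on.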

  6. Phase 2
     • Approaches:
        • On-line training with cost data points generated one at a time.
        • Limited main memory is assumed.
     • Models:
        • Nonparametric techniques using multidimensional index structures.
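To make the Phase 2 setting concrete, here is a minimal sketch of an on-line, memory-bounded nearest-neighbor cost model. It is not the authors' incremental edited kNN; the editing rule (store a point only when the current prediction misses its cost by more than a relative `tolerance`) and oldest-first eviction are our assumptions. A linear scan stands in for the multidimensional index structure the slide mentions. The class and parameter names are hypothetical.

```python
import math
from collections import deque


class EditedKNNCostModel:
    """Hypothetical on-line kNN cost model with a fixed point budget."""

    def __init__(self, k=3, max_points=100, tolerance=0.1):
        self.k = k
        self.tolerance = tolerance
        # A deque with maxlen drops the oldest stored point automatically when full.
        self.points = deque(maxlen=max_points)

    def predict(self, x):
        # Average cost of the k stored points nearest to x (None if nothing stored).
        if not self.points:
            return None
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], x))[: self.k]
        return sum(cost for _, cost in nearest) / len(nearest)

    def observe(self, x, cost):
        """Process one (features, cost) point as it arrives; return True if stored."""
        guess = self.predict(x)
        if guess is not None and abs(guess - cost) <= self.tolerance * abs(cost):
            return False  # already predicted well enough: edit the point out
        self.points.append((tuple(x), cost))
        return True


m = EditedKNNCostModel(k=1, max_points=2)
m.observe((0, 0), 10.0)    # stored (model was empty)
m.observe((0.1, 0), 10.5)  # not stored: prediction 10.0 is within 10%
```

Editing keeps memory for regions where the model is already accurate, and the bounded deque enforces the main-memory limit regardless of how many points arrive.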

  7. Phase 2
     • Core modeling techniques:
        • Incremental edited k nearest neighbors.
        • Memory-limited quadtrees.
     • Dr. Zhen He will give a quick overview of the recent Phase 2 efforts.
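The slide does not describe how the memory-limited quadtree works. One plausible sketch, under our own assumptions: the feature space is two-dimensional, each node stores only the count and sum of costs that fell into it, a leaf splits into four children after `split_threshold` points only while a `max_nodes` budget allows, and a prediction is the mean cost of the deepest non-empty node containing the query. All names and parameters are hypothetical.

```python
class _Node:
    __slots__ = ("x0", "y0", "x1", "y1", "count", "total", "children")

    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
        self.count = 0      # number of cost points that reached this node
        self.total = 0.0    # sum of their costs (raw points are not kept)
        self.children = None


class MemoryLimitedQuadtree:
    """Hypothetical 2-D cost model whose size never exceeds max_nodes nodes."""

    def __init__(self, bounds, max_nodes=64, split_threshold=8):
        self.root = _Node(*bounds)  # bounds = (x0, y0, x1, y1), half-open
        self.max_nodes = max_nodes
        self.split_threshold = split_threshold
        self.n_nodes = 1

    def _child_for(self, node, x, y):
        mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
        return node.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def _split(self, node):
        mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
        node.children = [
            _Node(node.x0, node.y0, mx, my),  # x < mx, y < my
            _Node(mx, node.y0, node.x1, my),  # x >= mx, y < my
            _Node(node.x0, my, mx, node.y1),  # x < mx, y >= my
            _Node(mx, my, node.x1, node.y1),  # x >= mx, y >= my
        ]
        self.n_nodes += 4

    def insert(self, x, y, cost):
        r = self.root
        if not (r.x0 <= x < r.x1 and r.y0 <= y < r.y1):
            raise ValueError("point outside the modeled feature space")
        node = r
        while True:
            node.count += 1
            node.total += cost
            if node.children is None:
                # Split a busy leaf only if four more nodes fit in the budget.
                if (node.count >= self.split_threshold
                        and self.n_nodes + 4 <= self.max_nodes):
                    self._split(node)
                return
            node = self._child_for(node, x, y)

    def predict(self, x, y):
        """Mean cost of the deepest non-empty node containing (x, y)."""
        node, best = self.root, None
        while node is not None and node.count > 0:
            best = node.total / node.count
            node = self._child_for(node, x, y) if node.children else None
        return best
```

Because raw points are discarded, a freshly split node's children start empty and predictions fall back to the parent's mean until the children see data; that trade-off is what keeps memory fixed.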

  8. Phase 3
     • Additional core modeling techniques.
     • Abstraction of the problem to “efficient adaptive predictive modeling of incremental data.”
     • Applications that need:
        • Value predictions.
        • Class predictions.
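The distinction between value and class predictions can be illustrated with a single nonparametric predictor. In this minimal sketch, which is our illustration rather than the project's method, k nearest neighbors averages the neighbors' targets for a value prediction and takes a majority vote for a class prediction. The function name and the "fast"/"slow" labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, x, k, task):
    """Predict for query x from train = [(features, target), ...].

    task="value": mean of the k nearest targets (regression).
    task="class": most common label among the k nearest (classification).
    """
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    targets = [t for _, t in nearest]
    if task == "value":
        return sum(targets) / len(targets)
    if task == "class":
        return Counter(targets).most_common(1)[0][0]
    raise ValueError(f"unknown task: {task!r}")


costs = [((0,), 1.0), ((1,), 2.0), ((10,), 50.0)]
print(knn_predict(costs, (0.4,), 2, "value"))   # 1.5
labels = [((0,), "fast"), ((1,), "fast"), ((2,), "slow"), ((10,), "slow")]
print(knn_predict(labels, (0.5,), 3, "class"))  # fast
```

Only the aggregation step differs between the two tasks, which is why one incremental data structure could plausibly serve both kinds of application.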
