
Quality Management in Multimedia Databases and Data Stream Management Systems

Yicheng Tu, Department of Computer Sciences, Purdue University. Advisor: Prof. Sunil Prabhakar.


Presentation Transcript


  1. Quality Management in Multimedia Databases and Data Stream Management Systems • Yicheng Tu, Department of Computer Sciences, Purdue University • Advisor: Prof. Sunil Prabhakar • Final Exam, May 25, 2007

  2. Quality? • "The nature, kind, or character (of something). Hence, the degree or grade of excellence, etc. possessed by a thing. Restricted to cases in which there is comparison (expressed or implied) with other things of the same kind." (Oxford English Dictionary) • "character with respect to fineness, or grade of excellence …" (Dictionary.com)

  3. Our Definition • A series of parameters that describe the characteristics of data processing and lead to different degrees of user satisfaction • Overlaps with the concept of Quality-of-Service (QoS) • Not data quality

  4. Problems • Two types of problems • Determine the quality of concurrent applications for maximal user satisfaction • Maintain the quality of applications in highly dynamic environments • Problems are system- and application-specific • Various techniques/solutions are involved • Resource reservation • Application adaptation

  5. Roadmap • Introduction • Controlling delays in data stream management systems (DSMSs) • Quality-aware (media) data replication • Other works

  6. Data Stream Management Systems • Data-active query-passive model • Continuous query • Continuous data, discarded after being processed • Applications • Financial analysis • Mobile services • Sensor networks • Network monitoring

  7. Load Shedding • Data processing in DSMS is quality-critical • Tuple processing delay • Data loss • Sampling rate, window size, … • Overloading during spikes → degraded quality (processing delay) • Solution: load shedding (i.e., adjust data loss) • Eliminating excessive load by dropping data items • Users tolerate approximate query results

  8. Load Shedding: Challenges • Constantly discarding most packets would work • But what happens to query accuracy? • The real (and hard) problem is: how to maintain processing delays while minimizing data loss? • Specifically • When? • How much? • For how long? • Which ones to discard?

  9. State-of-the-Art • Data triage (Reiss & Hellerstein, ICDE06) • Put data into a fast-track analyzer upon overloading • LoadStar (Chi et al., VLDB05) • Accuracy of aggregate queries under load shedding (Babcock et al., ICDE04) • QoS-driven load shedding (Tatbul et al., VLDB03, 06) • All utilize intuitive rule-of-thumb algorithms to decide when, how much, and how long • These do not work under bursty arrival patterns and variable tuple processing costs

  10. Our Approach • Insight: treat load shedding as a control problem • Control: manipulation of system states (outputs) by adjusting input(s) to the system • In our problem • processing delay -> output • amount of load injected -> input • Problem reformulation: let the output track the desired value by changing the amount of load discarded • [Figure: delay vs. time]

  11. Feedback Control • Suitable for rejecting the effects of disturbances • Main components form a feedback control loop • [Block diagram: a feedback loop of Controller, Actuator, and Plant, with a summing junction that computes the control error e(k) = yd - y(k) from the reference value yd and the output y(k); a disturbance acts on the loop] • Plant: DSMS engine • Actuator: load shedder • y: average data processing delay • yd: desired processing delay • e: control error • u: allowed load into DSMS
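The loop on this slide can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the controller from the talk: the plant is a hypothetical single queue whose delay is approximated as queue length divided by capacity, the service capacity is assumed known, and the control law is a plain proportional rule whose gain is chosen from a desired closed-loop pole. The function name `simulate_load_shedding` and all parameters are illustrative.

```python
def simulate_load_shedding(arrivals, capacity, yd, pole=0.5):
    """Toy feedback loop for load shedding (illustrative assumptions only).

    arrivals : offered load per control period (tuples)
    capacity : tuples the engine can process per period (assumed known)
    yd       : desired processing delay, in control periods
    pole     : desired closed-loop pole; while the system is overloaded the
               delay error shrinks by this factor every period
    """
    kp = 1.0 - pole                  # proportional gain implied by the pole
    queue = 0.0                      # plant state: tuples waiting
    delays, dropped = [], 0.0
    for a in arrivals:
        y = queue / capacity         # output: estimated delay y(k)
        e = yd - y                   # control error e(k) = yd - y(k)
        u = max(0.0, capacity * (1.0 + kp * e))   # allowed load u(k)
        admitted = min(a, u)         # actuator: the load shedder drops the excess
        dropped += a - admitted
        queue = max(0.0, queue + admitted - capacity)  # plant update
        delays.append(y)
    return delays, dropped


# A sustained overload: 300 tuples arrive per period, 100 can be processed.
delays, dropped = simulate_load_shedding([300.0] * 50, capacity=100.0, yd=2.0)
```

In this run the delay rises toward the target of 2 periods without overshooting it, and the excess load is shed instead of accumulating in the queue.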

  12. Issues • System modeling • Critical for control loop design • Analytical models desirable but not currently available • Experimental methods can be used • Controller design • Database-specific challenges • Lack of real-time measurement of output signal y • Actuator may not be able to implement control signal correctly

  13. Modeling Borealis • Interestingly, system identification of Borealis shows a first-order model with single-queue characteristics • In other words (block diagram)

  14. Controller Design • Design based on pole placement • Locations of pole(s) determine how fast/well the system responds • Guaranteed performance targets • Convergence rate - responsiveness • Damping - smoothness • The controller: [equation not preserved]

  15. DSMS-specific challenges • A database system is different from a traditional control system in many ways • Lack of real-time measurement of output signal y • Actuator may not be able to implement control signal correctly • Solutions are provided in the context of DSMS • Need more systematic study from a control viewpoint

  16. Experiments • Controller and load shedder implemented in a real DSMS - Borealis • Synthetic (“Pareto”) and real (“Web”) data streams • Query network with variable average processing cost • Experiments for comparison • Aurora - open loop • Baseline - primitive feedback control

  17. Experiments: Inputs

  18. Main Results - Synthetic Data

  19. Main Results - Real Data

  20. Main Results - Data Loss

  21. Summary on Load Shedding • Load shedding is an effective quality adaptation method in DSMSs • Ad hoc solutions do not work well under dynamic load • A load shedding approach based on feedback control theory shows promising results in a real-world DSMS • Control theory could provide solutions to other database problems • However, we need to address new challenges that are unique to database problems

  22. Roadmap • Introduction • Controlling delays in data stream management systems (DSMSs) • Quality-aware (media) data replication • Other works

  23. Quality-Aware Queries in Multimedia DBMS • Quality = QoS • Querying the DB with quality parameters:
      SELECT vid:[s]
      FROM VidLib1
      WHERE (vid, s) IN FindVideoWithObject(Someone)
      QUALITY Resolution = High, Color_depth = Low

  24. Quality-aware Data Retrieval • Quality (QoS) critical for media data • Varieties of user quality requirements • Determined by user preference and resource availability • Large number of quality combinations • Adaptation techniques to satisfy quality needs • Dynamic adaptation: online transcoding • Static adaptation: retrieve precoded replica from disk

  25. Dynamic Adaptation • Transcoding is very expensive in terms of CPU cost • Situation may improve in the future • Layered coding • Not standardized yet • Less popular than people expected

  26. Static Adaptation • Little CPU cost • Choice of many commercial service providers • What about storage cost? • On the order of the total number of quality points • Ignored in previous research assuming • Very few quality profiles • Storage is dirt cheap • Excessively high for service providers

  27. Quality-Aware Replication • Replicas are of different “quality” • Destination: point(s) in a metric quality space • Costs of transformation among different qualities are very high • Applications • Multimedia • Materialized view • Biological structure • Good news: read-only • Bad news: too much storage needed

  28. Two Quality Models • Hard-Quality: Users are strict in their quality needs • Quality A cannot serve a request for quality B • Online transcoding is needed • Soft-Quality: Users are willing to negotiate/compromise • Quality A can serve a request for quality B • With some penalties (quantified by utility functions)

  29. Hard-Quality Systems • Problem is to minimize reject rate (probability) P under an overall storage constraint C, given • fk: query rate for quality k • uk: service time for quality k • sk: storage consumption for quality k • ck: CPU consumption for quality k • Map the system to a multi-rate Erlang loss system • Reduce the problem to a 0-1 knapsack problem • A (good) heuristic solution: • Sort all qualities by their fk/sk values and fill the storage up to C
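The sort-and-fill heuristic on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `select_by_density`, the sample numbers, and the choice to skip a quality that does not fit and keep scanning (rather than stop at the first misfit) are all assumptions.

```python
def select_by_density(qualities, capacity):
    """Pick qualities to replicate by descending f_k / s_k ratio.

    qualities : dict mapping a quality name to (f_k, s_k), i.e. its query
                rate and the storage its replica consumes
    capacity  : total storage C
    Assumption: a quality that does not fit is skipped, and the scan continues.
    """
    ranked = sorted(qualities.items(),
                    key=lambda item: item[1][0] / item[1][1],
                    reverse=True)
    chosen, used = [], 0.0
    for name, (rate, storage) in ranked:
        if used + storage <= capacity:
            chosen.append(name)
            used += storage
    return chosen, used


# Hypothetical workload: (query rate, storage) per quality.
demo = {"high": (10.0, 8.0), "mid": (9.0, 3.0), "low": (4.0, 1.0)}
chosen, used = select_by_density(demo, capacity=5.0)
```

With these sample numbers the ratios are 4.0 for "low", 3.0 for "mid", and 1.25 for "high", so "low" and "mid" are stored and "high" is left out because it does not fit in the remaining space.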

  30. Soft-quality system: the fixed-storage replica selection (FSRS) Problem • An optimization: get the highest utility given the popularity (fk), storage cost (sk) of all quality points under total storage S • u(j,k): the utility when a request on quality j is served by quality k • Utility is given as a function of distance in quality space • Requests served by the closest replica
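One way to write the FSRS problem down formally is sketched below. The slide does not give the formula, so this is a reading of it under assumptions: Q denotes the set of all quality points (a symbol introduced here), P is the set of replicas selected, and utility is assumed not to increase with distance, so that serving a request from the closest replica is the same as taking the maximum utility over P.

```latex
\max_{P \subseteq Q} \; U(P) = \sum_{j \in Q} f_j \, \max_{k \in P} u(j,k)
\qquad \text{subject to} \qquad \sum_{k \in P} s_k \le S
```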

  31. The FSRS Algorithms (I) • Problem is NP-hard: a variation of k-means • We propose a heuristic algorithm named Greedy • Aggressively selects replicas based on the ratio of marginal utility gain (∆u) to cost (sk) • Time complexity: O(m²I), where I is the # of replicas selected and m the total # of possible replicas • Pseudocode:
      selected replica set P := ∅
      available storage s’ := S
      while s’ > 0
          add the quality point that yields the largest ∆u/sk value to P
          decrease s’ by sk
      return P
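A runnable reading of the Greedy pseudocode might look like the sketch below. Several details are assumptions rather than facts from the talk: total utility is taken to be the popularity-weighted utility of the best replica for each quality point, the loop stops once no remaining candidate fits or adds utility, and the names `total_utility`, `greedy_fsrs`, and `inverse_distance` plus the sample data are made up for illustration.

```python
def total_utility(selected, freq, utility):
    """U(P): every quality point j is served by its best replica in P."""
    if not selected:
        return 0.0
    return sum(f * max(utility(j, k) for k in selected)
               for j, f in freq.items())


def greedy_fsrs(freq, size, utility, storage):
    """Repeatedly add the replica with the largest marginal gain per unit storage."""
    selected = set()
    remaining = storage
    current = 0.0
    while True:
        best, best_ratio, best_u = None, 0.0, current
        for k in size:
            if k in selected or size[k] > remaining:
                continue                          # already chosen or does not fit
            u = total_utility(selected | {k}, freq, utility)
            ratio = (u - current) / size[k]       # delta-u / s_k
            if ratio > best_ratio:
                best, best_ratio, best_u = k, ratio, u
        if best is None:                          # nothing fits or nothing helps
            break
        selected.add(best)
        remaining -= size[best]
        current = best_u
    return selected


def inverse_distance(j, k):
    """Hypothetical utility: decays with distance in a 1-D quality space."""
    return 1.0 / (1.0 + abs(j - k))


# Hypothetical quality points on a line, with popularity and storage cost.
freq = {1: 4.0, 2: 1.0, 3: 4.0, 9: 2.0}
size = {1: 2.0, 2: 1.0, 3: 2.0, 9: 1.0}
chosen = greedy_fsrs(freq, size, inverse_distance, storage=5.0)
```

On this sample input Greedy selects points 1, 2, and 9 for a total utility of 9, whereas the set {1, 3, 9} also fits in the storage and reaches 10.5. The cheap point 2 is picked first and later turns out to be a poor use of space.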

  32. The FSRS Algorithms (II) • Greedy could pick some bad replicas, especially among the earlier selections • Remedy: remove those bad choices and re-select • The Iterative Greedy algorithm: • Time complexity: same as Greedy with a larger coefficient • Pseudocode:
      P ← a solution given by Greedy
      while there exists a solution P’ s.t. U(P’) > U(P) do
          P ← P’
      return P
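The slide does not say how the improved solution P’ is found, so the sketch below fills that step with one plausible choice and should be read as an assumption: drop one selected replica at a time, greedily refill the freed storage, and keep the result if total utility rises. The helper names (`total_utility`, `greedy_fill`, `iterative_greedy`, `inverse_distance`), the utility model, and the sample data are all illustrative.

```python
def total_utility(selected, freq, utility):
    """U(P): every quality point j is served by its best replica in P."""
    if not selected:
        return 0.0
    return sum(f * max(utility(j, k) for k in selected)
               for j, f in freq.items())


def greedy_fill(freq, size, utility, storage, start=()):
    """Greedy selection by marginal gain per unit storage, seeded with `start`."""
    selected = set(start)
    remaining = storage - sum(size[k] for k in selected)
    current = total_utility(selected, freq, utility)
    while True:
        best, best_ratio, best_u = None, 0.0, current
        for k in size:
            if k in selected or size[k] > remaining:
                continue
            u = total_utility(selected | {k}, freq, utility)
            ratio = (u - current) / size[k]
            if ratio > best_ratio:
                best, best_ratio, best_u = k, ratio, u
        if best is None:
            return selected
        selected.add(best)
        remaining -= size[best]
        current = best_u


def iterative_greedy(freq, size, utility, storage):
    """Start from Greedy; retry with one replica removed until nothing improves."""
    best = greedy_fill(freq, size, utility, storage)
    best_u = total_utility(best, freq, utility)
    improved = True
    while improved:
        improved = False
        for r in sorted(best):
            # Assumed re-selection step: remove r, then refill greedily.
            candidate = greedy_fill(freq, size, utility, storage, start=best - {r})
            cand_u = total_utility(candidate, freq, utility)
            if cand_u > best_u + 1e-12:           # accept strict improvements only
                best, best_u, improved = candidate, cand_u, True
                break
    return best


def inverse_distance(j, k):
    """Hypothetical utility: decays with distance in a 1-D quality space."""
    return 1.0 / (1.0 + abs(j - k))


freq = {1: 4.0, 2: 1.0, 3: 4.0, 9: 2.0}
size = {1: 2.0, 2: 1.0, 3: 2.0, 9: 1.0}
refined = iterative_greedy(freq, size, inverse_distance, storage=5.0)
```

On this sample input plain Greedy returns {1, 2, 9} with utility 9. Removing point 2 and refilling yields {1, 3, 9} with utility 10.5, after which no single removal improves the result and the loop stops.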

  33. Other Extensions • Our FSRS algorithms can be easily extended to handle • Multiple media objects • Further user-specified constraints on replicas to be selected • Multiple servers

  34. Dynamic Replication • Popularity f of replicas could change over time • We only consider the situation where popularity of all replicas of a media object changes together • Reasonable assumption in many systems • Competition for storage among media objects • Desirable dynamic replication algorithms: • Find solutions as optimal as those by static FSRS algorithms • Fast enough to make online decisions • Naïve solution: run Greedy every time a change of f occurs

  35. Replication Roadmap (RR) • Consider the order in which replicas are selected by Greedy: it follows a predefined path (RR) for each media object • RRs are all convex • Exchanges of storage may happen between two media objects, triggered by the increase/decrease of f • The one that becomes more popular takes storage from the least popular one • The one that becomes less popular gives up storage to the most popular one • It is efficient to make exchanges at the frontiers of the RRs, no need to look inside

  36. Replication Roadmap (continued) • Storage exchanges, example: Media A should take storage from media B as the slope of its current segment in RR is greater than that of B

  37. Dynamic FSRS algorithm • Based on the RR idea • Proved performance: results given are as optimal as those chosen by Greedy • Preprocess phase: • Build the RRs • Online phase: • Perform exchanges until total utility converges • Time complexity: O(I log V), where I is the # of storage exchanges that occur and V is the # of media objects

  38. Effectiveness of FSRS Algorithms • For comparison: • The optimal solution (by CPLEX) • Random selections • Local popularity-based

  39. Efficiency of FSRS Algorithms • CPLEX < Iterative Greedy < Greedy < Random < Local • Results on a P4 2.4 GHz CPU:

  40. Dynamic Replication Results • Randomly generated changes of f • Compare with Greedy • Results with (almost) the same optimality as Greedy • Reason: small number of storage exchanges

  41. Summary on media replication • Storage cost in static adaptation prohibits replication of all qualities • Optimize toward the lowest reject rate (hard-quality) or the highest utility (soft-quality) given storage constraints • Two heuristics are proposed for static replication that give near-optimal choices • An online algorithm for a dynamic replication problem

  42. Other Works • VDBMS - a multimedia DBMS • Quality-of-Service Aware Query Processing [EDBT04] • System architecture [MMSJ03, DMS03, ICDE03] • Peer-to-peer media streaming • Performance analysis [MMCN04, TOMCCAP05] • Genetic algorithms [JEC07] • Other topics in data stream systems • Entity-based query processing [VLDB05] • Stream data compression [GSN06] • Signal processing [JMASM07, CSC05]

  43. Ongoing and Future Research • Further investigate load shedding problem • Handle actuator uncertainty • Other control targets • Is the optimal achievable? • Quality-aware replication: • General case of dynamic replication, why is a random solution not so bad? • Conjecture: Greedy is 4/3-competitive? • Application of control theory in other database topics • Self-tuning databases

  44. Publications-1
      [TKDE07] Y. Tu, J. Yan, G. Shen, and S. Prabhakar. Multi-Quality Data Replication in Multimedia Databases. IEEE Transactions on Knowledge and Data Engineering (TKDE), 19(5):679-694, May 2007.
      [JMASM07] L. Qu and Y. Tu. Change Point Estimation of Bi-Level Functions. Journal of Modern Applied Statistical Methods, 5(2), May 2007.
      [JEC] H. Fang, Q. Wang, Y. Tu, and M. F. Horstemeyer. An Efficient Non-Dominated Sorting Algorithm for Evolutionary Algorithms. Accepted to Journal of Evolutionary Computation.
      [ICDE07] Y. Tu, S. Liu, S. Prabhakar, B. Yao, and W. Schroeder. Using Control Theory for Load Shedding in Data Stream Management. In Procs. of ICDE, pp. 490-491, Istanbul, Turkey, April 2007.
      [GSN06] Y. Xia, Y. Tu, M. Atallah, and S. Prabhakar. Efficient Data Compression in Location Based Services. In Procs. of 2nd International Conference on Geosensor Networks, Boston, MA, October 2006.
      [VLDB06] Y. Tu, S. Liu, S. Prabhakar, and B. Yao. Load Shedding in Stream Databases - A Control-Based Approach. In Proceedings of VLDB, pp. 787-798, September 2006.
      [TOMCCAP05] Y. Tu, J. Sun, M. Hefeeda, and S. Prabhakar. An Analytical Study of Peer-to-Peer Media Streaming Systems. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 1(4):354-376, November 2005.

  45. Publications-2
      [VLDB05] R. Cheng, B. Kao, S. Prabhakar, A. Kwan, and Y. Tu. Adaptive Stream Filters for Entity-Based Queries with Non-Value Tolerance. In Proceedings of VLDB, pp. 37-48, August 2005.
      [DEXA05a] Y. Tu, J. Yan, and S. Prabhakar. Quality-Aware Replication of Multimedia Data. In Proceedings of DEXA, pp. 240-249, August 2005.
      [DEXA05b] Y. Tu, M. Hefeeda, Y. Xia, S. Prabhakar, and S. Liu. Control-based Quality Adaptation in Data Stream Management Systems. In Proceedings of DEXA, pp. 746-755, August 2005.
      [CSC05] L. Qu and Y. Tu. Change Point Estimation of Bar Code Signals. In Proceedings of International Conference on Scientific Computing, pp. 109-114, Las Vegas, USA, June 2005.
      [MMJS04] W. Aref, A. Catlin, A. Elmagarmid, J. Fan, M. Hammad, I. Ilyas, M. Marzouk, S. Prabhakar, Y. Tu, and X. Zhu. VDBMS: A Testbed Facility for Research in Video Database Benchmarking. ACM/Springer Multimedia Systems, 9(6):575-585, June 2004.
      [EDBT04] Y. Tu, S. Prabhakar, A. Elmagarmid, and R. Sion. QuaSAQ: An Approach to Enabling End-to-End QoS for Multimedia Databases. In Proceedings of Extending Database Technology (EDBT), pp. 694-711, Heraklion, Greece, March 2004.
      [MMCN04] Y. Tu, J. Sun, and S. Prabhakar. Performance Analysis of A Hybrid Media Streaming System. In Proceedings of ACM/SPIE Conf. on Multimedia Computing and Networking (MMCN), pp. 69-82, San Jose, CA, January 2004.

  46. Publications-3
      [DMS03] W. Aref, A. Catlin, A. Elmagarmid, J. Fan, M. Hammad, I. Ilyas, M. Marzouk, S. Prabhakar, Y. Tu, and X. Zhu (alphabetical order). VDBMS: A Testbed Facility for Research in Video Database Benchmarking. In Proceedings of Intl. Conf. on Distributed Multimedia Systems (DMS) 2003, pp. 160-166.
      [ICDE02] W. Aref, A. Elmagarmid, J. Fan, J. Guo, M. Hammad, I. Ilyas, M. Marzouk, S. Prabhakar, A. Rezgui, A. Teoh, E. Terzi, Y. Tu, A. Vakali, and X. Zhu (alphabetical order). A Distributed Database Server for Continuous Media. In Procs. of ICDE, pp. 490-491, San Jose, CA, March 2002.
      [ICDE06] Y. Tu and S. Prabhakar. Control-Based Load Shedding in Data Stream Management Systems. PhD Workshop, in conjunction with ICDE 2006.
      Submitted: Using control theory for self-tuning databases. Submitted to journal.

  47. Thank you! Questions?

  48. QuaSAQ • Quality-of-Service-Aware Query processing • Users do not need to know low-level details • Cost evaluation toward global optimization goals • Throughput • Utilizing current system/network QoS support to deliver the query results • Theory first presented in Bertino et al., 2003 • Prototyping is essential

  49. QuaSAQ Architecture • Our approach: • Augment the query evaluation and optimization modules to directly take QoS into account • Major components • Offline multimedia processor • Transcode media objects into copies with different QoS/formats • Estimate resource use • Online components • QoS Browser • Quality Manager • QoS APIs
