
Diamond:


Kelvin_Ajay

Presentation Transcript


  1. Diamond: A Storage Architecture for Early Discard in Interactive Search • Larry Huston et al., FAST ’04 • Jan. 26th, 2006 • Speaker: Sehwan Lee

  2. Contents • Introduction • Background and Motivation • Diamond Architecture • Diamond Application • Prototype Implementation • Experimental Evaluation • Related Work • Conclusion

  3. Introduction

  4. Introduction • Goal • To enable interactive search of non-indexed data • Diamond → ‘Early Discard’ technique • Focus • Pure brute-force interactive search

  5. Background and Motivation

  6. Background and Motivation • Limitations of Indexing • Manual indexing is infeasible • High-dimensional representations • Increasingly sophisticated queries • Increasingly complex user needs

  7. Background and Motivation • Importance of Early Discard

  8. Background and Motivation • Self-Tuning for Hardware Evolution • Flexibility of active disks • Well suited to ‘early discard’ • Two mechanisms of early discard • The application generates specialized early discard code • The system dynamically adapts the evaluation of early discard code • Two aspects of early discard • Adaptive partitioning of computation between the storage devices and the host computer • Dynamic ordering of search terms to minimize the total computation time

  9. Background and Motivation • Exploiting the Structure of Search • Search tasks • Only require read access • Typically permit stored objects to be examined in any order • Well suited to parallelism • Do not require maintaining state between objects • Well suited to parallelism

  10. Diamond Architecture

  11. Diamond Architecture • Diamond Architecture • Searchlet • Contains all of the domain-specific knowledge needed for early discard • Is a proxy of the application that can execute within the back end

  12. Diamond Architecture • Searchlets • Searchlet Structure • A set of filters + some configuration state • Creating Searchlets • A domain application generates searchlets in response to a user’s query in a number of ways • Domain experts implement a library of filter functions • A domain application generates code on the fly
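
The searchlet structure on this slide (a set of filters plus some configuration state) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `Searchlet` class, the filter signature, and the two sample filters are assumptions made for illustration and are not Diamond's actual filter API, which the slides do not show.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical filter signature (an assumption, not Diamond's API):
# a filter receives one object plus the shared configuration state
# and returns True to keep the object or False to discard it.
Filter = Callable[[dict, Dict[str, float]], bool]


@dataclass
class Searchlet:
    """A set of filters plus some configuration state."""
    filters: List[Filter]
    config: Dict[str, float] = field(default_factory=dict)

    def accepts(self, obj: dict) -> bool:
        # Early discard: all() stops at the first filter that rejects,
        # so later (possibly costlier) filters never run on that object.
        return all(f(obj, self.config) for f in self.filters)


# Two illustrative filters, as a domain expert's library might provide.
def is_large(obj: dict, config: Dict[str, float]) -> bool:
    return obj["width"] >= config["min_width"]


def is_reddish(obj: dict, config: Dict[str, float]) -> bool:
    return obj["red_fraction"] >= config["min_red"]


searchlet = Searchlet([is_large, is_reddish], {"min_width": 640, "min_red": 0.3})

objects = [
    {"name": "a.jpg", "width": 800, "red_fraction": 0.5},
    {"name": "b.jpg", "width": 320, "red_fraction": 0.9},
    {"name": "c.jpg", "width": 1024, "red_fraction": 0.1},
]

# Only objects that pass every filter reach the application.
matches = [o["name"] for o in objects if searchlet.accepts(o)]
print(matches)
```

Because each object is judged independently and no state is carried between objects, the same searchlet could be run on many storage devices at once.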

  13. Diamond Architecture • Key Interfaces • Three APIs to isolate components • Searchlet API • Used by applications to interact with Diamond • Filter API • Used by filters to interact with the storage run-time environment • Associative DMA • Isolates the host and the storage implementations • Abstracts the transport mechanism and flow control between the host and the storage run-time system

  14. Diamond Architecture • Host and Storage Systems • The host system • Where the domain application executes • The storage system • Provides a generic infrastructure for searchlet execution

  15. Diamond Applications

  16. Diamond Applications • Suitable characteristics for Diamond applications • The user is searching for specific instances of data that match a query rather than aggregate statistics about the set of matching data items • The user’s criteria for a successful match are often subjective, potentially ill-defined, and typically influenced by the partial results of the query • The mapping between the user’s needs and the matching objects is too complex to be captured by a batch operation

  17. Diamond Applications • SnapFind Description • Goal • To enable users to interactively search through large collections of unlabeled photographs • by quickly specifying searchlets that roughly correspond to semantic content • to create complex image queries by combining simple filters that scan images for patches containing particular color distributions, shapes or visual textures • Why indexing is infeasible • Different search filters at query time • High-dimensional content

  18. Diamond Applications • SnapFind Usage Experience • Example task • Retrieve photos from an unlabeled collection based on semantic content • Two cases using the same GUI • Purely manual search • Using SnapFind

  19. Prototype Implementation

  20. Prototype Implementation • Dynamic Partitioning of Computation • The Diamond storage runtime decides whether to evaluate a searchlet locally or at the host computer • Two methods for partitioning computation • CPU Splitting • Queue Back-Pressure
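
The slide names Queue Back-Pressure without describing it, so the following Python sketch is only one plausible reading, not the prototype's algorithm. It assumes that the storage side ships objects unprocessed while its outbound queue is short, and evaluates the filter locally once the queue backs up (a sign that the host or network is the bottleneck). The function name, the `high_water` threshold, and the toy drain schedule are all hypothetical.

```python
from collections import deque


def run(objects, passes, high_water, host_drain_every):
    """Toy simulation of a back-pressure style partitioning decision.

    Assumption (not from the slides): the storage side evaluates the
    filter itself only when its outbound queue holds at least
    `high_water` items; otherwise it forwards the raw object and lets
    the host evaluate it. The host removes one queued item every
    `host_drain_every` steps.
    """
    queue = deque()
    evaluated_at = {"storage": 0, "host": 0}

    for step, obj in enumerate(objects, start=1):
        if len(queue) >= high_water:
            # Queue is backed up: filter locally, forward survivors only.
            evaluated_at["storage"] += 1
            if passes(obj):
                queue.append((obj, True))
        else:
            # Queue is short: forward the object without evaluating it.
            queue.append((obj, False))

        if step % host_drain_every == 0 and queue:
            _, already_filtered = queue.popleft()
            if not already_filtered:
                evaluated_at["host"] += 1

    # The host eventually processes whatever is still queued.
    while queue:
        _, already_filtered = queue.popleft()
        if not already_filtered:
            evaluated_at["host"] += 1

    return evaluated_at


def is_even(n):
    return n % 2 == 0


fast_host = run(range(10), is_even, high_water=2, host_drain_every=1)
slow_host = run(range(10), is_even, high_water=2, host_drain_every=100)
print(fast_host, slow_host)
```

With a host that keeps up, every evaluation lands on the host; with a slow host, most of the work shifts to the storage side without any explicit configuration.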

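  21. Prototype Implementation • Filter Ordering • Average time to process an object through a series of filters $F_0 \dots F_n$, where $c(F_i)$ is the cost of evaluating filter $F_i$ and $P(F_i)$ is the probability that an object passes it • $C = c(F_0) + P(F_0)\,c(F_1) + P(F_1 \mid F_0)\,P(F_0)\,c(F_2) + P(F_2 \mid F_1, F_0)\,P(F_1 \mid F_0)\,P(F_0)\,c(F_3) + \dots$ • Partial Ordering • Partial ordering → linear extension • Ordering Policies • Independent • Hill climbing (HC) • Best filter first (BFF)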
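
The average-cost expression on this slide can be worked through numerically. The Python sketch below makes a simplifying assumption: pass rates are treated as independent, so each conditional probability reduces to the filter's own pass rate. It then finds the cheapest order by brute force over all permutations, ignoring any partial-ordering constraints. This is not the HC or BFF policy, and the filter names, costs, and pass rates are invented for illustration.

```python
from itertools import permutations


def expected_cost(order, cost, pass_rate):
    """Average per-object cost of running filters in the given order.

    Each filter's cost is weighted by the probability that an object
    survives all earlier filters (assumed independent here).
    """
    total = 0.0
    reach = 1.0  # probability that an object reaches the current filter
    for name in order:
        total += reach * cost[name]
        reach *= pass_rate[name]
    return total


def best_order(cost, pass_rate):
    # Brute force is fine for a handful of filters; it grows as n!.
    return min(permutations(cost), key=lambda o: expected_cost(o, cost, pass_rate))


# Invented example values: cost in arbitrary time units per object.
cost = {"color": 1.0, "texture": 5.0, "face": 20.0}
pass_rate = {"color": 0.5, "texture": 0.2, "face": 0.1}

order = best_order(cost, pass_rate)
print(order, expected_cost(order, cost, pass_rate))
```

Here the cheap, moderately selective filter runs first: 1 + 0.5 × 5 + 0.5 × 0.2 × 20 = 5.5 units per object, against 7.2 if the texture filter were placed first.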

  22. Experimental Evaluation

  23. Experimental Evaluation • Description of Searchlets • Test queries

  24. Experimental Evaluation • Description of Searchlets • Filters

  25. Experimental Evaluation • Disk and Host Processing Power

  26. Experimental Evaluation • Disk and Host Processing Power

  27. Experimental Evaluation • Impact of Dynamic Partitioning

  28. Experimental Evaluation • Impact of Filter Ordering

  29. Experimental Evaluation • Using Diamond on Large Datasets

  30. Related Work • On interactive data analysis • On approximate query processing

  31. Conclusion • Diamond is a system that supports interactive data analysis of large, complex data sets • To perform brute-force search efficiently, the Diamond architecture uses early discard to push filter processing to the edges of the system • The Diamond architecture enables the system to adapt to different hardware configurations by dynamically adjusting where computation is performed
