Retrospecting VM Images

Retrospection is search over the content of VM images. It supports searchable history, debugging, legal compliance, and malware tracking, while placing few constraints on the code used for search.

Presentation Transcript


  1. Retrospecting VM Images. Wolfgang Richter, Glenn Ammons*, Jan Harkes, Adam Goode, Vasanth Bala*, Nilton Bila+, Eyal de Lara+, Mahadev Satyanarayanan (* IBM Research, + University of Toronto)

  2. The Importance of Content http://www.pdl.cmu.edu/

  3. Retrospection • VM collections growing • 300% year over year, IBM Research RC2 • System, application, and user content • Searchable history • Debugging opportunities • Legal data or code origin • Malware tracking • License violations

  4. Roadmap • What is the retrospection problem? • What are the main challenges? • How can we solve them?

  5. Principle 1: The retrospection mechanism should place as few constraints as possible on the code used for search computations.

  6. Principle 2: A search computation should only be performed on demand for a specific query, and its scope should be restricted to the smallest relevant subset of VM images.

  7. Principle 3 (work in progress): Control of policy for retrospection should reside with the owners of VM images and their delegates.

  8. Find The Picture: rich content-based and application-specific queries. (Figure: 10 graphics: OK; 100 graphics: OK; 1000+ graphics: ?)

  9. OpenDiamond Platform (Principle 1: The retrospection mechanism should place as few constraints as possible on the code used for search computations.) • Distributed, interactive, unindexed search • Focuses on the principle of early discard • Enables arbitrary search queries • Arbitrary x86 binary code
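
The early-discard idea on this slide can be illustrated with a minimal sketch. It is not OpenDiamond's API: the platform runs arbitrary x86 binaries as filters, whereas here plain Python callables stand in for them, and the names `early_discard`, `is_png`, and `is_large` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption: a "filter" is any callable that inspects raw object bytes
# and returns True to keep the object. Real filters are arbitrary binaries.
Filter = Callable[[bytes], bool]

def early_discard(objects: Iterable[bytes], filters: List[Filter]) -> Iterator[bytes]:
    """Yield only the objects that pass every filter, in order."""
    for obj in objects:
        # all() short-circuits, so the first failing filter discards the
        # object and the later (possibly expensive) filters never run on it.
        if all(f(obj) for f in filters):
            yield obj

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def is_png(data: bytes) -> bool:
    # Cheap check placed first: compares only the 8-byte file signature.
    return data.startswith(PNG_MAGIC)

def is_large(data: bytes) -> bool:
    # Stand-in for a costlier content-based check.
    return len(data) >= 16

objects = [PNG_MAGIC + b"\x00" * 32, b"plain text", PNG_MAGIC]
matches = list(early_discard(objects, [is_png, is_large]))
```

Ordering cheap filters before expensive ones is the design choice that makes unindexed search tolerable: most objects are rejected before any costly computation touches them.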

  10. Available Structured Data • VMs have attributes and metadata • Owners • Files • File Systems • Files have attributes and metadata • Owners • File Type • Permissions • Modification Timestamp

  11. Scoping Solution (Principle 2: A search computation should only be performed on demand for a specific query, and its scope should be restricted to the smallest relevant subset of VM images.) • Metadata MySQL database • Scope Server • Manage access to data • Scope Cookie • X.509 signed cookie • Determines accessible data
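
As a rough illustration of scoping by metadata, the sketch below narrows a search to matching files before any content is read. Everything in it is an assumption: the slides name a MySQL metadata database but do not give its schema, so an in-memory SQLite database stands in, and the `files` table, its columns, and `define_scope` are hypothetical. Signing the result as an X.509 scope cookie is not shown.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the MySQL metadata database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (image_id TEXT, owner TEXT, path TEXT, file_type TEXT)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("vm-a", "alice", "/home/alice/cat.jpg", "image/jpeg"),
        ("vm-a", "alice", "/etc/hosts", "text/plain"),
        ("vm-b", "bob", "/home/bob/dog.jpg", "image/jpeg"),
    ],
)

def define_scope(db: sqlite3.Connection, owner: str, file_type: str):
    """Return the (image_id, path) pairs a content search is allowed to touch."""
    rows = db.execute(
        "SELECT image_id, path FROM files "
        "WHERE owner = ? AND file_type = ? ORDER BY image_id, path",
        (owner, file_type),
    )
    return rows.fetchall()

# Only Alice's JPEG files enter the scope; everything else is never read.
scope = define_scope(conn, "alice", "image/jpeg")
```

The point of the design is that the cheap, structured query runs first, so the expensive content search only sees the smallest relevant subset.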

  12. Problem: VM Sprawl in Files. Data from 78 NCSU VCL VM images based on Windows XP.

  13. Problem: VM Sprawl in Bytes. Data from 78 NCSU VCL VM images based on Windows XP.

  14. Solution: Deduplication in Files. Reduces search time. Data from 78 NCSU VCL VM images based on Windows XP.

  15. Solution: Deduplication in Bytes. Reduces storage space. Data from 78 NCSU VCL VM images based on Windows XP.

  16. IBM Research Mirage • File-level deduplication • Files are referenced by SHA-1 tag • Reads VM image partitions and file systems • On-disk deduplicated format • Centralized VM store (a potential bottleneck)
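
File-level deduplication keyed by a SHA-1 tag can be sketched as a small content-addressed store. This is an illustration of the general technique only, not Mirage's on-disk format or API; `DedupStore` and its methods are hypothetical names.

```python
import hashlib
from typing import Dict

class DedupStore:
    """Toy content-addressed store: each distinct file body is kept once."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}                # SHA-1 tag -> file content
        self.manifests: Dict[str, Dict[str, str]] = {}   # image -> {path: tag}

    def add_image(self, image_id: str, files: Dict[str, bytes]) -> None:
        manifest = {}
        for path, data in files.items():
            tag = hashlib.sha1(data).hexdigest()
            # setdefault stores the content only the first time a tag is seen.
            self.blobs.setdefault(tag, data)
            manifest[path] = tag
        self.manifests[image_id] = manifest

    def read(self, image_id: str, path: str) -> bytes:
        # Resolve the path to its tag, then the tag to the shared content.
        return self.blobs[self.manifests[image_id][path]]

store = DedupStore()
store.add_image("vm-a", {"/bin/sh": b"shell", "/etc/motd": b"hello a"})
store.add_image("vm-b", {"/bin/sh": b"shell", "/etc/motd": b"hello b"})
```

Four files across two images collapse to three stored blobs because `/bin/sh` is identical in both. A search that iterates over `store.blobs` instead of over every image also visits each distinct file once, which is how deduplication reduces search time as well as storage.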

  17. Network Bottlenecks: Megabytes. Takeaway: A centralized store is limited by network bandwidth, which limits parallelism.

  18. Network Bottlenecks: Objects. Takeaway: The number of objects pushed determines the possible number of search processes.

  19. Dataretriever • Abstract data source • getObject() interface • Search process oblivious to where data comes from • Access deduplicated data • Unmodified client and search server • Solve network bottleneck: Data partitioning • Compute on local data rather than central store • Layer of indirection enables this without modification
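
The layer of indirection described on this slide can be sketched as an abstract data source. Only the `getObject()` name comes from the slide; its Python spelling `get_object`, its signature, and the classes `DataRetriever`, `LocalPartition`, and `CentralStore` are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List

class DataRetriever(ABC):
    """Abstract data source; the search code sees only this interface."""

    @abstractmethod
    def get_object(self, object_id: str) -> bytes:
        ...

class LocalPartition(DataRetriever):
    """Hypothetical source backed by data already on the search server."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects

    def get_object(self, object_id: str) -> bytes:
        return self._objects[object_id]

class CentralStore(DataRetriever):
    """Hypothetical source standing in for a remote, centralized store."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects
        self.fetches = 0  # counts simulated network round trips

    def get_object(self, object_id: str) -> bytes:
        self.fetches += 1
        return self._objects[object_id]

def search(source: DataRetriever, object_ids: Iterable[str], needle: bytes) -> List[str]:
    # The search process is oblivious to where the bytes come from.
    return [oid for oid in object_ids if needle in source.get_object(oid)]

data = {"a": b"kernel panic", "b": b"all good", "c": b"panic again"}
central = CentralStore(data)
local = LocalPartition(data)
hits_central = search(central, ["a", "b", "c"], b"panic")
hits_local = search(local, ["a", "b", "c"], b"panic")
```

Because `search` depends only on the interface, swapping the central store for a local partition changes where the bytes come from without changing the search code, which is the property the slide attributes to the indirection layer.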

  20. Architecture (diagram). Component labels: Client, Scope Server, Mirage, Dataretriever (three instances), Server, MySQL. Flow labels: Scope Definition, Scope Cookie, Metadata Query, Query+Cookie, Request Objects, Raw Data.

  21. Revisiting Network Bottlenecks: How bad is the bottleneck with content-based queries?

  22. CPU-Bound Search Process. Takeaway: Content search is limited by computation, although it is embarrassingly parallel.

  23. Achievable Efficient Retrospection. Takeaway: Search scales with servers, and the Mirage case closely matches local.

  24. Current Research: Principle 3 • Control of policy to owners via encryption • Proof of concept: convergent encrypt /home • Encrypt files using file hash as key • Fine-grained, per file • Future direction: key escrow? • Support investigations and warrants • Support multiple encryption methods? • Per VM Image? Groups?
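
"Encrypt files using file hash as key" can be sketched as follows. This is a toy, not the authors' proof of concept and not secure: the slides do not name a cipher, so a SHAKE-256 keystream XOR stands in for a real one, and SHA-256 is an assumed choice of file hash. The function names are hypothetical.

```python
import hashlib
from typing import Tuple

def _keystream(key: bytes, length: int) -> bytes:
    # Assumption: a SHAKE-256 output stream stands in for a real cipher.
    # For illustration only; do not use this construction in production.
    return hashlib.shake_256(key).digest(length)

def convergent_encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Derive the key from the content itself, then encrypt with it."""
    key = hashlib.sha256(plaintext).digest()
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))

key_a, ct_a = convergent_encrypt(b"shared library contents")
key_b, ct_b = convergent_encrypt(b"shared library contents")
recovered = convergent_decrypt(key_a, ct_a)
```

Because the key is a function of the content, identical files encrypt to identical ciphertext, so encrypted files can still be deduplicated. Only a party who already holds the file, or its key, can decrypt it.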

  25. Recap • Retrospection: search VM image content • Main challenges • Get data efficiently • Solution: Dataretriever • Handle big and growing data • Solution: Scoping + Deduplication • Privacy and encryption

  26. Learn More • OpenDiamond: http://diamond.cs.cmu.edu • IBM Mirage: http://doi.acm.org/10.1145/1346256.1346272 • Convergent Encryption: http://doi.acm.org/10.1145/339331.339345
