1 / 27

Titan / NGC Impact

Titan / NGC Impact. August 12, 2010 Timothy M. Shead NGC Integration Lead.

biana
Télécharger la présentation

Titan / NGC Impact

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Titan / NGC Impact August 12, 2010 Timothy M. Shead NGC Integration Lead Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. This document is SAND Number: 2010-5300 P

  2. What is Titan? A flexible parallel pipeline architecture, written in C++ Query Display Ingest Extract Model Partition Display • Provides a direct mapping between whiteboard and code. • Manages execution: • Manages the “flow” of data through components. • In what order are components executed? • What needs to be recomputed when parameters change? • Serial or parallel

  3. How the NGC delivers “HPC informaticscapabilities that are both usable and useful to analysts." What is Titan? Web Browser Web Server HPC Resource Web Browser Custom Client Model generation on HPC compute nodes, analysis on a web server, visualization in client.

  4. What is Titan?

  5. What is Titan? A set of libraries you can use today … Home Page: http://titan.sandia.gov Public Git Repository: git://public.kitware.com/titan.git Reference Documentation: http://public.kitware.com/titan-docs Text Analysis Qt Clustering Boost Protovis Titan Trilinos Statistics Map Reduce VTK MTGL Machine Learning

  6. Sidebar: Using Titanwithin Sensitive Domains Text Analysis Qt Clustering Sensitive Application (e.g. La Luz) Boost Protovis Titan Depends-on Trilinos Statistics Sensitive Technique A Map Reduce VTK Sensitive Technique B Machine Learning MTGL

  7. Projects Using Titan • Network Grand Challenge (Hendrickson / Kegelmeyer) • La Luz - SOCOM IATA (Benz / Johnson) • HPC PubMed (Wilson / Shepherd / Otahal / 5900) • MegaTux Network Emulitics (Minnich) • NA-22 Satellite Image Analysis for Non-Proliferation (Watson) • Streaming Data for Cybersecurity (Plimpton) • ParaSpace Simulation Ensemble Characterization (Crossno / Martin / Hunt / Shead) • NIH SBIR: "Flexible information visualization and analysis platform for biomedical research” (Kitware) • Army SBIR: "An open framework for analyzing mobile ad hoc networks" (Kitware) • DARPA SBIR: "An open, adaptive geospatial visual information system" (Kitware) • Navy STTR: "Using stylistic topic models to detect deception through unusual linguistic activity”(Kitware) • DOE SBIR: "A scalable visual system for proliferation analysis" (Kitware) • ParaText Scalable Text Ingestion (Dunlavy / Crossno / Shead / Stanton) • CARGIO: Hard drive Analysis (Kegelmeyer / Desapio) • Nuclear Materials Attribution (Crossno) • OVIS Cluster Monitoring (Brandt) • ThreatView (Summers / Linebarger / Kerr / Wolfenbarger) • Graph-Based Informatics for Counter-Terrorism LDRD (Wylie)

  8. NGC Prototype 2 Recap

  9. P2 Pipeline Local Filesystem NGC P2 Serial All The Way: put the entire pipeline in one process. Zero administration, easy to use, online computation, small data.

  10. LaLuz – SOCOM IATA • Using an “evolved P2” to address undisclosed project domain. • An example of building sensitive applications on Titan.

  11. A modest P2 endorsement … “P2 has been used as a test case on real data which led to NEW leads and analysis of Social Networking activity … further answers were derived when new development capabilities were coordinated with 5900 for P2” - Bryan Ingram, SNL

  12. Network Grand Challenge - P3

  13. Network Grand Challenge - P3 Lovely Video Here

  14. P3 Pipeline Prediction Model Process HPC Resource Web Browser PostgreSQL Database Berkeley DB Database Web Server (Unified Thread Technology) Web Browser Suggested Terms Process HPC Resource Model generation / analysis on HPC compute nodes, visualization in client. Modest administration, modest data, mostly offline computation.

  15. P3 Pipeline Detail: Suggested Terms Input: blog contents Input: feature dictionary Input: language dictionary Feature matrix Input: multi lingual matrix Multi lingual projection Large matrix-matrix product 300x1.2M dense X 1.2Mx18k sparse (Required 16Gb to compute in serial) Input: concept weights Weighted multi lingual projection Valence table Clustered valence table Output: suggested terms

  16. HPC PubMed Pipeline • Using Netezza, LDA, and other text analysis techniques. • Benefitting from NGC multi-tier experience. Custom Client Netezza Database Filesystem? CouchDB? HPC Resource (Red Sky / Red Storm) Web Server Web Browser (Maybe) Model generation on HPC compute nodes, analysis on the web server, visualization in client.

  17. HPC PubMed Visualizations

  18. Megatux Network Emulitics Utilizing the latest in lightweight virtualization techniques, investigate methods to configure and collect data from emulationsemploying millions of VMs on HPC clusters. Millions of virtual machines HPC Resource (Red Storm) Custom socket-based protocol Titan Client Live, streaming data produced on HPC compute nodes, analysis and visualization in client.

  19. Megatux Network Emulitics Prototype Megatux client using Protovis

  20. NA-22 Satellite Image Analysis A graph-based approach to image feature analysis. DOE Office of Nonproliferation (NA-22)

  21. NA-22 Satellite Image Analysis • Graph communities identified using wCNM, converted to KML polygons • Size proportional to feature size, color map based on community ID DOE Office of Nonproliferation (NA-22)

  22. Streaming Data for Cybersecurity Developing tools and techniques for realtime packet analysis. • Titan streaming pipeline executive. • Task-based multithreading. • 15k packets / sec real-time network data. Google Protobuf Network packet-sniffer Titan Client Live, streaming data collected by packet-sniffer hardware, analysis and visualization in client.

  23. Streaming Data for Cybersecurity Prototype streaming Titan client Rendered and Protovis views.

  24. ParaSpace

  25. ParaSpace

  26. ParaSpace Pipeline CouchDB? Hadoop? ParaSpace Analysis HPC Resource Informatics Visualization: Helio view, charts, plots Web Server Simulation Output Files Web Browser ParaView Server HPC Resource ParaView Web Server Scientific Visualization: Interactive 3D volume rendering A combination of rendered and informatics views, delivered to the browser. Modest administration, huge data, mostly offline computation.

  27. Titan / NGC Impact Summary • Titan repository makes a substantial set of NGC techniques immediately available for researchers and application developers. • Multiple Sandia IC customers have committed resources to develop P2. • The depth and breadth of Titan capabilities are being used by multiple internal and external research projects.

More Related