Mirage -- Interactive Pattern Discovery with Large Imaging Databases
Tin Kam Ho
Department of Scientific Computing Research, Computing Sciences Research Center
Bell Labs, Lucent Technologies
In collaboration with David Wittman, J. Anthony Tyson of UC Davis; Samuel Carliles, William O’Mullane, Alex Szalay of JHU
http://www.cs.bell-labs.com/who/tkh/mirage
Mining Large Imaging Databases
Basic needs:
• Hierarchical data structures & indexing
• Sophisticated navigation tools
Also,
• Joint usage of data, meta-data, extracted features & catalogs
• Automatic pattern discovery algorithms to compute layers of abstraction, from observations to concepts to theory
Mirage uses visualization to help
• Track horizontal correlations across different types of attributes for the same objects
• Track vertical correlations across layers of abstraction, from signal to the result of analysis
• Integrate human and machine pattern recognition capabilities
Horizontal Correlations: Similarity of Objects from Different Perspectives
• Objects can be described by many types of attributes: position, morphology, color, spectra, temporal variability, motion …
• A meaningful similarity metric exists only for attributes of the same type
• Similar groups found from one perspective need to be correlated to those from others, e.g. are the objects similar in color also similar in shape?
[Figure labels: Shape groups, Color groups]
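The question on this slide (are objects that are similar in color also similar in shape?) can be sketched as a cross-tabulation of two independent groupings of the same objects. This is a minimal illustration only, not how Mirage itself does it: the function names `cross_tabulate` and `purity`, the group labels, and the six sample objects are all hypothetical.

```python
from collections import Counter


def cross_tabulate(groups_a, groups_b):
    """Count how many objects fall into each (group_a, group_b) pair."""
    if len(groups_a) != len(groups_b):
        raise ValueError("both labelings must cover the same objects")
    return Counter(zip(groups_a, groups_b))


def purity(groups_a, groups_b):
    """Fraction of objects that carry the most common group_b label
    within their own group_a (1.0 means the groupings agree fully)."""
    if not groups_a:
        raise ValueError("no objects to compare")
    best = {}
    for (a, _b), count in cross_tabulate(groups_a, groups_b).items():
        best[a] = max(best.get(a, 0), count)
    return sum(best.values()) / len(groups_a)


# Hypothetical group labels for six objects, one entry per object.
color_groups = ["red", "red", "red", "blue", "blue", "blue"]
shape_groups = ["elliptical", "elliptical", "spiral",
                "spiral", "spiral", "spiral"]

table = cross_tabulate(color_groups, shape_groups)
score = purity(color_groups, shape_groups)

for (color, shape), count in sorted(table.items()):
    print(f"{color:5s} {shape:11s} {count}")
print(f"purity of shape within color groups: {score:.3f}")
```

A table dominated by a few large cells suggests the two perspectives agree; counts spread evenly across cells suggest the groupings are unrelated.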
Vertical Correlations Across Layers of Analysis
[Diagram labels: Raw Images, Processed Images, Numerical Features, Classes and Groups; Validation in Input Domain, Relationship between Groups, Interpretation in Context]
Human / Machine Interaction in Pattern Discovery
[Diagram labels: Domain expertise, Hypotheses, Decisions in algorithmic choices, Interpretation in context; Visualized data geometry, Systematic exploration control, Computed features & data structures, Tentative classifications]
Mirage in action …
A simple way to start: java -jar Mirage0.3.jar
Challenges for the Analysis Tool
• A good tool should support
  • separate treatment of non-comparable groups of variables
  • versatile visualization utilities allowing many perspectives
  • exploration across data types & levels of abstraction
  • feedback between manual & automatic pattern recognition methods
• A good tool should also
  • leverage existing visualization, analysis methods
  • enable continuous growth: new visualization, analysis tools
  • support seamless interface with data archives
  • be scalable in data volume and processing speed
Towards Extensibility
[Diagram labels, arranged around Mirage Core: External Rendering Code; VO Data Archives; Custom Data Views; Data Access Clients; Cone Search, CAS; FITS Viewer, …; Message Based Updates; Extinction Calculator; Data Analysis Methods; Data Exchange Pipes; Other Analysis Platforms; Web Services]
Data Access, Custom Views: VO Enabled Mirage
(with Samuel Carliles, William O’Mullane, and Alex Szalay)
http://skyservice.pha.jhu.edu/develop/vo/mirage/
Data Analysis Functions: Extinction Web Service
(with Chris Miller, Simon Krughoff)
Using DIRBE/IRAS Dust Maps by Schlegel et al.
[Diagram: data flow between Mirage Core and the Extinction Service]
• Object selection in Mirage Core
• Extracts RA, DEC, [mag] from Mirage data set (positions, mags)
• SOAP client calls Extinction server (positions, mags, filterIDs)
• Extinction Service returns a result stream (E(b-v), dered_mags)
• Merges results with Mirage data set (enhanced data set)
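The extract / call / merge flow on this slide can be sketched as follows. This is only an illustration under stated assumptions: the remote SOAP service is replaced by a local stand-in (`fake_extinction_service`) that returns a constant placeholder E(B-V) instead of looking up the dust maps, the record layout and function names are hypothetical, and the per-filter coefficients in `R_COEFF` are illustrative values for SDSS-style filters rather than anything stated in the slides. The only physics used is the standard relation dered_mag = mag - R_filter × E(B-V).

```python
# Illustrative A_band / E(B-V) coefficients keyed by filter ID (assumed values).
R_COEFF = {"u": 5.155, "g": 3.793, "r": 2.751, "i": 2.086, "z": 1.479}


def extract_positions(dataset, selected_ids):
    """Pull RA, DEC and magnitudes for the selected objects only."""
    return [
        {"id": row["id"], "ra": row["ra"], "dec": row["dec"], "mags": row["mags"]}
        for row in dataset
        if row["id"] in selected_ids
    ]


def fake_extinction_service(requests):
    """Stand-in for the remote service (hypothetical).

    A real service would look up E(B-V) at (ra, dec) in a dust map;
    here a constant placeholder value is used for every position.
    """
    results = []
    for req in requests:
        ebv = 0.05  # placeholder, not a real dust-map lookup
        dered = {band: mag - R_COEFF[band] * ebv
                 for band, mag in req["mags"].items()}
        results.append({"id": req["id"], "ebv": ebv, "dered_mags": dered})
    return results


def merge_results(dataset, results):
    """Return a copy of the data set with the service output attached."""
    by_id = {res["id"]: res for res in results}
    enhanced = []
    for row in dataset:
        new_row = dict(row)
        extra = by_id.get(row["id"])
        if extra is not None:
            new_row["ebv"] = extra["ebv"]
            new_row["dered_mags"] = extra["dered_mags"]
        enhanced.append(new_row)
    return enhanced


# Hypothetical two-object data set; only object 1 is selected.
dataset = [
    {"id": 1, "ra": 150.10, "dec": 2.20, "mags": {"g": 18.5, "r": 18.0}},
    {"id": 2, "ra": 150.30, "dec": 2.40, "mags": {"g": 20.1, "r": 19.7}},
]
requests = extract_positions(dataset, selected_ids={1})
enhanced = merge_results(dataset, fake_extinction_service(requests))
```

Unselected objects pass through `merge_results` unchanged, so the enhanced data set keeps the same rows as the original and only gains columns where the service returned a result.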
Continuous Data Updates: SEQUIN experiment
(with Marina Thottan, Ken Swanson)
[Diagram: components around the Monitored Network]
• Network Poller: obtains statistics from each node
• Health Checker: computes health indicators
• Record Keeper: stores statistics and indicators in relational database
• Messenger: broadcasts messages about database updates
• Mirage Monitor: retrieves data, updates displays when message arrives
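The message-driven update loop described by these components can be sketched in a single process. This is a rough sketch, not the SEQUIN implementation: an in-memory SQLite table stands in for the relational database, a `queue.Queue` with one subscriber stands in for the broadcasting Messenger, the polled statistics are hard-coded, and the health indicator (fraction of packets not dropped) is a hypothetical choice. All class, function, and column names are assumptions.

```python
import queue
import sqlite3


def poll_network(nodes):
    """Network Poller: return one statistics record per node (hard-coded here)."""
    return [{"node": name, "packets_in": pin, "packets_dropped": dropped}
            for name, pin, dropped in nodes]


def health_indicator(stat):
    """Health Checker: hypothetical indicator, fraction of packets not dropped."""
    if stat["packets_in"] == 0:
        return 1.0
    return 1.0 - stat["packets_dropped"] / stat["packets_in"]


class RecordKeeper:
    """Stores statistics and indicators, then announces the update."""

    def __init__(self, messenger):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE health (node TEXT, packets_in INTEGER, "
            "packets_dropped INTEGER, indicator REAL)"
        )
        self.messenger = messenger

    def store(self, stats):
        rows = [(s["node"], s["packets_in"], s["packets_dropped"],
                 health_indicator(s)) for s in stats]
        self.db.executemany("INSERT INTO health VALUES (?, ?, ?, ?)", rows)
        self.db.commit()
        self.messenger.put("health table updated")


class Monitor:
    """Refreshes its view of the data only when an update message arrives."""

    def __init__(self, db, messenger):
        self.db = db
        self.messenger = messenger
        self.display = {}

    def handle_pending(self):
        """Process all waiting messages; return how many refreshes happened."""
        refreshed = 0
        while True:
            try:
                self.messenger.get_nowait()
            except queue.Empty:
                break
            query = "SELECT node, indicator FROM health"
            for node, indicator in self.db.execute(query):
                self.display[node] = indicator
            refreshed += 1
        return refreshed


messenger = queue.Queue()
keeper = RecordKeeper(messenger)
monitor = Monitor(keeper.db, messenger)

keeper.store(poll_network([("router-a", 1000, 10), ("router-b", 500, 100)]))
first_pass = monitor.handle_pending()   # one message waiting, so one refresh
second_pass = monitor.handle_pending()  # nothing new, so no refresh
```

Tying refreshes to messages rather than to a timer means the display does no work while the database is idle, which is the point of the Messenger component in the diagram.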
Open Questions
• What questions do scientists want to ask about their data?
• How can they be translated into graphical operations and answers?
• Which automatic algorithms are reliable for the tasks?
• Which visualization techniques can help where it matters?
• How can we handle large data volumes, variable demands on speed, dispersed archives, and bandwidth constraints?
• What are the best ways to support continuous and collaborative explorations?
• …
Mirage can be downloaded at http://www.cs.bell-labs.com/who/tkh/mirage
• Publicly released on the web since late 2002
• Development ongoing …
• Open source soon to be available