
Grid-based Data Stream Processing in e-Science

Richard Kuntschke (1), Tobias Scholl (1), Sebastian Huber (1), Alfons Kemper (1), Angelika Reiser (1), Hans-Martin Adorf (2), Gerard Lemson (3), and Wolfgang Voges (3)
(1) Lehrstuhl Informatik III: Datenbanksysteme, Fakultät für Informatik, Technische Universität München
(2) Max-Planck-Institut für Astrophysik
(3) Max-Planck-Institut für extraterrestrische Physik


Presentation Transcript


  1. Grid-based Data Stream Processing in e-Science
     Richard Kuntschke (1), Tobias Scholl (1), Sebastian Huber (1), Alfons Kemper (1), Angelika Reiser (1), Hans-Martin Adorf (2), Gerard Lemson (3), and Wolfgang Voges (3)
     (1) Lehrstuhl Informatik III: Datenbanksysteme, Fakultät für Informatik, Technische Universität München
     (2) Max-Planck-Institut für Astrophysik
     (3) Max-Planck-Institut für extraterrestrische Physik

  2. Important Challenges in e-Science
     • In general:
       • Large and exponentially growing amounts of data
       • Distributed data archives
       • No unique identifiers
       • Uncertainty
     • In astrophysics: Spectral Energy Distributions (SEDs)
       • Used to classify celestial objects (active galactic nuclei, brown dwarfs, neutron stars, ...)
       • Generation requires spatial (astrometric) matching

  3. Spatial (Astrometric) Matching
     Current solutions ...
     • ... load all data into main memory
       • Uses a lot of memory
       • Infeasible if memory size is insufficient
     • ... process all data at once and deliver the complete result at the end
       • Inefficient
       • No results until all processing has completed
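To make the contrast with the batch approach concrete, here is a minimal Python sketch of a pipelined matcher that emits each match as soon as it is found. It is an illustration only, not the StarGlobe implementation: the tuple layout `(id, ra, dec)`, the function names, and the matching radius are all assumptions, and for simplicity the small reference catalog is still held in memory while only the second catalog is streamed.

```python
import math
from typing import Iterable, Iterator, List, Tuple

# Assumed record layout: (object id, right ascension in degrees, declination in degrees)
SkyObject = Tuple[str, float, float]


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, computed with the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def match_stream(stream: Iterable[SkyObject],
                 reference: List[SkyObject],
                 radius_deg: float) -> Iterator[Tuple[str, str]]:
    """Yield (stream id, reference id) pairs one at a time.

    Because this is a generator, the first match is available before the
    rest of the stream has been read, and the stream is never materialized.
    """
    for obj_id, ra, dec in stream:
        for ref_id, ref_ra, ref_dec in reference:
            if angular_separation_deg(ra, dec, ref_ra, ref_dec) <= radius_deg:
                yield (obj_id, ref_id)
```

A consumer can call `next()` on the returned iterator to obtain the first match immediately, which is the "first results early" behaviour that the batch solutions listed above lack.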

  4. Our Contributions
     • StarGlobe
       • Grid-based P2P Data Stream Management System implemented on top of Globus
       • In-network processing
       • Early filtering
       • Parallelization
       • Pipelining
       • Load-balancing
       • Mobile user-defined operators
     • Astrophysical Example Workflow
       • Astrometric matching
       • Performance evaluation

  5. The StarGlobe Architecture
     [Figure labels: Publish Stream, Subscribe Query, Fct-Provider, transform, filter, Load mobile operators; steps numbered 1 and 2]

  6. Traditional Approach: Bring Data to Code
     [Figure labels: Data-Prov. A, Data-Prov. B, Data-Prov. C, Data-Prov. D, union, NN_10]

  7. New Approach: Bring Code to Data
     [Figure labels: Fct-Provider, Data-Prov. A, Data-Prov. B, Data-Prov. C, Data-Prov. D, scan (at each data provider), NN_10 (at each data provider), union, NN_10]
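The difference between the two approaches can be sketched in a few lines of Python. This is a hedged illustration, not code from StarGlobe: it assumes that NN_10 denotes a 10-nearest-neighbor operator, uses plain 2-D Euclidean distance instead of a sky metric, and all function names are hypothetical. The point is only that running the operator at each provider before the union returns the same answer while shipping far fewer tuples.

```python
import heapq
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def nn_k(points: List[Point], query: Point, k: int) -> List[Point]:
    """Return the k points closest to query (assumed meaning of NN_k)."""
    return heapq.nsmallest(
        k, points,
        key=lambda p: (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2)


def bring_data_to_code(providers: Dict[str, List[Point]], query: Point, k: int):
    """Ship every tuple to one site, union there, then run NN_k once."""
    shipped = [p for pts in providers.values() for p in pts]
    return nn_k(shipped, query, k), len(shipped)


def bring_code_to_data(providers: Dict[str, List[Point]], query: Point, k: int):
    """Run NN_k at each provider, ship only local winners, union, run NN_k again."""
    shipped = [p for pts in providers.values() for p in nn_k(pts, query, k)]
    return nn_k(shipped, query, k), len(shipped)
```

The second variant is correct because any point among the global k nearest must also be among the k nearest at its own provider, so no candidate is lost by filtering locally first.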

  8. Mobile User-Defined Operators
     • Load user-defined operators from function provider servers in the network
     • Common interface for integrating external operators
     • Push-based iterator
     • Flexibility

  9. StreamIterator Interface
     • open(Config, StreamWriter)
       • Configuration parameters
       • Writer for result stream
     • next(StreamIteratorEvent)
       • Next element in input stream
       • Writing output to result stream using StreamWriter.write()
     • close()
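The push-based interface above might look as follows when transcribed into Python. Only the names `StreamIterator`, `StreamWriter`, `StreamIteratorEvent`, `open`, `next`, `close`, and `write` come from the slide. Everything else is an assumption made for illustration: the configuration is modelled as a plain dictionary, the event carries a single hypothetical `element` field, `ThresholdFilter` is an invented example operator, and `run_operator` is a stand-in for whatever the real StreamProcessor does.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class StreamIteratorEvent:
    element: Any  # hypothetical field: the next element of the input stream


class StreamWriter:
    """Collects the result stream; a real writer would forward it downstream."""

    def __init__(self) -> None:
        self.results: List[Any] = []

    def write(self, element: Any) -> None:
        self.results.append(element)


class StreamIterator(ABC):
    """Push-based operator: the caller pushes elements in via next()."""

    @abstractmethod
    def open(self, config: Dict[str, Any], writer: StreamWriter) -> None: ...

    @abstractmethod
    def next(self, event: StreamIteratorEvent) -> None: ...

    @abstractmethod
    def close(self) -> None: ...


class ThresholdFilter(StreamIterator):
    """Invented example operator: passes on elements >= config['threshold']."""

    def open(self, config: Dict[str, Any], writer: StreamWriter) -> None:
        self.threshold = config["threshold"]
        self.writer = writer

    def next(self, event: StreamIteratorEvent) -> None:
        if event.element >= self.threshold:
            self.writer.write(event.element)

    def close(self) -> None:
        self.writer = None


def run_operator(operator: StreamIterator, config: Dict[str, Any],
                 elements: Iterable[Any]) -> List[Any]:
    """Hypothetical driver: open once, push each element, then close."""
    writer = StreamWriter()
    operator.open(config, writer)
    for element in elements:
        operator.next(StreamIteratorEvent(element))
    operator.close()
    return writer.results
```

Because the operator only sees one element per `next()` call and writes results as it goes, it needs no knowledge of where the stream comes from, which is what allows such operators to be loaded and run at any node.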

  10. Communication between StreamProcessor and StreamIterator

  11. Astrophysical Example Workflow

  12. Distributed Query Evaluation Plan

  13. Distributed Query Evaluation Plan

  14. Distributed Query Evaluation Plan

  15. Evaluation of Early Filtering

  16. Conclusion
     • Synergies between research in computer science and other scientific disciplines, e.g., astrophysics
     • StarGlobe
       • Handling large data volumes efficiently
         • Early filtering, parallelization, pipelining
       • Returning first results early on
         • Pipelining
       • Flexible support of domain-specific application logic
         • Mobile user-defined operators
     • Results also applicable to other domains
