Distributed Tera-Mining

Distributed Tera-Mining. R. L. Grossman, Laboratory for Advanced Computing, University of Illinois & Magnify, Inc. Trend 1: Explosion of data, all in the wrong format, with no one to analyze it. The Data Gap: most data comes a GB and a TB at a time.


Presentation Transcript


  1. Distributed Tera-Mining. R. L. Grossman, Laboratory for Advanced Computing, University of Illinois & Magnify, Inc.

  2. Trend 1. Explosion of Data …

  3. … All in the Wrong Format. With no one to analyze it.

  4. The Data Gap. Most data comes a GB and a TB at a time. [Chart: total new disk (TB) since 1995 vs. new Ph.D.s]

  5. Trend 2. SONET is dead. Lambda rules. Gigabytes can be moved in seconds.
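As a rough check on "gigabytes can be moved in seconds", the sketch below estimates how long one gigabyte takes at the OC levels named on slide 17. It assumes nominal SONET line rates (OC-n = n × 51.84 Mbit/s), a decimal gigabyte (10^9 bytes), and no framing or protocol overhead, so real transfers would be somewhat slower. The function names are illustrative and do not come from the slides.

```python
# Back-of-the-envelope transfer times at nominal SONET line rates.
# Assumptions (not from the slides): OC-n = n * 51.84 Mbit/s,
# 1 GB = 1e9 bytes, and no framing or protocol overhead.

OC_BASE_MBPS = 51.84  # nominal OC-1 line rate in Mbit/s


def line_rate_bps(oc_level: int) -> float:
    """Nominal line rate of an OC-n link in bits per second."""
    return oc_level * OC_BASE_MBPS * 1e6


def transfer_seconds(num_bytes: float, oc_level: int) -> float:
    """Idealized time to move num_bytes over an OC-n link."""
    return num_bytes * 8 / line_rate_bps(oc_level)


if __name__ == "__main__":
    one_gb = 1e9
    for level in (3, 12, 48):
        print(f"OC-{level}: {transfer_seconds(one_gb, level):.1f} s per GB")
```

Under these assumptions a gigabyte takes roughly 51 s on OC-3, 13 s on OC-12, and 3 s on OC-48.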

  6. Trend 3: Most Data is Distributed • Bush’s Law: The usefulness of a column of data varies as the square of the number of columns it is compared to.

  7. Example 1: ENSO & Cholera. El Niño data at NCAR; cholera data at WHO.

  8. Example 2: Voting. [Figure: Table 1 and Table 2]

  9. Correlation: Reform Voters vs. Votes for Buchanan. [Plot label: Palm Beach]

  10. DataSpace – One Approach to Making Data Useful. Complementary to the grid, which we view as a distributed computer. Today's Multi-media Web: • html • http • search by keyword • workstations, servers • 16 terabytes of documents • 4 billion documents. Tomorrow's Data Web: • pmml & dtml • dstp • correlate & mine • data & compute clusters • petabytes of data • tens of billions to trillions of records

  11. [Diagram] DSTP Server 1: k[i], x[i]. DSTP Server 2: k[i], y[j]. UCK [uckid]; attributes [aid].
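Slide 11 shows a key column k paired with x on one server and with y on another. A minimal sketch of the "correlate" step this suggests, assuming the UCK acts as a shared join key: match the two columns on keys present in both, then compute a Pearson correlation. This is not DSTP itself, only a local illustration. The function names and toy values are hypothetical and are not data from the talk.

```python
from math import sqrt


def join_on_key(left: dict, right: dict) -> list:
    """Inner join: pair values whose key appears in both columns."""
    shared = sorted(left.keys() & right.keys())
    return [(left[k], right[k]) for k in shared]


def pearson(pairs: list) -> float:
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two matched rows")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy columns standing in for (k, x) and (k, y).
    server1 = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
    server2 = {"B": 4.0, "C": 6.0, "D": 8.0, "E": 1.0}
    matched = join_on_key(server1, server2)
    print(len(matched), "matched rows, r =", pearson(matched))
```

Only keys held by both servers contribute, which is why an agreed key matters more than where each column is stored.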

  12. Terra Mining Testbed. Optical testbed for distributed tera-mining of scientific data. Goal: also to be a testbed for broadband-based business services.

  13. Lessons Learned • It's the data, stupid. Cycles, cylinders & lambdas are all commodities. • The fundamental challenge: lower the cost of making data useful. • The emergence of internet infrastructure for data is inevitable. It opens up possibilities for new types of scientific discoveries.

  14. For More Information • DataSpace: http://www.dataspaceweb.net, http://www.ncdm.uic.edu • DataSpace Standards: http://www.dmg.org • Selected articles: http://www.twocultures.net • Magnify: http://www.magnify.com

  15. End of Slides

  16. FTP Still Lives

  17. Trend 2. Bandwidth is a Commodity. [Chart: OC-3, OC-12, OC-48]

  18. El Niño Anomalies

  19. Indonesia Cholera Cases

  20. Cholera Cases

  21. Distributed Exabytes (New Disks). [Chart: new disk capacity in petabytes, with a 1 exabyte marker.] Source: IDC (1999), "1999 Winchester Disk Drive Market Forecast and Review".

  22. Trend 3: Most Data is Distributed • W’s Law: The usefulness of a column of data varies as the square of the number of columns it is compared to.

  23. Example 2: Voting

  24. Database 1: Total Votes for Buchanan by County

  25. Database 2: Total Registered Reform Voters by County
