
Deborah McGuinness and Joanne Luciano with Peter Fox and Li Ding CSCI-6962-01

Foundations VI: Discovery, Access and Semantic Integration. Data Mining and Knowledge Discovery - Continued. Week 13, November 29, 2010.


Presentation Transcript


  1. Foundations VI: Discovery, Access and Semantic Integration. Data Mining and Knowledge Discovery - Continued. Deborah McGuinness and Joanne Luciano with Peter Fox and Li Ding. CSCI-6962-01. Week 13, November 29, 2010

  2. Extra

  3. Knowledge Discovery • Has a broad meaning • Finding ontologies • Creating new knowledge from • Previous knowledge • New sources (data, information) • Modeling • We’ll look at a mining approach as an example

  4. Mining • We will start with data, but the ideas apply to information and knowledge bases as well • Definition • History • Our interest

  5. SAM: Smart Assistant for Earth Science Data Mining • PI: Rahul Ramachandran • Co-I: Peter Fox, Chris Lynnes, Robert Wolf, U.S. Nair

  6. Science Motivation • Study the impact of natural iron fertilization processes, such as dust storms, on plankton growth and subsequent DMS production • Plankton plays an important role in the carbon cycle • Plankton growth is strongly influenced by nutrient availability (Fe/Ph) • Dust deposition is an important source of Fe over the ocean • Satellite data is an effective tool for monitoring the effects of dust fertilization • Analysis entails: • Mining MODIS L1B data for dust storm events and identifying the swath of area influenced by the passage of the dust storms • Examining correlations between fertilization, plankton growth and DMS production

  7. Current Analysis Process • MODIS aerosol products don’t provide speciation • Locate and download all the data to the local machine • Write code to classify and detect dust accurately [3-4 month effort] • Write code to classify and detect other dust aerosols [3-4 month effort] • Write code to segment the detected region in order to account for advection effects and the correlation coefficient [2-month effort]
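The segmentation step in the slide above is not specified further. As a minimal sketch, assuming that "segmenting the detected region" means grouping adjacent dust-flagged pixels into separate regions, 4-connected component labeling over a binary mask could look like the following. The function name `label_regions` and the tiny mask are hypothetical and not taken from SAM or ADaM.

```python
from collections import deque


def label_regions(mask):
    """Label 4-connected regions of truthy cells in a 2D list.

    Returns (labels, count): labels has the same shape as mask, with 0 for
    background and 1..count for each region, numbered in scan order.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c] != 0:
                continue
            # Found an unlabeled dust pixel: flood-fill its region (BFS).
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and mask[nr][nc] and labels[nr][nc] == 0:
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count


# Made-up 3x4 "dust detected" mask, for illustration only.
mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
labels, count = label_regions(mask)
```

A real analysis would work on classified satellite swaths rather than a hand-written grid, and would likely use an image processing library instead of a pure-Python loop.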

  8. Analysis with SAM • Create a workflow to perform classification using many different state-of-the-art classifiers on distributed data • Create a workflow to segment detected regions using image processing services on distributed data • Bottom line: • Scientist does not have to write all the code to perform the analysis • Can compose workflows that utilize distributed data/services • Can share the workflow with others to collaborate, reuse and modify

  9. Conducting Science using Internet as the Primary Computer

  10. Mash-ups Example: Yahoo Pipes

  11. Data Mining in the ‘new’ Distributed Data/Services Paradigm

  12. Too many choices!! • And that’s only part of the toolkit • ADaM-IVICS toolkit has over 100 algorithms

  13. SAM Objectives • Improve usability of Earth Science data by existing data mining services for research, by incorporating semantics into the workflow composition process • Semantic search capable of mapping a conceptual task • Assistance in mining workflow composition • Verification that services are connected in a semantically correct fashion
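The last objective above, verifying that services are connected in a semantically correct fashion, can be illustrated with a small sketch. It assumes a toy type hierarchy standing in for an ontology and checks that each step's output type is the same as, or a subtype of, what the next step requires. All type names, service names and the helpers `is_a` and `verify_workflow` are hypothetical; the slides do not describe SAM's actual ontology or verification logic.

```python
# Hypothetical mini "ontology": child type -> parent type.
SUBCLASS_OF = {
    "MODIS_L1B_Granule": "SatelliteImage",
    "SatelliteImage": "RasterData",
    "DustMask": "BinaryImage",
    "BinaryImage": "RasterData",
}


def is_a(data_type, required):
    """Return True if data_type equals required or is a descendant of it."""
    while data_type is not None:
        if data_type == required:
            return True
        data_type = SUBCLASS_OF.get(data_type)
    return False


def verify_workflow(steps):
    """Check each adjacent pair of steps in a linear workflow.

    steps is a list of (name, input_type, output_type) tuples.
    Returns a list of human-readable error strings (empty if valid).
    """
    errors = []
    for (up_name, _, up_out), (down_name, down_in, _) in zip(steps, steps[1:]):
        if not is_a(up_out, down_in):
            errors.append(
                f"{up_name} -> {down_name}: produces {up_out}, needs {down_in}"
            )
    return errors


good = [
    ("dust_classifier", "SatelliteImage", "DustMask"),
    ("region_segmenter", "BinaryImage", "BinaryImage"),
]
bad = list(reversed(good))  # segmenter output fed into the classifier

good_errors = verify_workflow(good)
bad_errors = verify_workflow(bad)
```

Here the `good` chain passes because a `DustMask` is a kind of `BinaryImage`, while the reversed chain is reported as an error. Subtype checking is only one possible reading of "semantically correct"; a fuller check might also compare units, resolution or physical parameters.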

  14. Ontology Use

  15. Semi-automated Workflow Composition • Filtering services based on data format

  16. Semi-automated Workflow Composition • Filtering service options based on both data format and task selected
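The two filtering steps in slides 15 and 16 can be sketched as a simple lookup over a service catalog: first narrow by the format of the data in hand, then narrow further by the selected task. The `Service` record, the catalog entries and the format and task names below are all hypothetical, chosen only to illustrate the idea; they are not SAM's actual service descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical description of a mining service."""
    name: str
    task: str
    input_format: str
    output_format: str


# Made-up catalog, for illustration only.
CATALOG = [
    Service("kmeans_cluster", "clustering", "ARFF", "ARFF"),
    Service("bayes_classify", "classification", "ARFF", "ARFF"),
    Service("hdf_to_arff", "conversion", "HDF", "ARFF"),
    Service("region_segment", "segmentation", "binary-image", "binary-image"),
]


def filter_services(catalog, data_format, task=None):
    """Keep services that accept data_format and, if given, perform task."""
    matches = [s for s in catalog if s.input_format == data_format]
    if task is not None:
        matches = [s for s in matches if s.task == task]
    return matches


# Step 1 (slide 15): filter on data format only.
by_format = [s.name for s in filter_services(CATALOG, "ARFF")]
# Step 2 (slide 16): filter on data format and selected task.
by_format_and_task = [
    s.name for s in filter_services(CATALOG, "ARFF", task="classification")
]
```

With this catalog, the format-only filter keeps both ARFF services and the combined filter keeps only the classifier. Exact string matching is the simplest possible rule; an ontology-backed system could instead match on related or more general concepts.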

  17. Semi-automated Workflow Composition • Final Workflow

  18. Science Motivation • Study the impact of natural iron fertilization processes, such as dust storms, on plankton growth and subsequent DMS production • Plankton plays an important role in the carbon cycle • Plankton growth is strongly influenced by nutrient availability (Fe/Ph) • Dust deposition is an important source of Fe over the ocean • Satellite data is an effective tool for monitoring the effects of dust fertilization

  19. Hypothesis • In remote ocean locations there is a positive correlation between the area-averaged atmospheric aerosol loading and oceanic chlorophyll concentration • There is a time lag between oceanic dust deposition and the photosynthetic activity
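One way to probe both parts of this hypothesis is to compute the Pearson correlation between an aerosol time series and a chlorophyll time series at several lags. The sketch below assumes two equally sampled, area-averaged series and uses made-up numbers; the function names and the data are hypothetical and are not the method or results of the study described in these slides.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(aot, chl, max_lag):
    """Correlate aot[t] with chl[t + lag] for lag = 0..max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        x = aot[: len(aot) - lag]  # aerosol values that have a later partner
        y = chl[lag:]              # chlorophyll values `lag` steps later
        results[lag] = pearson(x, y)
    return results


def best_lag(correlations):
    """Lag with the largest absolute correlation."""
    return max(correlations, key=lambda lag: abs(correlations[lag]))


# Synthetic series: chlorophyll simply echoes aerosol two steps later.
aot = [0.1, 0.5, 0.2, 0.9, 0.3, 0.7, 0.2, 0.6, 0.1, 0.4]
chl = [0.0, 0.0] + aot[:-2]
corr = lagged_correlation(aot, chl, max_lag=4)
lag = best_lag(corr)
```

Because the synthetic chlorophyll series is a shifted copy of the aerosol series, the strongest correlation appears at a lag of two steps. Real satellite series would also need gap handling (clouds, missing retrievals) and a significance test before any lag is interpreted.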

  20. Primary source of ocean nutrients [Figure labels: ocean upwelling; wind-blown dust (Sahara); sediments from river]

  21. Factors modulating dust-ocean photosynthetic effect [Figure labels: clouds; SST; chlorophyll; dust; nutrients; Sahara]

  22. Objectives • Use satellite data to determine if atmospheric dust loading and phytoplankton photosynthetic activity are correlated • Determine the physical processes responsible for the observed relationship

  23. Preliminary Results

  24. Data and Method • Data sets obtained from SeaWiFS and MODIS during 2000-2006 are employed • MODIS-derived AOT

  25. The areas of study • 1: Tropical North Atlantic Ocean • 2: West coast of Central Africa • 3: Patagonia • 4: South Atlantic Ocean • 5: South Coast of Australia • 6: Middle East • 7: Coast of China • 8: Arctic Ocean • Figure: annual SeaWiFS chlorophyll image for 2001

  26. Tropical North Atlantic Ocean → dust from Sahara Desert [Figure values and labels as extracted: -0.17504 -0.0902 -0.328 -0.4595 -0.14019 -0.7253 -0.1095; Chlorophyll; AOT; -0.68497 -0.15874 -0.85611 -0.4467 -0.75102 -0.66448 -0.72603]

  27. Arabian Sea → dust from Middle East [Figure values and labels as extracted: 0.59895 0.66618 0.37991 0.45171 0.52250 0.36517 0.5618; Chlorophyll; AOT; 0.65211 0.76650 0.69797 0.4412 0.75071 0.708625 0.8495]

  28. Summary and future work • Dust impacts ocean photosynthetic activity: positive correlations in some areas, NEGATIVE correlations in other areas, especially in the Saharan basin • Hypothesis for explaining observations of negative correlation: in areas that are not nutrient limited, dust reduces photosynthetic activity • But also need to consider the effect of clouds and ocean currents • Also need to isolate the effects of dust: the MODIS AOT product includes contributions from dust, DMS, biomass burning, etc.

  29. Case for SAM • MODIS aerosol products don’t provide speciation • Why is performing this data analysis hard? • Need to classify and detect dust accurately • Need to classify and detect other aerosols (e.g., DMS) accurately • Need to segment the detected region in order to account for advection effects and the correlation coefficient • What will SAM provide? • Capability to create a workflow to perform classification • Capability to create a workflow to segment detected regions • Bottom line: • Scientist does not have to write all the code to perform the analysis • Can compose workflows that utilize distributed data/services • Can share the workflow with others to collaborate, reuse and modify
