
What We Learned so Far

Each participant mentions two or three things they learned so far from the discussions at the meeting on June 14, 2011.




Presentation Transcript


  1. What We Learned so Far
     Each participant mentions two or three things they learned so far from the discussions at the meeting on June 14, 2011

  2. Yolanda Gil
     • Not just workflows: metadata and provenance are important
     • Long tail is also about scientists sharing their data
     • LOTS of data are collected through personal measurements
     Tom Harmon
     • Workflows exist in ocean/lake communities
     • GLEON: common understanding of process, clear how to contribute, debug the process, guide development
     • Burt: explicit end point that you want to reach from the data, takes them to the end points more efficiently
     • Possibilities for automation
     • Remote sensing data is closer to in-situ sensing community

  3. Pedro Szekely
     • Finding data is difficult
     • Data repositories are mostly designed for people (i.e., to click around), not for machines (download individual CSV files 1000 times)
     Paul Hanson
     • Differences in communities: ocean is more organized and has more resources, ecology is more one-person shops. It may be hard to have the same way to do business; ecological orgs don’t have the same kinds of resources, so they may need different workflow tools. Differences between large-resource data providers (e.g., NASA) and individuals with data: they may need different tools
     • Individuals harvesting info vs individuals contributing data/info

  4. Sandra Villamizar
     • My problems are everyone’s problems: how to create a community to learn from others
     • Importance of metadata
     Qinghua Guo
     • Importance of metadata standard efforts, organizing communities, tools that facilitate standardizing data: minimum requirements vs ideal comprehensive requirements
     • Documenting process in metadata (provenance), document uncertainty
     • There are important grand questions (e.g., algae bloom, carbon cycle) that go across communities
     • Can we envision workflows that can help answer these questions?
     • Define end point, clarify what data is needed, describe uncertainty

  5. Mike McCann
     • Methods are just as important as the data
     • Data are hard to find and use (not a surprise), but methods are also hard to find and use
     • Even if the data is described, you tend to call the data providers to understand how to use it (metadata is incomplete; catalogs are complex to use, a barrier to entry; tools require skills)
     • How to blend MATLAB, Excel, R with workflow systems that represent the overall process
     • “If there was workflow tracking in R, I’d learn R”

  6. Ryan Utz
     • Consistent resonance of issues across communities, not just my problems
     • General solutions/standards may be hard, maybe only practical when communities do want to work together
     • Computer scientists are not impressed; they do have things to offer when scientists bring up issues
     • But many scientists do not even see the problems
     • Need better ways to set up collaborations, funding, resources

  7. Andreas Hofmann
     • How big the problem of finding and cleaning data is
     • Difficulties of individual scientists in sharing data with the community: how to do that, what are the incentives, could tools help provide those incentives?
     • Models are as important as data, need to be annotated, be in a repository. A very important notion because 1) they can be used in workflows, 2) they can be reused by others
     • Open source is not a given, benefits are not always well recognized
     • Maybe because interfaces are not as polished (easy, nice interfaces)

  8. Stephanie Granger
     • Challenges are not necessarily new (standards for metadata, facilitating data sharing, bridging science and decision making, etc.)
     • Many standards out there, but many are discipline specific. This is another challenge: are these the basis for the more common standards?
     • Flexibility of getting data based on the questions being asked
     • Plug and play modules that produce the kind of data you want
     • Data discovery is an issue (not new); surprise that data cleaning and QC take so much effort
     Tim Stough
     • Excel may be a pain but is a lowest common denominator; interfaces to directories/data plus metadata to import/export data
     • Collaboration with computer scientists should focus on higher level problems (e.g., workflows), but a lot of scaffolding needs to be in place
     • Hypothesize then look for data, versus look at data and come up with hypotheses: any metaphor leads to a bias, and this bias should be disclosed so others can understand your data

  9. Burt Jones
     • Getting CS people involved is right, but “take them to sea”. Be mindful of the overhead of new tools
     • Metadata: painful but may be solvable if we get the right people together (e.g., photo metadata that goes with the data). Define a process stream that handles that; there are a lot of common tools in oceanography that could be mindful of metadata (DMAC group, funding always an issue)
     • But all systems must fit in a platform
     • Metadata with files, e.g., include it in HDF5 files

  10. Amy Braverman
     • Issues similar to climate and atmospheric data, only here there is more in-situ sensor data
     • More hand-collected data, so more cleaning is needed
     • Not so much focus on massive data
     • Disconnect between CS and scientists needs to be addressed by workshops like this
     • Important to find what needs a new solution (CS research), and to identify applications of existing CS tools. Mind the reward structure
     • Collaborations take time to ramp up, PhD students spend more time than otherwise on their dissertations, need better reward structures and funding mechanisms

  11. Craig Knoblock
     • Water is a big community, very diverse discipline, across departments
     • Layering web map servers (OGC standard) with Google Earth to visualize info
     • Difficulties in finding data; the fact that data are very seldom reused
     Matt Becker
     • Worry about the data that is collected but will never see the light, never be shared (vs MBARI or JPL, they will be ok). A lot of data is collected this way and it is precious. No one seems to have worried about that: ecosystem data needs to be preserved
     • People who have limited resources: hire a CS student and chain them to their desk
     • Tools that can help clean and publish data easily are very important

  12. Ewa Deelman
     • 2-year publication cycle in environmental science seems long, but actually the SCEC collaboration is similar: scientists need time to look at data and it is about 2 years between big runs
     • Slows down the CS momentum
     • Interest of the community to speed up analysis of data
