SCEC Ontology Development

Presentation Transcript

  1. SCEC Ontology Development • Tom Russ, Hans Chalupsky, Stefan Decker, Yolanda Gil, Jihie Kim, Varun Ratnakar • University of Southern California Information Sciences Institute

  2. Outline • Background • SCEC Goals • Ontology Basics • Semantic Interoperability • Examples • Weather • Seismology • Building Computational Pathways • Ontology Development • SCEC Ontology Development • Gene Ontology Development • Fundamental Ontologies? • Big Questions

  3. Goals: SCEC/IT Project

  4. What is an Ontology? • An Ontology is a framework for representing shared conceptualizations of knowledge • An Ontology provides: • Definitions for objects and relations in the domain • Shared vocabulary and common structure for modeling domain knowledge • Domain model/theory that captures common knowledge about the domain

  5. Semantic Interoperability Story • SCEC Java code for Community Velocity Model • Inputs: longitude and latitude • Output: Vs30 (m/s) • Connection technology: Java serialization • In other words: Ship the bits for two double precision floating point values through a network connection • Make sure you send longitude first! • Non-standard convention for geography • Probably based on X-Y convention instead • Better: More structured input • Latitude=34.15 Longitude=-117.58 • Explicit identification of parameters
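A minimal Python sketch of the contrast on this slide, for illustration only (the actual SCEC interface is Java and is not shown here). `GeoPoint`, `positional_request`, and `named_request` are hypothetical names. The positional version accepts swapped arguments silently, while the named version identifies each parameter and range-checks it, which catches many (not all) swaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPoint:
    """A location with explicitly named coordinates, in decimal degrees."""
    latitude: float
    longitude: float

    def __post_init__(self) -> None:
        # Range checks catch many (not all) swapped latitude/longitude pairs.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


def positional_request(first: float, second: float) -> tuple[float, float]:
    # "Ship the bits": nothing in the message says which value is longitude.
    return (first, second)


def named_request(point: GeoPoint) -> dict[str, float]:
    # Explicit identification of parameters, as in "Latitude=34.15 Longitude=-117.58".
    return {"Latitude": point.latitude, "Longitude": point.longitude}


# Both argument orders are accepted silently by the positional version.
print(positional_request(-117.58, 34.15), positional_request(34.15, -117.58))
print(named_request(GeoPoint(latitude=34.15, longitude=-117.58)))
try:
    GeoPoint(latitude=-117.58, longitude=34.15)  # swapped values
except ValueError as err:
    print("rejected:", err)
```

A swap such as latitude 34 and longitude -80 would still pass the range checks, which is why naming the parameters matters more than validating them.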

  6. Ontologizing a Domain such as “Weather”

  7. Conditions for Joint Tasks (from: CJCSM 3500.04A 9/13/96, p. 3-11.) Identify Relevant Domain Concepts

  8. Weather Specification in English (from: CJCSM 3500.04A 9/13/96, p. 3-11.) • C 1.3.1.3 Weather • Definition: current weather (next 24 hours). • Descriptors: clear, partly cloudy, overcast, precipitating, stormy • C 1.3.1.3.1 Air Temperature • Definition: atmospheric temperature at ground level • Descriptors: Hot (> 85° F) Temperate (40° to 85° F) Cold (10° to 39° F) Very Cold (< 10° F)
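The Air Temperature descriptors can be read as a simple classification function. This is a sketch under one assumption: the published bands leave a gap between 39° F and 40° F, so here Cold is taken to cover [10, 40) and Temperate [40, 85]. The function name `air_temperature_descriptor` is hypothetical.

```python
def air_temperature_descriptor(temp_f: float) -> str:
    """Map a ground-level air temperature (degrees F) to a C 1.3.1.3.1 descriptor.

    Assumption: Cold is read as [10, 40) and Temperate as [40, 85],
    so a reading such as 39.5 F is classified as Cold.
    """
    if temp_f > 85:
        return "Hot"
    if temp_f >= 40:
        return "Temperate"
    if temp_f >= 10:
        return "Cold"
    return "Very Cold"


for reading in (90, 85, 40, 39.5, 10, -5):
    print(reading, air_temperature_descriptor(reading))
```

Writing the definition down as code forces the boundary question that the English text leaves open, which is part of what formalizing the domain buys.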

  9. Formalizing Domain Concepts A knowledge-based system about “Weather” must know things like these: • Terms • hot, humid, windy ... • Definitions • cold = (10° to 39° F) • Relationships • cold and windy may overlap • cold and hot are disjoint • cold and very cold are disjoint! • Rules • IF heavy rain lasts 2 days • THEN muddy terrain and excessive runoff • (probability .9)
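A small Python sketch of the definitions, relationships, and rule above, for illustration only. Terms are modeled as ranges of one measured quantity (a hypothetical `Band` type). Two terms on the same quantity are disjoint when their ranges do not intersect, and terms on different quantities may overlap. The "windy" threshold is hypothetical, since the source does not define it numerically, and "lasts 2 days" is read as "at least 2 days".

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """A weather term defined as a half-open range [low, high) of one quantity."""
    quantity: str
    low: float
    high: float


def disjoint(a: Band, b: Band) -> bool:
    """True if no single set of conditions can satisfy both terms."""
    if a.quantity != b.quantity:
        # Terms about different quantities (temperature vs. wind) may co-occur.
        return False
    return a.high <= b.low or b.high <= a.low


# Temperature bands in degrees F; Cold is read as [10, 40). "Hot" is "> 85 F";
# starting it at 85 changes only the single value 85 and does not affect these checks.
VERY_COLD = Band("air_temperature_f", -math.inf, 10.0)
COLD = Band("air_temperature_f", 10.0, 40.0)
HOT = Band("air_temperature_f", 85.0, math.inf)
# Hypothetical threshold: the source does not define "windy" numerically.
WINDY = Band("wind_speed_mph", 20.0, math.inf)


def heavy_rain_rule(heavy_rain_days: float) -> dict[str, float]:
    """IF heavy rain lasts 2 days THEN muddy terrain and excessive runoff (p = 0.9).

    Assumption: "lasts 2 days" is read as "lasts at least 2 days".
    """
    if heavy_rain_days >= 2:
        return {"muddy terrain": 0.9, "excessive runoff": 0.9}
    return {}


print(disjoint(COLD, HOT), disjoint(COLD, VERY_COLD), disjoint(COLD, WINDY))
print(heavy_rain_rule(2), heavy_rain_rule(1))
```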

  10. Earthquake Hazard Analog • NEHRP Soil Types

  11. Hypocenter vs. Epicenter • The epicenter is the point on the surface directly above the hypocenter. • “Directly above”, more formally: • The latitude and longitude of the epicenter and hypocenter are the same. • The epicenter depth is zero. PowerLoom:

```lisp
(deffunction source-hypocenter ((?s earthquake-source)) :-> (?h location)
  :documentation "The 3D point where the rupture started.")

(deffunction source-epicenter ((?s earthquake-source)) :-> (?e location)
  :documentation "The point on the Earth's surface directly above the hypocenter"
  :axioms (=> (earthquake-source ?s)
              (and (= (latitude-of (source-hypocenter ?s))
                      (latitude-of (source-epicenter ?s)))
                   (= (longitude-of (source-hypocenter ?s))
                      (longitude-of (source-epicenter ?s)))
                   (= (depth-of (source-epicenter ?s)) (units 0 "m")))))
```
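For comparison, the same two functions sketched in Python. The class and function names are hypothetical, and depth is assumed to be stored in metres. One design difference is worth noting: the PowerLoom axiom constrains whatever epicenter a source has, whereas here the `epicenter` property constructs a point that satisfies the axiom by construction, and a separate check tests a given point against the three conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees
    depth_m: float    # metres below the surface


@dataclass(frozen=True)
class EarthquakeSource:
    hypocenter: Location  # the 3D point where the rupture started

    @property
    def epicenter(self) -> Location:
        # Directly above the hypocenter: same latitude and longitude, depth zero.
        h = self.hypocenter
        return Location(h.latitude, h.longitude, 0.0)


def satisfies_epicenter_axioms(source: EarthquakeSource, candidate: Location) -> bool:
    """Check a proposed epicenter against the three conditions in the axiom."""
    h = source.hypocenter
    return (
        candidate.latitude == h.latitude
        and candidate.longitude == h.longitude
        and candidate.depth_m == 0.0
    )


# Hypothetical source: 34.2 N, 118.5 W, hypocenter 18 km deep.
quake = EarthquakeSource(Location(34.2, -118.5, 18_000.0))
print(quake.epicenter)
print(satisfies_epicenter_axioms(quake, quake.epicenter))
print(satisfies_epicenter_axioms(quake, quake.hypocenter))
```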

  12. PowerLoom • Knowledge representation & reasoning system • Uses definitions specified in a formal logic • First order predicate calculus • Expressive: We can say what we need to • Inference via logical deductions • Support for units and dimensions • Browsing tool: Ontosaurus
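Unit support is what lets the epicenter axiom state the depth as `(units 0 "m")` rather than as a bare number. The idea can be sketched in Python; this is an illustration, not PowerLoom's implementation. Each quantity carries a unit, both sides are normalized to SI before comparison, and comparing different dimensions is an error. The unit table and the `Quantity` class are hypothetical and cover only a few units.

```python
import math
from dataclasses import dataclass

# Hypothetical unit table: symbol -> (factor to SI, dimension).
UNITS = {
    "m": (1.0, "length"),
    "km": (1000.0, "length"),
    "m/s": (1.0, "velocity"),
    "km/s": (1000.0, "velocity"),
}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    def to_si(self) -> tuple[float, str]:
        factor, dimension = UNITS[self.unit]
        return self.value * factor, dimension

    def same_as(self, other: "Quantity") -> bool:
        """Compare two quantities after converting both to SI units."""
        v1, d1 = self.to_si()
        v2, d2 = other.to_si()
        if d1 != d2:
            raise TypeError(f"cannot compare {d1} with {d2}")
        return math.isclose(v1, v2)


print(Quantity(18, "km").same_as(Quantity(18_000, "m")))     # same length
print(Quantity(0.76, "km/s").same_as(Quantity(760, "m/s")))  # Vs30-style velocity
```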

  13. Ontosaurus • Navigation tools and control panel • Display of formal information and rules • Diagrams and images aid domain familiarization • Domain facts • Textual documentation

  14. Graphical View: Fault Hierarchy

  15. Plan: Building Computational Pathways • Simple scenario to illustrate how a user would define computational pathways • Behind the scenes, DOCKER uses descriptions of components, their I/O requirements and their constraints to: • detect errors in user’s input • suggest additional steps needed to make the pathway work • make educated guesses about how components selected by the user may be connected to one another
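The first two checks can be sketched in Python, assuming each component is described only by the names of its inputs and outputs. The component table is a hypothetical reading of the slide 16 diagram, not DOCKER's actual component models, and `check_pathway` is a hypothetical name. The check walks the user's steps in order, reports each input that is not yet available, and suggests components whose outputs could supply it.

```python
# Hypothetical component descriptions: name -> (required inputs, produced outputs).
# The I/O sets are one reading of the slide 16 diagram, not DOCKER's actual models.
COMPONENTS = {
    "Geocoder": ({"Address"}, {"Lat/long"}),
    "CommunityVelocityModel": ({"Lat/long"}, {"Vs30"}),
    "DistanceComputation": ({"Lat/long1", "Lat/long2"}, {"Distance"}),
    "EarthquakeForecastModel(USGS-02)": ({"Time Span"}, {"Fault-type", "Magnitude", "Lat/long"}),
    "AttenuationRelationship(Field-2000)": (
        {"Fault-type", "Magnitude", "Distance", "Vs30"},
        {"PGA"},
    ),
}


def check_pathway(steps: list[str], available: set[str]) -> list[tuple[str, str, list[str]]]:
    """Return (step, missing input, components that could produce it) for each gap."""
    have = set(available)
    problems = []
    for step in steps:
        inputs, outputs = COMPONENTS[step]
        for missing in sorted(inputs - have):
            producers = sorted(
                name for name, (_, outs) in COMPONENTS.items() if missing in outs
            )
            problems.append((step, missing, producers))
        # Assume the step runs anyway, so later steps can still be checked.
        have |= outputs
    return problems


# The user supplies an address and a time span but skips several steps.
for problem in check_pathway(
    ["Geocoder", "AttenuationRelationship(Field-2000)"], {"Address", "Time Span"}
):
    print(problem)
```

An empty result means every input is accounted for. Under exact name matching, nothing produces DistanceComputation's Lat/long1 or Lat/long2; that is the kind of gap slides 17 and 18 distinguish.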

  16. Compute PGA for an Address Using These Components [Diagram: components Geocoder, CommunityVelocity Model, DistanceComputation, EarthquakeForecastModel (USGS-02), AttenuationRelationship (Field-2000), and AttenuationRelationship (Campbell-02), with data items Address, Lat/long, Lat/long1, Lat/long2, Vs30, Site Type, Distance, Time Span, Fault-type, Magnitude, and PGA]

  17. Some Data Paths Connect Easily [Diagram: the components from slide 16, without AttenuationRelationship (Campbell-02) and its Site Type input]

  18. Others Require Transformation [Diagram: the same components as slide 17]
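One way to sketch the difference between these two slides in Python, assuming each data item is tagged with a semantic type (the type names below are hypothetical). A link connects directly when the output and input names match. When the names differ but the semantic types agree, as with Lat/long feeding Lat/long1, a transformation step is needed to say which value fills which input. Which location plays which role in `bind_distance_inputs` is also an assumption.

```python
# Hypothetical semantic types for the data items named in slides 16-18.
SEMANTIC_TYPE = {
    "Address": "PostalAddress",
    "Lat/long": "GeoPoint",
    "Lat/long1": "GeoPoint",
    "Lat/long2": "GeoPoint",
    "Vs30": "ShearWaveVelocity",
    "Distance": "Length",
}


def classify_link(output_name: str, input_name: str) -> str:
    """Classify a proposed link from one component's output to another's input."""
    if output_name == input_name:
        return "direct"
    out_type = SEMANTIC_TYPE.get(output_name)
    if out_type is not None and out_type == SEMANTIC_TYPE.get(input_name):
        # Same kind of value under a different name: a transformation step
        # must say which value fills which input.
        return "needs transformation"
    return "incompatible"


def bind_distance_inputs(site: tuple[float, float], source: tuple[float, float]) -> dict:
    """Hypothetical transformation: assign two Lat/long values to DistanceComputation's inputs."""
    return {"Lat/long1": site, "Lat/long2": source}


print(classify_link("Lat/long", "Lat/long"))   # Geocoder -> CommunityVelocity Model
print(classify_link("Lat/long", "Lat/long1"))  # Geocoder -> DistanceComputation
print(classify_link("Vs30", "Distance"))
```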

  19. Developing Ontologies

  20. SCEC Ontology Development • Task-driven • Particular application • Modeled on domain inferences & reasoning • Small team of Computer Scientists • Seismology - Tom Russ • Models - Jihie Kim, Varun Ratnakar, Tom Russ • Small group of Domain Experts • Ned Field and Tom Jordan • Future • Development and curation by domain experts • Requires methodology • Requires tools

  21. Computation and checking of properties • Definitions of Terms Capture Inference in Ontology • Ned Field’s markup of fault parameter data

  22. The Gene Ontology (GO) • Had a successful jumpstart • Done by biologists, not knowledge engineers • Developed by a wide, distributed community • Focused on specific aspects of genomics • FlyBase, yeast, mouse • Used 24/7 from day 1 • Accepted widely by the community • Extended based on use requirements of a wide community • Quite large (30-40K terms)

  23. Jumpstart of GO: Key Decisions (1) • Limited scope • limited the domain, though it could have included many more areas • did not let anyone else in until they got somewhere • Added new groups incrementally (10) • 3 related areas • open (no licenses), use open standards • Involve the community • Had to develop own software • control over own code • KISS: keep it simple, stupid • E.g., only two relations • Transitivity
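Keeping only two transitive relations makes a common query easy to express: find everything annotated to a term or to any term below it. A Python sketch, assuming the two relations are is_a and part_of (as in early GO releases) and treating both as transitive for this query. The term identifiers, gene names, and function names are hypothetical.

```python
from collections import deque

# Hypothetical ontology fragment: term -> list of (relation, parent term).
# Assumption: the two relations are is_a and part_of.
PARENTS = {
    "T4": [("is_a", "T2")],
    "T3": [("is_a", "T1")],
    "T2": [("part_of", "T1")],
    "T1": [],
}


def ancestors(term: str) -> set[str]:
    """All terms reachable upward through either relation (treated as transitive)."""
    seen: set[str] = set()
    queue = deque([term])
    while queue:
        for _relation, parent in PARENTS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def annotated_under(term: str, annotations: dict[str, str]) -> list[str]:
    """Genes annotated to `term` directly or to any term below it."""
    return [gene for gene, t in annotations.items() if t == term or term in ancestors(t)]


genes = {"geneA": "T4", "geneB": "T3", "geneC": "T2"}
print(sorted(ancestors("T4")))       # ['T1', 'T2']
print(annotated_under("T2", genes))  # ['geneA', 'geneC']
```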

  24. Key Decisions (2) • Use it from the beginning • If you wait for the ontology to be finished before using it, you will never get there • Errors would only be discovered through use • Set things up so that you are OK when you have to fix those errors (entire chunks of the ontology had to be redone) • Minimized the impact of changes by limiting most changes to relations, which in practice do not affect the annotations • Face-to-face meetings 3-4 times a year • Satisfied a need of DB users who wanted to ask complex queries (1 query to all DBs) • Establish a migration path

  25. Key Decisions (3) • Requests are resolved either: • Immediately • Over email, if closure can be reached within 2-3 days • No voting, only consensus • On the agenda for the next meeting • Attribution was important • Learned that from FlyBase • Both GO content and annotations are annotated with attribution • Unique identifiers within GO • A term's lexical string can change without a change in meaning, and thus without a change in identifier • If the definition changes, the identifier changes, even when the GO string does not • Small number of relations
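The identifier policy can be sketched as a small registry in Python. Renaming a term keeps its identifier, while changing its definition mints a new one. Two details are assumptions rather than anything stated on the slide: the `ID:` prefix with a zero-padded number, and keeping the old identifier around marked obsolete. Each version also records who made it, reflecting the attribution point.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Term:
    label: str           # the lexical string; may change
    definition: str      # the meaning; a change here gets a new identifier
    attribution: str     # who contributed the current version
    obsolete: bool = False


@dataclass
class TermRegistry:
    terms: dict[str, Term] = field(default_factory=dict)
    _next_number: count = field(default_factory=lambda: count(1))

    def add(self, label: str, definition: str, attribution: str) -> str:
        # Hypothetical "ID:" prefix with a zero-padded number.
        term_id = f"ID:{next(self._next_number):07d}"
        self.terms[term_id] = Term(label, definition, attribution)
        return term_id

    def rename(self, term_id: str, new_label: str, attribution: str) -> str:
        """Same meaning, new string: the identifier does not change."""
        term = self.terms[term_id]
        term.label = new_label
        term.attribution = attribution
        return term_id

    def redefine(self, term_id: str, new_definition: str, attribution: str) -> str:
        """New meaning: mint a new identifier and mark the old one obsolete."""
        old = self.terms[term_id]
        old.obsolete = True
        return self.add(old.label, new_definition, attribution)


registry = TermRegistry()
first = registry.add("cell wall", "original definition", "curator A")
print(registry.rename(first, "cell-wall", "curator B"))  # ID:0000001
second = registry.redefine(first, "revised definition", "curator C")
print(second, registry.terms[first].obsolete)  # ID:0000002 True
```

Because annotations would refer to identifiers rather than strings, a rename leaves existing annotations valid, while a redefinition forces a deliberate decision about each annotation on the old identifier.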

  26. Fundamental Ontologies • What is out there? • Not much. • Ontolingua (Stanford University) has a number of small component ontologies • Designed as components • Not tied to applications • DAML is working on fundamental physics ontologies (Jerry Hobbs, SRI International, ISI, Ken Forbus, others) • Time • Space • We would like input from GEON!

  27. Some BIG Questions (from Gene Ontology Workshop) • How do you get started? • How to ensure the community will accept it (use it)? • How do you (can you?) represent alternative views? • What is the process to contribute to it? • What is the process to make changes to it? • What happens when there is an update? • How is it implemented? What tools? • How is it managed? • Who does what, when, where, why?