The complexity of biodiversity knowledge


  1. The complexity of biodiversity knowledge
     Andrew C. Jones, Cardiff University, Andrew.C.Jones@cs.cardiff.ac.uk
     Malcolm Scoble, The Natural History Museum, M.Scoble@nhm.ac.uk

  2. Purpose of talk
     • Malcolm & Andrew are both investigators in BiodiversityWorld (BDW)
     • There are many problems BDW doesn’t solve yet …
     • … and the funding runs out tomorrow!
     • We’ll present
       • BiodiversityWorld as a framework to support biodiversity research
       • Other projects in which biodiversity informatics problems have been addressed individually
     • Major challenge: draw these disparate efforts together
     Jones & Scoble, Semantic Interop., Imperial Coll, 30/03/06

  3. Part 1 (Andrew Jones)

  4. Why Biodiversity Informatics is hard
     • Need to integrate data & tools of different kinds for interesting “in silico” analyses
     • Various computer science issues, e.g.
       • Human-Computer Interaction
       • Design of environments to support scientific research
       • Interoperability
     • Complexity & heterogeneity of data
     • Differences of scientific opinion
     • Data quality problems

  5. The BiodiversityWorld project
     • 3 year e-Science project funded by BBSRC
     • Partners: The University of Reading, Cardiff University, The Natural History Museum, Southampton University
     • Aim:
       • Build a Biodiversity Grid (Problem Solving Environment to support Biodiversity research)
       • Support discovery & use of arbitrary tools & data sources for interesting in silico experiments
       • Provide environment to get beyond the ‘cutting and pasting into Word documents’ approach to data integration and analysis

  6. Example problems for BiodiversityWorld
     • How should conservation efforts be concentrated?
       • (example of Biodiversity Richness & Conservation Evaluation)
     • Where might a species be expected to occur, under present or predicted climatic conditions?
       • (example of Bioclimatic & Ecological Niche Modelling)
     • How can geographical information assist in selection among possible phylogenetic trees?
       • (example of Phylogenetic Analysis & Palaeoclimate Modelling)

  7. BiodiversityWorld architecture
     [Diagram. Labels recovered from the slide: User interface; Presentation; Workflow enactment engine; Metadata repository; Native BiodiversityWorld resources; Wrapped resources; BGI API; BiodiversityWorld-GRID Interface (BGI); The GRID]

  10. Some problems not fully solved in BDW
      • Flexible data access
        • BGI designed to make BDW maintainable, but currently assumes each resource has a predefined set of operations
        • BioDA project investigated use of OGSA-DAI in BDW
      • HCI issues
        • A much more exploratory approach to workflow construction might be appropriate?
      • Semantic interoperability & data quality
        • Metadata repository: basic information only
        • Only basic solution to species naming problems (SPICE)
        • Other problems of descriptive terms, differences of expert opinion, etc., remain to be addressed

  11. Complexity of biodiversity data: a multi-dimensional problem
      • Same specimen might be described with differences of:
        • Terminology
        • Opinion about identification
        • Opinion about whether a particular feature is present
        • Accuracy
      • Experts may differ as to:
        • Circumscription associated with a given scientific name
          • (So may not be describing the same concept)
        • Terminology used to describe a given taxon
        • Accepted name for a species in a taxonomic checklist
      • There may be errors!
      • ...

  12. SPICE for Species 2000
      • BBSRC/EPSRC- and EU-funded
      • SPecies 2000 Interoperability Co-ordination Environment
      • Aims:
        • build scalable, federated scientific name catalogue organised by taxon (species, etc.)
        • provide ‘synonymy server’, enriching information retrieval
      • Issue: how to build an architecture to integrate specialist, heterogeneous databases, providing a consistent federated view of broader scope?
      • Common Data Model sufficed …
        • data requirements of federation identical for each database
        • small set of ‘canned queries’ adequate for the catalogue

  13. SPICE internal architecture
      [Diagram. Labels recovered from the slide: User (Web browser), shown more than once; User Server module (HTTP); Common Access System (CAS), with its ‘Query’ co-ordinator and CAS knowledge repository (taxonomic hierarchy, annual checklist, genus and other caches, ...); CORBA links; Wrappers (e.g. CGI/XML + ODBC; e.g. JDBC; in some cases, generic), each the CORBA ‘wrapper’ element of a GSD; GSDs]
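
A minimal Python sketch of the wrapper idea in the SPICE slides, assuming each database is hidden behind a wrapper that answers one ‘canned query’ in a shared record format. The names `NameRecord`, `DictWrapper`, `federated_search` and the database labels are hypothetical, chosen for illustration; the real system used CORBA wrappers over the underlying databases, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical common-data-model record returned by every wrapper."""
    scientific_name: str
    status: str          # "accepted" or "synonym"
    accepted_name: str   # the name this record resolves to
    source: str          # which database supplied the record


class Wrapper(Protocol):
    """Anything that can answer the single canned query used here."""
    def search_by_name(self, name: str) -> list[NameRecord]: ...


class DictWrapper:
    """Toy wrapper over an in-memory table (stand-in for ODBC, JDBC, etc.)."""

    def __init__(self, source: str, rows: dict[str, tuple[str, str]]):
        self.source = source
        self.rows = rows  # name -> (status, accepted name)

    def search_by_name(self, name: str) -> list[NameRecord]:
        if name not in self.rows:
            return []
        status, accepted = self.rows[name]
        return [NameRecord(name, status, accepted, self.source)]


def federated_search(wrappers: list[Wrapper], name: str) -> list[NameRecord]:
    """Send one canned query to every wrapper and merge results in order."""
    results: list[NameRecord] = []
    for wrapper in wrappers:
        results.extend(wrapper.search_by_name(name))
    return results


# Toy data: the plant names come from the LITCHI example slide.
legumes = DictWrapper("legume-db", {
    "Caragana arborescens": ("accepted", "Caragana arborescens"),
    "Caragana sibirica": ("synonym", "Caragana arborescens"),
})
moths = DictWrapper("moth-db", {
    "Sphinx caligineus": ("accepted", "Sphinx caligineus"),
})

hits = federated_search([legumes, moths], "Caragana sibirica")
```

Because every wrapper returns the same record type, the co-ordinating function needs no knowledge of how any individual database is stored, which is the property the slides attribute to the Common Data Model.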

  14. LITCHI
      • BBSRC/EPSRC- and EU-funded
      • Logic-based Integration of Taxonomic Conflicts in Heterogeneous Information systems
      • Aim: detect conflicts between species checklists and either
        • Assist in producing a consistent checklist, or
        • Generate correspondences between checklists (‘cross-map’)
      • Addressing problems of species classification & naming variations when accessing species-related data
      • More general, semantic interoperability issue:
        • detecting conflicts between different expert views of same subject matter;
        • supporting data access based on these views

  15. LITCHI example
      Checklist 1
        Caragana arborescens Lam. (accepted name)
        Caragana sibirica Medikus (synonym)
      Checklist 2
        Caragana sibirica Medikus (accepted name)
        Caragana arborescens Lam. (synonym)
      (“Lam.” = “Lamarck”)
      Rule: “A full name which is not a pro-parte name may not appear as both an accepted name and a synonym in the same checklist”
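
The check in the LITCHI example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not LITCHI's own logic-based implementation: the function names, the `(name, status)` pair representation and the `pro_parte` parameter are hypothetical. The first function applies the quoted rule to a single checklist; the second lists names on which two checklists disagree, which is the kind of information a cross-map would need.

```python
from collections import defaultdict

# Each checklist is a list of (full name, status) pairs, as on the slide.
checklist_1 = [
    ("Caragana arborescens Lam.", "accepted"),
    ("Caragana sibirica Medikus", "synonym"),
]
checklist_2 = [
    ("Caragana sibirica Medikus", "accepted"),
    ("Caragana arborescens Lam.", "synonym"),
]


def both_accepted_and_synonym(checklist, pro_parte=frozenset()):
    """Names that break the quoted rule within a single checklist."""
    statuses = defaultdict(set)
    for name, status in checklist:
        statuses[name].add(status)
    return sorted(
        name for name, seen in statuses.items()
        if {"accepted", "synonym"} <= seen and name not in pro_parte
    )


def status_disagreements(first, second):
    """Names given a different status by the two checklists."""
    first_status = dict(first)
    return sorted(
        (name, first_status[name], status)
        for name, status in second
        if name in first_status and first_status[name] != status
    )


within_1 = both_accepted_and_synonym(checklist_1)
merged = both_accepted_and_synonym(checklist_1 + checklist_2)
across = status_disagreements(checklist_1, checklist_2)
```

Each checklist is consistent on its own, so `within_1` is empty; the conflict only appears when the two are naively merged or compared, which is why the slides distinguish producing one consistent checklist from generating a cross-map.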

  16. Name relationships (LITCHI 2)

  17. myViews
      • Not funded yet – limited proof-of-concept prototype only
      • Addresses problem that an expert may wish to generate taxon descriptions which are:
        • Coherent;
        • Mapped explicitly to other taxon descriptions, and
        • Based directly on existing documentation (monographs, etc), rather than completely re-coded in some restrictive formalism with a new vocabulary

  18. Example: describing the same things?
      • Description A:
        • Sarothamnus scoparius (L.) Wimm. ex Koch.
        • Broom
        • ... a bush which is 50-200 cm high ...
      • Description B:
        • Cytisus scoparius
        • Yellow broom
        • ... a small shrub up to 6ft or more ... native in its yellow form ...
      • Description C:
        • Cytisus scoparius (L.) Link.
        • Broom
        • ... a deciduous shrub growing to 2.4m by 1m at a fast rate ... scented flowers ...
      • Description D:
        • Common Broom
        • Cytisus scoparius
        • ... covered in profuse golden-yellow flowers ... shrub about 1-3m tall ...
      • Description E:
        • Broom
        • Cytisus scoparius
        • ... Like a spineless edition of gorse ... with larger scentless flowers ...
      • Similar problems apply to individual specimen descriptions

  19. Things we might want to do
      • In a system where
        • data is held in as ‘raw’ a form as possible, to avoid information loss, but
        • we can impose various views and hypotheses
        we might wish to …
      • Create our own ‘view’ of the data
        • For a given piece of knowledge, we could
          • accept it unaltered
          • accept but re-express in our terms (e.g. different scientific name; different units; ...)
          • state it is equivalent to another piece of knowledge (e.g. minor differences in measurements)
          • flag it as ‘wrong’
          • ...
        • In relation to another’s view, we might
          • include or ignore it
          • declare some ‘mapping’ applicable to a group of items (e.g. every species of ‘Sarothamnus’ is mapped to ‘Cytisus’)
          • ...
      • Reason with differing levels of precision simultaneously (e.g. binary/continuous characters derived from same features)
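
One of the options listed above, accepting a statement but re-expressing it in different units, can be illustrated with the heights quoted in the broom descriptions. The Python below is a sketch only; the function names are hypothetical, and whether overlapping ranges should count as agreement is a modelling choice that the slides do not specify.

```python
# Conversion factors to metres; only the units that appear in the slides.
TO_METRES = {"cm": 0.01, "m": 1.0, "ft": 0.3048, "in": 0.0254}


def to_metres(value, unit):
    """Re-express a length in metres; the raw value and unit stay untouched."""
    return value * TO_METRES[unit]


def range_in_metres(low, high, unit):
    """Convert both ends of a (low, high) range to metres."""
    return (to_metres(low, unit), to_metres(high, unit))


def ranges_overlap(a, b):
    """True when two (low, high) ranges in the same unit share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


description_a = range_in_metres(50, 200, "cm")   # "50-200 cm high"
description_d = range_in_metres(1, 3, "m")       # "about 1-3m tall"
six_feet = to_metres(6, "ft")                    # "up to 6ft or more"

comparable = ranges_overlap(description_a, description_d)
```

Keeping the conversion in a view function, rather than rewriting the stored value, matches the slide's aim of holding data in as raw a form as possible.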

  20. An experimental prototype
      • Proof of concept ...
        • arbitrary, small data set from various sources: Cytisus & Genista species
        • No real ‘front end’ or ‘back end’ yet!
      • Implemented in Prolog (a logic programming language)
        • Formalisms to record complex assertions & their sources
        • Ontological knowledge not currently separated out explicitly; rules perform inference
      • User makes his/her own assertions about (for example)
        • synonymy;
        • which assertions of others to accept;
        • ...
        • ... both very specific and more general rules
      • Main purpose: illustrate handling multiple opinions/hypotheses

  21. Sample knowledge base extracts
      assertion(1, association(2, 3, absent(scent(flowers)))).
      assertion(1, property(2, yellow(flowers))).
      assertion(1, label(2, common('Broom'))).
      assertion(1, label(2, species('Cytisus', 'scoparius'))).
      assertion(4, property(5, shrublet(whole))).
      assertion(4, property(5, deciduous(whole))).
      assertion(4, property(5, size(6, in, whole))).
      assertion(4, property(5, deep_yellow(flowers))).
      assertion(4, property(5, small(leaves))).
      assertion(4, label(5, species('Cytisus', 'ardoinii'))).
      assertion(4, property(7, size(6, ft, whole))).
      assertion(4, label(7, species('Cytisus', 'scoparius'))).
      assertion(12, label(13, common('Broom'))).
      assertion(12, label(13, common('Scotch Broom'))).
      assertion(12, property(13, compound('sparteine'))).
      assertion(12, property(13, compound('tyramine'))).
      assertion(12, label(13, species('Sarothamnus', 'scoparius'))).
      assertion(14, label(15, species('Sarothamnus', 'scoparius'))).
      assertion(14, property(15, size_range(50, 200, cm, whole))).
      assertion(14, property(15, bright_yellow(flowers))).
      assertion(16, label(17, species('Cytisus', 'scoparius'))).
      assertion(16, property(17, max_height(2.4, m, whole))).
      assertion(16, property(17, max_width(1, m, whole))).
      assertion(16, property(17, present(scent(flowers)))).
      assertion(8, property(9, golden_yellow(flowers))).
      assertion(8, property(9, size_range(1, 3, m, whole))).
      assertion(8, label(9, species('Cytisus', 'scoparius'))).
      Reading an entry: source 12 asserts that item 13’s label is common name ‘Scotch Broom’

  22. Deducing from the knowledge base
      ?- display_accepted_props('Cytisus', 'ardoinii').
      shrublet(whole)
      deciduous(whole)
      size(6, in, whole)
      deep_yellow(flowers)
      small(leaves)
      Yes
      ?- display_accepted_props('Cytisus', 'scoparius').
      yellow(flowers)
      size(6, ft, whole)
      golden_yellow(flowers)
      size_range(1, 3, m, whole)
      max_height(2.4, m, whole)
      max_width(1, m, whole)
      present(scent(flowers))
      absent(spines)
      absent(scent(flowers))
      Yes
      ?- display_contradictions_for('Cytisus', 'scoparius').
      [present(scent(flowers)), absent(scent(flowers))]
      Yes
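
The two queries above can be mimicked in Python to show the shape of the inference. This is a sketch, not a translation of the prototype's Prolog rules, which the slides do not show. It assumes every assertion is accepted, treats the `association(...)` fact from source 1 as an ordinary property of item 2, and only detects the present/absent kind of contradiction; the function names and the tuple encoding are hypothetical.

```python
# (source, item, kind, value) tuples mirroring a few facts from the slides.
# ("present", "scent(flowers)") stands for present(scent(flowers)).
ASSERTIONS = [
    (1, 2, "label", ("Cytisus", "scoparius")),
    (1, 2, "property", ("yellow", "flowers")),
    (1, 2, "property", ("absent", "scent(flowers)")),
    (16, 17, "label", ("Cytisus", "scoparius")),
    (16, 17, "property", ("present", "scent(flowers)")),
    (4, 5, "label", ("Cytisus", "ardoinii")),
    (4, 5, "property", ("small", "leaves")),
]


def items_labelled(genus, epithet, assertions=ASSERTIONS):
    """(source, item) pairs whose species label matches the requested name."""
    return {
        (source, item)
        for source, item, kind, value in assertions
        if kind == "label" and value == (genus, epithet)
    }


def accepted_props(genus, epithet, assertions=ASSERTIONS):
    """All properties asserted about any item carrying the given label."""
    wanted = items_labelled(genus, epithet, assertions)
    return [
        value
        for source, item, kind, value in assertions
        if kind == "property" and (source, item) in wanted
    ]


def contradictions(genus, epithet, assertions=ASSERTIONS):
    """Pairs asserting both presence and absence of the same feature."""
    props = accepted_props(genus, epithet, assertions)
    present = {feature for state, feature in props if state == "present"}
    absent = {feature for state, feature in props if state == "absent"}
    return [(("present", f), ("absent", f)) for f in sorted(present & absent)]


scoparius_props = accepted_props("Cytisus", "scoparius")
scoparius_conflicts = contradictions("Cytisus", "scoparius")
```

The point carried over from the slides is that conflicting statements from different sources are kept side by side and reported, rather than one of them being discarded when the data is loaded.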

  23. Adding synonymy (1)
      • User regards any statement about a Sarothamnus species as being a statement about a Cytisus species with same epithet:
        assertion(20, synonym(species('Cytisus', Epithet), _, species('Sarothamnus', Epithet), _)).
      • (Could be more restrictive, e.g. apply to only particular information sources)

  24. Adding synonymy (2)
      ?- display_accepted_props('Cytisus', 'scoparius').
      yellow(flowers)
      size(6, ft, whole)
      golden_yellow(flowers)
      size_range(1, 3, m, whole)
      max_height(2.4, m, whole)
      max_width(1, m, whole)
      present(scent(flowers))
      compound(sparteine)
      compound(tyramine)
      size_range(50, 200, cm, whole)
      bright_yellow(flowers)
      absent(spines)
      absent(scent(flowers))
      Yes
      ?- display_contradictions_for('Cytisus', 'scoparius').
      [size_range(1, 3, m, whole), size_range(50, 200, cm, whole)]
      [present(scent(flowers)), absent(scent(flowers))]
      Yes
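
The effect of the synonymy assertion, where statements about ‘Sarothamnus’ species start to appear under the matching ‘Cytisus’ name, can be sketched in Python as a genus-level mapping applied at query time. The names `GENUS_SYNONYMS`, `canonical` and `props_for` are hypothetical, and the flat record format is a simplification of the assertion structure shown in the sample knowledge base.

```python
# Genus-level mapping in the spirit of the slide's synonym assertion:
# a 'Sarothamnus' species is read as the 'Cytisus' species with the same epithet.
GENUS_SYNONYMS = {"Sarothamnus": "Cytisus"}

# (source, genus, epithet, property) records taken from the sample knowledge base.
RECORDS = [
    (12, "Sarothamnus", "scoparius", "compound(sparteine)"),
    (12, "Sarothamnus", "scoparius", "compound(tyramine)"),
    (14, "Sarothamnus", "scoparius", "bright_yellow(flowers)"),
    (8, "Cytisus", "scoparius", "golden_yellow(flowers)"),
    (4, "Cytisus", "ardoinii", "small(leaves)"),
]


def canonical(genus, epithet, synonyms):
    """Apply the user's genus mapping; unmapped names pass through unchanged."""
    return (synonyms.get(genus, genus), epithet)


def props_for(genus, epithet, records=RECORDS, synonyms=None):
    """Properties whose (possibly remapped) name matches the requested one."""
    synonyms = synonyms or {}
    target = canonical(genus, epithet, synonyms)
    return [
        prop
        for _source, g, e, prop in records
        if canonical(g, e, synonyms) == target
    ]


without_mapping = props_for("Cytisus", "scoparius")
with_mapping = props_for("Cytisus", "scoparius", synonyms=GENUS_SYNONYMS)
```

Because the mapping is an argument rather than a change to the stored records, different users can apply different synonymies to the same raw data, which is the behaviour the myViews slides argue for.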

  25. Some important issues for future work
      • Complexity, e.g.
        • Trade-off: effective resource discovery v. computational expense of traversing rich ontology
        • Scalability of taxonomic conflict detection
          • May find large data sets need clever techniques such as Rete network
        • Scalability of inference in myViews; caching inferred information
      • Managing & ranking large result sets
        • How to rank resources discovered
        • How to rank conflicts to present users with matches they are likely to want
      • Joining all these fragmentary projects up together

  26. Part 2 (Malcolm Scoble)

  27. The complexity of taxonomic/biodiversity data
      [Diagram. Labels recovered from the slide: Specimen (unit) data; Collection-level; Species/taxon concept; Locality; Species name; DNA barcodes; Species concepts; Observations; Date of description; Synonyms; Type specimen; Genus name (for binomial); Date of specimen collection; Time of specimen collection; Images; Name of collector; Homonyms; Author of taxon]

  28. Taxonomy: from a ‘fragmented’ to a ‘distributed’ resource
      Where we want to be
      • Less fragmented; single site or distributed access
      • Easier to update
      • Coordinated effort
      • Electronic (or dual) medium
      • Free access to data
      • Taxonomy easier to use
      Where we are now
      • Fragmented results
      • Fragmented effort
      • Largely a paper medium (restricted access)

  29. Projects to integrate biodiversity data
      • BioCISE (collection-level)
      • ENHSIN (specimen (unit)-level)
      • BioCASE (unit- & collection-level)
      • Species 2000 (species nomenclature)
      • SYNTHESYS (taxonomic infrastructure)
      • ENBI (network of biodiversity information)
      • EDIT (distributed approach to taxonomy)
      • PBIs (inventorying the planet’s biodiversity)
      • CATE: Creating a Taxonomic e-Science

  30. BioCASE National Node Network
      Collection-level
      • 31 National Nodes
      • Core Meta Database is updated every night

  31. All levels: A Biological Collections Service for Europe

  33. Creating a taxonomic e-science (CATE)
      • Literature scattered over 250 years of paper publications.
      • Data inaccessible other than to specialist users
      • Aim to transfer in toto the taxonomy of two groups of organisms to the web (Hawkmoths and Aroids).
      • Broad aim: to encourage migration of taxonomy to the web.
      • Provide data for those studying biodiversity.
      • Encourage quality control, peer-review and the development of “consensus” taxonomies in the web environment.
      • Develop means of citation for web-based revisions
      [Image: The Hawkmoth Sphinx caligineus sinicus from Beijing, China. Photo: Tony Pittaway]
      [Image: Arisaema candidissimum. Photo: RBG Kew]
