
Data Integration: Using TAPIR as an asynchronous caching protocol

Aaron Steele (asteele@berkeley.edu), University of California at Berkeley, Museum of Vertebrate Zoology.


Presentation Transcript


  1. Data Integration: Using TAPIR as an asynchronous caching protocol Aaron Steele asteele@berkeley.edu University of California at Berkeley Museum of Vertebrate Zoology

  2. Application Network

  3. Application Network

  4. Application Network

  5. Application Cache Network

  6. Application Cache Network

  7. Application Cache Network

  8. Application Cache Network

  9. Network

  10. “Nanos gigantum humeris insidentes.” - Isaac Newton

  11. How Can Google Help? • Google Base • Google Subscribed Links

  12. Google Base • Submit record metadata: form, bulk, API • Google creates your data index • Query data using Google Base protocol • Search results link back to your data • Track usage statistics • Change or delete metadata • No storage or transmission limits • Check the TOS for details

  13. BioCase & TAPIR Adapters to Google Base • Application • Adapter • Google Base (cache)

  14. Google Subscribed Links • “Add custom search results to Google” • You define the query, result format, and result link • Dynamic! Supply XML, TSV or RSS feeds • Include images or gadgets (maps, etc.) • Users subscribe to your links

  15. A Word about Citations • A link is essentially a citation • Search results from Google Base and Subscribed Links return pointers (links) to your data, not the actual data

  16. Application Network

  17. Application SQL Cache TAPIR Protocol Network

  18. Data Harvesting Software • Java 1.5, Eclipse, dom4j, Hibernate, MySQL • XML configuration • Resource access points and a global set of filtered concepts to cache • HigherGeography = Madagascar • Class = Aves OR Class = Reptilia • CoordinateUncertaintyInMeters != null • Harvest via TAPIR inventory requests (KVP) • Paged inventories were handled with an Inventory class that implemented the Iterator interface
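The paged-inventory idea on slide 18 can be sketched as follows. This is a minimal illustration in Python, not the Java `Inventory` class from the talk. `fetch_page(start, limit)` is a hypothetical stand-in for a single TAPIR inventory request, and stopping on a short page is an assumed termination rule rather than something the slides specify.

```python
from typing import Callable, Iterator, List


class PagedInventory:
    """Iterate over every value of a paged inventory, one page request at a time."""

    def __init__(self, fetch_page: Callable[[int, int], List[str]], page_size: int = 100):
        # fetch_page(start, limit) is a hypothetical callable standing in for
        # one inventory request; it returns at most `limit` values from `start`.
        self._fetch_page = fetch_page
        self._page_size = page_size

    def __iter__(self) -> Iterator[str]:
        start = 0
        while True:
            page = self._fetch_page(start, self._page_size)
            for value in page:
                yield value
            # Assumed stopping rule: a short or empty page means no more values.
            if len(page) < self._page_size:
                return
            start += len(page)


# In-memory stand-in for a provider (made-up identifiers).
values = [f"guid-{i}" for i in range(7)]


def fake_fetch(start: int, limit: int) -> List[str]:
    return values[start:start + limit]


harvested = list(PagedInventory(fake_fetch, page_size=3))
```

Hiding the paging behind an iterator lets the harvesting loop treat a remote inventory like a local collection, whatever page size a provider allows.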

  19. Application SQL DwC Cache TAPIR Protocol Update Feeds Network

  20. Data Synchronization Implementation • Network records added, removed, changed • Cache must reflect these changes • PHP 5, SQLite application • Register resources • Generates Atom & RSS GUID update feeds • Compares successive copies of GUID-DLM inventories: • if new GUID detected, record INSERT • if DLM changed, record UPDATE • if old GUID missing, record DELETE
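The comparison rule on slide 20 can be sketched as below. This is an illustration in Python rather than the PHP 5 application from the talk. Each inventory is assumed to be a mapping from GUID to date-last-modified (DLM) string, and the function name `diff_inventories` and the sample values are made up for the example.

```python
from typing import Dict, List, Tuple


def diff_inventories(previous: Dict[str, str], current: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare two GUID -> DLM inventories and list the cache operations needed."""
    operations: List[Tuple[str, str]] = []
    for guid, dlm in current.items():
        if guid not in previous:
            operations.append(("INSERT", guid))   # new GUID detected
        elif previous[guid] != dlm:
            operations.append(("UPDATE", guid))   # DLM changed
    for guid in previous:
        if guid not in current:
            operations.append(("DELETE", guid))   # old GUID missing
    return operations


# Made-up successive inventory copies for one resource.
previous = {"a": "2008-01-01", "b": "2008-01-01", "c": "2008-01-01"}
current = {"a": "2008-01-01", "b": "2008-02-15", "d": "2008-03-01"}
changes = diff_inventories(previous, current)
```

Each resulting operation could then become one entry in an update feed, so a cache only fetches the records that actually changed.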

  21. Application SQL DwC Cache TAPIR Protocol Update Feeds Network

  22. HerpNET Proof of Concept • Class = Reptilia OR Class = Amphibia • CoordinateUncertaintyInMeters != null • 20 of 80 providers accessible via TAPIR • 200k cached georeferenced records • AmphibiaWeb synonymy lookup on scientific name using a synonymy server; each synonym is looked up in the cache for coordinates, and the results are mapped using BerkeleyMapper • Query times reduced from 15 s to 5 ms
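The synonym-to-coordinates step on slide 22 might look roughly like this. It is a sketch under stated assumptions: the table name `cache`, its columns, and the sample rows are all hypothetical, and an in-memory SQLite database is used only to keep the example self-contained. The synonym list stands in for whatever the synonymy server returns.

```python
import sqlite3
from typing import Iterable, List, Tuple


def coordinates_for_names(conn: sqlite3.Connection, names: Iterable[str]) -> List[Tuple[str, float, float]]:
    """Return (name, latitude, longitude) for cached records matching any name."""
    names = list(names)
    if not names:
        return []
    placeholders = ",".join("?" for _ in names)
    sql = (
        "SELECT scientific_name, latitude, longitude FROM cache "
        f"WHERE scientific_name IN ({placeholders}) "
        "AND uncertainty_m IS NOT NULL "   # mirrors CoordinateUncertaintyInMeters != null
        "ORDER BY scientific_name, latitude"
    )
    return conn.execute(sql, names).fetchall()


# Hypothetical cache schema with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache (scientific_name TEXT, latitude REAL, longitude REAL, uncertainty_m REAL)"
)
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?, ?)",
    [
        ("Genus alpha", 1.0, 2.0, 100.0),
        ("Genus beta", 3.0, 4.0, 250.0),
        ("Genus gamma", 5.0, 6.0, None),   # no uncertainty, so it is filtered out
    ],
)
conn.execute("CREATE INDEX idx_cache_name ON cache (scientific_name)")

synonyms = ["Genus alpha", "Genus beta", "Genus gamma"]  # stand-in for the synonymy server's answer
points = coordinates_for_names(conn, synonyms)
```

Because every synonym is resolved against a local indexed table rather than against remote providers, lookups of this shape are the kind of query a cache can answer quickly.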

  23. Future Work • ReBioMa Project • Funded by MacArthur • Dynamic SDM for Madagascar • MaxEnt using cached records from TAPIR providers georeferenced by BioGeomancer • New models fitted and projected when cache updates
