Semantic Web Technologies on HPC for Life Sciences and Other Domains

Semantic Web Technologies on HPC for Life Sciences and Other Domains. Sean Martin, Founder & CTO, Cambridge Semantics Inc. sean@cambridgesemantics.com +1 617 606 341.

Presentation Transcript


  1. Semantic Web Technologies on HPC for Life Sciences and Other Domains Sean Martin Founder & CTO Cambridge Semantics Inc. sean@cambridgesemantics.com +1 617 606 341

  2. Semantic Web Technologies on HPC for Life Sciences and Other Domains Sean Martin Founder & CTO Cambridge Semantics Inc. sean@cambridgesemantics.com +1 617 606 341

  3. What is/are Semantic Technologies anyway? Semantics (from Greek sēmantiká, neuter plural of sēmantikós) is the study of meaning. 10 Semantics experts in a room = 11 opinions

  4. Little “s” semantics • Usually proprietary, mostly heuristics/statistics based • Search (not query) • Usually extract meaning from unstructured data (text, video, etc.) • Examples: • or • Enterprise search e.g. or • Entity extraction, automated tagging, text analytics • Natural Language Processing (NLP) technologies • Automated translation e.g. Google Translate • SMILA & UIMA open source frameworks

  5. Big “S” Semantics – Paint starting to dry • W3C recommendations (open data standards) • Machine readable, query (not search) & instant data integration • The Semantic Web • Also known as “Linked Open Data” • Also known as “Web 3.0” • Examples: • Google “rich snippets” • OpenGraph • The GoodRelations ontology e.g. • Public Government Data (USA, Europe, UK) • All sorts of startup activity

  6. What are the W3C’s Open Data Standards? • RDF • OWL • SPARQL There are others, but these are the key ones

  7. RDF • Self-describing (tagged) instance data • Facts or triples: <subject> <predicate> <object/value> • A collection of triples forms a directed labeled graph • <subject> and <predicate> are globally unique strings or URIs, e.g. http://www.cambridgesemantics.com/people/sean
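
As a minimal sketch of the triple model on this slide, the Python below stores facts as (subject, predicate, object) tuples and indexes them as a directed labeled graph. Only the `sean` URI comes from the slide; the `example.org` namespace, the predicate names, and the `build_graph` helper are assumptions made for illustration, and this is not how any particular triple store works internally.

```python
from collections import defaultdict

# Only this URI appears on the slide; everything under example.org is assumed.
SEAN = "http://www.cambridgesemantics.com/people/sean"
EX = "http://example.org/vocab/"
CSI = "http://example.org/org/cambridge-semantics"

triples = [
    (SEAN, EX + "name", "Sean Martin"),              # object is a literal value
    (SEAN, EX + "worksFor", CSI),                    # object is another subject's URI
    (CSI, EX + "name", "Cambridge Semantics Inc."),
]


def build_graph(triples):
    """Index triples as subject -> list of (predicate, object) labeled edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


graph = build_graph(triples)
for predicate, obj in graph[SEAN]:
    print(predicate, "->", obj)
```

Because the object of the `worksFor` triple is itself a subject elsewhere in the list, following it moves from one node to the next, which is what makes the collection a graph rather than a flat table.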

  8. OWL • OWL (Web Ontology Language) • Describes data models the way a domain expert would • What triples or facts are needed to properly describe something and its relationship to other similarly described things? • Relationships for inference and other kinds of reasoning
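
To make "relationships for inference" concrete, here is a small forward-chaining sketch in Python. It applies two simple subclass rules, which is far less than OWL can express; the class names, the short predicate strings, and the `infer_types` function are hypothetical and chosen only to show the idea of deriving new facts from stated ones.

```python
def infer_types(triples):
    """Forward-chain two subclass rules until no new triples appear."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in facts:
            if p != "subClassOf":
                continue
            for s2, p2, o2 in facts:
                # Rule 1: A subClassOf B and B subClassOf C => A subClassOf C
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, "subClassOf", o2))
                # Rule 2: x type A and A subClassOf B => x type B
                if p2 == "type" and o2 == s:
                    new.add((s2, "type", o))
        if not new <= facts:
            facts |= new
            changed = True
    return facts


# Illustrative life-sciences style data (assumed, not from the slides).
stated = [
    ("Kinase", "subClassOf", "Enzyme"),
    ("Enzyme", "subClassOf", "Protein"),
    ("BRAF", "type", "Kinase"),
]
inferred = infer_types(stated) - set(stated)
print(sorted(inferred))
```

Nothing in the stated data says that `BRAF` is a `Protein`; that fact is produced by the rules, so a query for all proteins can find it without anyone having entered it by hand.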

  9. SPARQL • The first standards-based distributed query language for RDF data & the Web • Wow!
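
SPARQL queries are built from triple patterns containing variables. The toy matcher below, written in Python, evaluates a list of such patterns against an in-memory list of triples. It is a sketch of the matching idea only, not a SPARQL engine; the data, the short identifiers, and the `match` and `query` helpers are assumptions for illustration.

```python
def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple; return new bindings or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # variable: bind or check binding
            if result.setdefault(term, value) != value:
                return None
        elif term != value:                      # constant: must match exactly
            return None
    return result


def query(patterns, triples):
    """Evaluate a list of triple patterns by extending bindings pattern by pattern."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


# Made-up data for illustration.
triples = [
    ("sean", "worksFor", "csi"),
    ("csi", "locatedIn", "Boston"),
    ("alice", "worksFor", "acme"),
    ("acme", "locatedIn", "London"),
]

# Roughly: SELECT ?p ?org WHERE { ?p worksFor ?org . ?org locatedIn Boston }
solutions = query([("?p", "worksFor", "?org"), ("?org", "locatedIn", "Boston")], triples)
print(solutions)
```

The shared variable `?org` is what joins the two patterns, so each extra pattern in a query is effectively another join over the same set of triples.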

  10. Important properties of RDF • Machine readable model / programs can “understand” • Unique identity of every data element • Subject is a unique identifier • Predicate (the relationship) is also a unique identifier • Object can be a unique identifier pointing to another subject • That’s how we get directed graphs • Allows annotation (the unique subject string provides an “anchor” for 3rd party metadata) • Allows provenance (especially useful when data travels beyond its source system or needs to be updated) • Semantic type (not just primitive data types) • Lets programs immediately know what type of data they are dealing with, allowing automated contextualization of information
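
The annotation and provenance points can be sketched in a few lines of Python. Two independent sources describe the same subject identifier, and each triple is extended with the name of the source it came from. The gene URI, the source names, and both helper functions are hypothetical; real systems typically use named graphs for this, which the fourth tuple element only imitates.

```python
from collections import defaultdict

GENE = "http://example.org/gene/BRCA1"   # hypothetical shared identifier

lab_a = [(GENE, "symbol", "BRCA1")]              # original record
lab_b = [(GENE, "reviewedBy", "curator-17")]     # third-party annotation


def merge_with_provenance(sources):
    """Combine triples from named sources into (s, p, o, source) quads."""
    return [
        (s, p, o, name)
        for name, triples in sources.items()
        for s, p, o in triples
    ]


def describe(subject, quads):
    """Collect everything known about a subject, keeping the source of each fact."""
    by_predicate = defaultdict(list)
    for s, p, o, source in quads:
        if s == subject:
            by_predicate[p].append((o, source))
    return dict(by_predicate)


quads = merge_with_provenance({"lab_a": lab_a, "lab_b": lab_b})
print(describe(GENE, quads))
```

No schema mapping is needed to combine the two sources: the shared subject string is the anchor, and the source tag records where each fact originated.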

  11. So what does any of this change? • Adoption of the semantic standards will be disruptive in at least two ways that create enormous value 1. Who can do what. Much easier. • Pushing the bar further and further towards end user self-service 2. How long it takes. Much faster. • Each new wave of technology brings at least an order-of-magnitude productivity increase, often more • Recent waves: Web Services/SOA; Java (no manual memory management); Virtualization etc. • Semantic technology is another wave

  12. Where do these benefits come from? • Using Semantic Technologies, the end user’s understanding of their data need be the only system or application model required • This allows the construction of applications & systems to move from what have until now been carefully planned, structure-dependent “all up front” designs over to malleable conceptual representations that can be evolved quickly • Systems go from being brittle to flexible • Systems can change at the speed the business does • End users can increasingly make more of these changes directly themselves

  13. Preserving the end user’s model • Traditional middleware: • Relational Model Physical • Relational Model Logical • Object Relational Model • Business Objects Model • User Interface Model • User’s idea of the Model • Semantic middleware: • User’s Model* (*Warning: dramatically oversimplified to make a point)

  14. Paying the price for all this flexibility • Exploding data volumes • tagging creates 10x more data • Random Access is expensive • >35 Years of optimization around RDBMS is not helping • too many “self-joins” on a three column table • No index support • Adding an additional layer of indirection is expensive • every time you want to display a value you need to dereference it
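
The self-join cost mentioned on this slide can be seen with Python's built-in `sqlite3` module. The sketch below stores triples in a single three-column table and answers one simple question, which already needs the table joined to itself twice. The table name, column names, and data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("p1", "name", "Ada"),
        ("p1", "worksFor", "org1"),
        ("org1", "locatedIn", "Boston"),
        ("p2", "name", "Bo"),
        ("p2", "worksFor", "org2"),
        ("org2", "locatedIn", "London"),
    ],
)

# "Which city does each named person work in?" touches three predicates,
# so the same table appears three times in the FROM clause.
rows = conn.execute(
    """
    SELECT name.o, city.o
    FROM triples AS job
    JOIN triples AS name ON name.s = job.s AND name.p = 'name'
    JOIN triples AS city ON city.s = job.o AND city.p = 'locatedIn'
    WHERE job.p = 'worksFor'
    ORDER BY name.o
    """
).fetchall()
print(rows)
conn.close()
```

In a conventional schema the same question would be one join between a person table and an organization table; with triples, every additional property in a query adds another pass over the same large table.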

  15. Paying the price for all this flexibility – enabling trends • W3C Semantic standards • A decade of semantic middleware + storage R&D • Multi-core CPUs • Fast networking • Cheap RAM • Web 2.0 blazing the trail with a new RAM-based application model? Disk is the new tape? Twitter, Facebook, LinkedIn and iostat • SSD • The changing cost of the sub-4k random access read and what it means to transaction processing systems and the applications that run on them

  16. Spot the difference Then.. Now

  17. And finally, so what does any of this have to do with HPC? • Cray’s XMT Systems + Very large quantities of RAM arranged in a contiguous block + Very low latency memory access + Large number of CPUs + Large number of cheap threads = Full pipelines • Great for interactive applications creating random-access query patterns, particularly complex ones requiring many joins

  18. Other HPC-related Semantic efforts • Raytheon BBN’s SPARQL on MapReduce clusters • WebPIE – VU University Amsterdam’s OWL Horst inference on MapReduce • Clustered RDF triple stores • OpenLink’s Virtuoso data store • Ontotext’s BigOWLIM • Franz Inc.’s AllegroGraph

  19. Semantics & the Enterprise – not waiting for the network effect • Overview of Cambridge Semantics Middleware Platform • Allows business users & customers/partners to: • Discover & connect to any data in databases & other systems on the fly • Create dashboards & applications on demand • Allows IT to: • Rapidly integrate data across silos and firewalls • Expose business policies, rules & workflow to business users • Implement manual intervention with automated response • Enterprise-class security, governance, provenance, … • A W3C-based semantic middleware for real-time user-driven operational intelligence

  20. Thanks for listening • Further Interest and a completely different view • Sir Tim Berners-Lee’s TED Talk on the next web • Questions/Objections? • Stop me & ask/state • Contact details again Sean Martin Cambridge Semantics Inc. sean@cambridgesemantics.com +1 617 606 341
