Leveraging Semantic Web Technologies for Enterprise Data Management

Linked Enterprise Data: Leveraging the Semantic Web stack in a corporate environment ISWC 2012 – Boston Fabrice LACROIX – lacroix@antidot.net

Antidot – who we are • French-based Software Vendor • Since 1999 | Paris, Lyon, Aix-en-Provence • Information access | Data management • Mission: Provide our customers with innovative customizable solutions that help them create value with their data, and make their employees more aware and efficient.

Clients Enterprises Publishing E-commerce Healthcare

Unstructured documents • files, ECM, collaborative spaces • intranet, extranet, Web sites • e-mails, instant messaging

Structured data • CRM, ERP, directory • knowledge bases • business applications (production, support)

IS are bloated • 1 practice => 1 need => 1 application => 1 silo • Information system is driven by the process • Data are numerous, various and scattered

Solutions or workarounds? BI MDM SOA Search

Solutions and workarounds • Enterprise Search brings little value to users • Document oriented • Does not solve real business problems Google like Verity like

What we want

What we want ERP CRM Production LDAP ECM Support Files

Changing the paradigm • Switching from an application view to a data centric way of thinking.

Bring out the implicit • Build the Giant Enterprise Graph

LED • Linked Enterprise Data: the application of Semantic Web technologies and Linked Data principles to the enterprise infrastructure

What works for the Web… • Federating silos on the Web http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)

…can’t always be used • in corporate IS • Legacy apps can’t be "Sparql’ed" • 80% of the data is un- or semi-structured and doesn’t fit in the model as such • Defining vocabularies/ontologies for silos is too complex and expensive • Don’t want RDF per se but valuable information • External data is available in XML/JSON through Web Services • Staff are trained for RDB, XML, Web apps • No-risk, stability-first strategy: SemWeb technology is considered new and immature

The RDF/storage approach • Setting up a global RDF repository does not work either • IT departments are afraid of the "RDF everywhere" activists

Semantic Web technology is still the right solution in a corporate environment, BUT it is not an aim: JUST use it as a means

Just do it • Think of it as a stream paradigm • build new objects using existing data • without interfering with the existing infrastructure • with SemWeb somewhere under the hood

Enterprise Graph HowTo • Construct the graph • generate triples from data • create triples from documents • Leverage the graph • enrich • infer • Browse the graph • select resources • build objects • Trash the graph

How: extract & normalize • Harvest and normalize • as in an ETL • fetch, clean, transform… • normalize records (names, IDs) to prepare the linking step • For databases • db2triples : an RDB2RDF implementation by Antidot (open source, W3C validated)
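
The "normalize records (names, IDs)" step can be illustrated with a minimal Python sketch. It is not Antidot's implementation: the function names, the sample records and the normalization rules (strip accents, collapse whitespace, lowercase names, keep only alphanumerics in IDs) are assumptions chosen for illustration.

```python
import re
import unicodedata

def normalize_name(raw: str) -> str:
    """Return a lowercase, accent-free, single-spaced version of a name."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", no_accents).strip().lower()

def normalize_id(raw: str) -> str:
    """Keep only letters and digits, uppercased, so 'emp-0042' matches 'EMP0042'."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()

# Hypothetical records describing the same person in two silos.
crm_record = {"name": "  Amélie   DUPONT ", "id": "emp-0042"}
ldap_record = {"name": "amelie dupont", "id": "EMP0042"}

# After normalization, the two records can be linked on equal keys.
same_person = (
    normalize_name(crm_record["name"]) == normalize_name(ldap_record["name"])
    and normalize_id(crm_record["id"]) == normalize_id(ldap_record["id"])
)
```

Normalizing before generating triples means the linking step can rely on plain equality of keys rather than fuzzy matching.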

How: semantize • Don’t transform everything into RDF • cherry-pick a subset of interesting fields for each object and create their RDF triple counterparts • interesting == needed for linking or inferring

How: semantize • Triples generation • Be smart: avoid upfront ontology design, use small vocabularies • Be pragmatic: transform XML tags and field names to predicates • Be agile: only insert what you need. And when you need more, add more. • Semantic Web fuels the modeling, linking and information building process
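
As a rough sketch of "transform field names to predicates" and "only insert what you need", the Python below turns selected fields of a record into triples and serializes them as N-Triples. The `http://example.org/` namespace, the function names and the sample record are all hypothetical; a real deployment would more likely use an RDF library and its own vocabulary.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the talk

def record_to_triples(kind, record, key_field, wanted_fields):
    """Create one triple per wanted field, using the field name as predicate."""
    subject = f"{BASE}{kind}/{quote(str(record[key_field]))}"
    triples = []
    for field in wanted_fields:
        value = record.get(field)
        if value is None:
            continue  # skip fields that are missing from this record
        predicate = f"{BASE}vocab/{quote(field)}"
        triples.append((subject, predicate, str(value)))
    return triples

def to_ntriples(triples):
    """Serialize (subject IRI, predicate IRI, literal) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        escaped = (
            o.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)

# Only 'name' and 'unit' are needed for linking, so 'phone' is left out.
employee = {"id": "EMP0042", "name": "Amelie Dupont", "unit": "R&D", "phone": "555"}
triples = record_to_triples("employee", employee, "id", ["name", "unit"])
print(to_ntriples(triples))
```

When more fields become useful later, adding them is a matter of extending `wanted_fields`; no upfront ontology is required.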

How: semantize • Unstructured documents • Extract metadata and transform them as needed to RDF. • Ex: author => dc:creator • Use of text-mining to extract named entities: people, organizations, products… • generate those entities list using the data sources: directory for employees, CRM for companies and people, ERP for products • create triples like doc_URI quotes entity_URI
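
The idea of building entity lists from existing data sources and emitting `doc_URI quotes entity_URI` triples can be sketched as a simple dictionary lookup. This is a deliberately naive stand-in for real text mining: the function names, the URIs and the `quotes` predicate IRI are assumptions made for the example.

```python
import re

def build_gazetteer(entries):
    """Map each lowercase label to its entity URI."""
    return {label.lower(): uri for label, uri in entries}

def quote_triples(doc_uri, text, gazetteer,
                  predicate="http://example.org/vocab/quotes"):
    """Return (doc, quotes, entity) triples for every label found in the text."""
    triples = []
    for label, entity_uri in gazetteer.items():
        # Match the label as a whole word or phrase, ignoring case.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            triples.append((doc_uri, predicate, entity_uri))
    return sorted(triples)

# Hypothetical entity lists: employees from the directory, companies from the CRM.
gazetteer = build_gazetteer([
    ("Amelie Dupont", "http://example.org/employee/EMP0042"),
    ("Acme", "http://example.org/company/acme"),
    ("Globex", "http://example.org/company/globex"),
])

text = "Meeting notes: Amelie Dupont met ACME about the renewal."
found = quote_triples("http://example.org/doc/1", text, gazetteer)
```

Because the labels come from the enterprise's own directory, CRM and ERP, each match already carries a URI that links back to the structured data.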

How: semantize • Unstructured documents • Compare documents using various dedicated algorithms • is the same • is included • is similar • is related • Generate new triples • create triples like <docA> is_sub_version_of <docB>
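
One possible way to derive the "is the same", "is included" and "is similar" relations is word shingling with a Jaccard overlap, sketched below. The talk does not say which algorithms are used, so this technique, the relation names, the shingle size and the 0.5 threshold are all assumptions; the "is related" case is left out because it would need a different signal (for example shared entities).

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams of a text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def compare(text_a, text_b, similar_threshold=0.5):
    """Return a hypothetical relation name for A relative to B, or None."""
    a, b = shingles(text_a), shingles(text_b)
    if not a or not b:
        return None
    if a == b:
        return "is_same_as"
    if a < b:  # every shingle of A also appears in B
        return "is_included_in"
    jaccard = len(a & b) / len(a | b)
    if jaccard >= similar_threshold:
        return "is_similar_to"
    return None

relation = compare("the quick brown fox", "the quick brown fox jumps over")
if relation is not None:
    triple = ("docA", relation, "docB")
```

Each non-empty result becomes a new triple between the two document URIs, which is how document-to-document links enter the graph.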

How: enrich • Enrich the graph • run specific algorithms to generate more links and triples (classifiers, topic detection, …) • insert external data gathered from the LOD or other external datasets or APIs

How: infer • Create new knowledge • add rules according to your needs IF a coworker is quoted in documents AND this coworker belongs to a business unit THEN the business unit is bound to the documents
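
The rule on this slide can be written as a small forward-chaining step over a set of triples. The sketch below uses plain Python tuples instead of a triplestore or rule engine, and the predicate names `quotes`, `memberOf` and `boundTo` are placeholders, not vocabulary from the talk.

```python
QUOTES = "quotes"        # document quotes coworker (assumed predicate name)
MEMBER_OF = "memberOf"   # coworker belongs to business unit (assumed)
BOUND_TO = "boundTo"     # business unit is bound to document (assumed)

def infer_unit_bindings(triples):
    """Apply: IF doc quotes coworker AND coworker memberOf unit THEN unit boundTo doc."""
    unit_of = {}
    for s, p, o in triples:
        if p == MEMBER_OF:
            unit_of.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in triples:
        if p == QUOTES:
            for unit in unit_of.get(o, ()):
                inferred.add((unit, BOUND_TO, s))
    # Return only the triples that were not already in the graph.
    return inferred - set(triples)

graph = {
    ("doc1", QUOTES, "emp42"),
    ("doc2", QUOTES, "emp42"),
    ("doc3", QUOTES, "emp7"),      # emp7 has no known business unit
    ("emp42", MEMBER_OF, "unit_rd"),
}
new_triples = infer_unit_bindings(graph)
graph |= new_triples
```

In a triplestore the same rule could be expressed as a SPARQL CONSTRUCT query or a reasoner rule; the logic is identical.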

How: build • Build • select the resources corresponding to object seeds (using SPARQL queries) • for each seed, follow links smartly in order to create basic objects
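
The build step can be sketched as: pick seed resources, then walk a whitelist of predicates to assemble a nested object. The slide selects seeds with SPARQL queries; the sketch below substitutes a plain Python filter over in-memory triples, and the predicate names, the `follow` whitelist and the depth limit are assumptions for illustration.

```python
def build_object(seed, triples, follow, max_depth=2):
    """Assemble a nested dict for `seed`, expanding only predicates in `follow`."""
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    def expand(uri, depth, seen):
        node = {"uri": uri}
        for p, o in outgoing.get(uri, []):
            if p in follow and depth < max_depth and o not in seen:
                value = expand(o, depth + 1, seen | {o})  # follow the link
            else:
                value = o  # keep as a plain value or unexpanded reference
            node.setdefault(p, []).append(value)
        return node

    return expand(seed, 0, {seed})

triples = [
    ("doc1", "type", "Document"),
    ("doc1", "title", "Renewal notes"),
    ("doc1", "quotes", "emp42"),
    ("emp42", "name", "Amelie Dupont"),
    ("emp42", "memberOf", "unit_rd"),
    ("unit_rd", "label", "R&D"),
]

# Seed selection: every resource typed as a Document.
seeds = [s for s, p, o in triples if p == "type" and o == "Document"]
objects = [build_object(s, triples, follow={"quotes", "memberOf"}) for s in seeds]
```

Restricting traversal to a whitelist and a maximum depth is one simple reading of "follow links smartly": it keeps each object focused and avoids walking the whole graph.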

How: build • Finalize • decorate the new knowledge objects with the data that was set apart (not loaded in the triplestore) • now we have rich user-actionable objects

How: expose • Make the new information available to users and to the entire IS [Diagram labels: Harvest, Normalize, Semantize, Enrich, Classify, Annotate, Relational DB, RDF Triplestore (Linked Data), Indexation, AFS search engine]

Conclusion • It works! • The triples we create and the inference rules we add are dictated by the goal / application • usage and value oriented • We benefit from the lazy-flexible-dynamic modeling of RDF-RDFS-OWL • we are agile • What matters is the graph. But the graph is not the triplestore • storage independent

There’s an app for that • Antidot Information Factory • a software solution designed specifically to leverage structured and unstructured data • enable large-scale processing of existing data • automate publishing of enriched or newly created information • Pipeline: Harvest, Normalize, Semantize, Enrich, Build, Expose

The Giant Enterprise Graph • Now we have a path to let SemWeb enter the enterprise

Discuss Understand Learn Exchange www.antidot.net info@antidot.net Thanks for your attention QUESTIONS?
