This presentation discusses the application of Semantic Web technologies within corporate environments, focusing on how enterprises can benefit from transitioning to a data-centric approach. It examines the challenges and solutions associated with integrating unstructured and structured data, the creation of an Enterprise Graph, and strategies for generating and leveraging RDF triples from existing data. The goal is to enhance information accessibility, efficiency, and value creation while maintaining current infrastructures. Practical insights and methodologies are provided for enterprises looking to innovate their data management practices.
Linked Enterprise Data: Leveraging the Semantic Web stack in a corporate environment • ISWC 2012 – Boston • Fabrice LACROIX – lacroix@antidot.net
Antidot – who we are • French-based Software Vendor • Since 1999 | Paris, Lyon, Aix-en-Provence • Information access | Data management • Mission: Provide our customers with innovative customizable solutions that help them create value with their data, and make their employees more aware and efficient.
Clients • Enterprises • Publishing • E-commerce • Healthcare
Unstructured documents • files, ECM, collaborative spaces • intranet, extranet, Web sites • e-mails, instant messaging
Structured data • CRM, ERP, directory • knowledge bases • business applications (production, support)
Information systems are bloated • 1 practice => 1 need => 1 application => 1 silo • The information system is driven by processes • Data is abundant, varied and scattered
Solutions or workarounds? • BI • MDM • SOA • Search
Solutions and workarounds • Enterprise Search brings little value to users • Document oriented • Does not solve real business problems • Examples: Google-like, Verity-like
What we want • ERP • CRM • Production • LDAP • ECM • Support • Files
Changing the paradigm • Switching from an application view to a data-centric way of thinking.
Bring out the implicit • Build the Giant Enterprise Graph
LED • Linked Enterprise Data: the application of Semantic Web technologies and Linked Data principles to the enterprise infrastructure
What works for the Web… • Federating silos on the Web http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)
…can’t always be used in corporate IS • Legacy apps can’t be "SPARQL’ed" • The 80% of data that is unstructured or semi-structured doesn’t fit in the model as such • Defining vocabularies/ontologies for silos is too complex and expensive • Users don’t want RDF per se but valuable information • External data is available in XML/JSON through Web Services • Staff are trained for RDB, XML, Web apps • No-risk, stability-first strategy: SemWeb technology is considered new and immature
The RDF/storage approach • Setting up a global RDF repository does not work either • IT departments are afraid of the "RDF everywhere" activists
Semantic Web technology is still the right solution in a corporate environment, BUT it is not an end in itself: JUST use it as a means
Just do it • Think of it as a stream paradigm • build new objects using existing data • without interfering with the existing infrastructure • with SemWeb somewhere under the hood
Enterprise Graph HowTo • Construct the graph • generate triples from data • create triples from documents • Leverage the graph • enrich • infer • Browse the graph • select resources • build objects • Trash the graph
How: extract & normalize • Harvest and normalize • as in an ETL • fetch, clean, transform… • normalize records (names, IDs) to prepare the linking step • For databases • db2triples: an RDB2RDF implementation by Antidot (open source, W3C validated)
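The normalization step above can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the function names, the record layout and the sample rows are all hypothetical. It shows why normalizing names and IDs first makes the later linking step a simple equality test.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Return a comparison key: no accents, no punctuation, tokens sorted."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Anything that is not a letter or digit becomes a separator.
    cleaned = re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()
    # Sorting tokens makes "Dupont, Elise" and "Elise Dupont" identical.
    return " ".join(sorted(cleaned.split()))


def normalize_id(raw: str) -> str:
    """Keep alphanumerics only and uppercase them."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()


def normalize_record(record: dict) -> dict:
    """Normalize the two fields used for linking (hypothetical layout)."""
    return {
        "id": normalize_id(record["id"]),
        "name": normalize_name(record["name"]),
    }


# Hypothetical rows describing the same person in two silos.
crm_row = {"id": " emp-0042", "name": "Dupont,  Élise"}
ldap_row = {"id": "EMP0042", "name": "elise DUPONT"}

print(normalize_record(crm_row) == normalize_record(ldap_row))
```

Sorting the name tokens is a deliberate simplification; a real directory would probably need a stricter key to avoid merging distinct people.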
How: semantize • Don’t transform everything into RDF • cherry-pick a subset of interesting fields for each object and create the corresponding RDF triples • interesting == needed for linking or inferring
How: semantize • Triples generation • Be smart: avoid upfront ontology design, use small vocabularies • Be pragmatic: transform XML tags and field names to predicates • Be agile: only insert what you need. And when you need more, add more. • Semantic Web fuels the modeling, linking and information building process
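A minimal sketch of this "cherry-pick and generate" idea, under stated assumptions: triples are plain Python tuples rather than objects from an RDF library, and the `http://example.org/` namespace, the `record_to_triples` helper and the sample record are hypothetical. Field names are reused directly as predicates, and only the fields listed in `keep` reach the graph.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def record_to_triples(kind, record, id_field, keep):
    """Turn the selected fields of one record into (s, p, o) tuples."""
    subject = f"{EX}{kind}/{record[id_field]}"
    triples = [(subject, RDF_TYPE, f"{EX}{kind}")]
    for field in keep:
        value = record.get(field)
        if value is None or value == "":
            continue  # nothing to say about this field
        # The field name itself becomes the predicate: no upfront ontology.
        triples.append((subject, f"{EX}{field}", value))
    return triples


# Hypothetical directory record; only "name" and "unit" matter for linking.
employee = {
    "id": "EMP0042",
    "name": "dupont elise",
    "unit": "BU-SALES",
    "phone": "+33 1 00 00 00 00",
    "badge_color": "blue",
}

triples = record_to_triples("employee", employee, "id", keep=["name", "unit"])
for triple in triples:
    print(triple)
```

When a later use case needs the phone number, adding `"phone"` to `keep` is the whole change, which is the "add more when you need more" point of the slide.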
How: semantize • Unstructured documents • Extract metadata and transform it into RDF as needed • Ex: author => dc:creator • Use text mining to extract named entities: people, organizations, products… • build the entity lists from the data sources: directory for employees, CRM for companies and people, ERP for products • create triples like doc_URI quotes entity_URI
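As a rough sketch of the two steps above: metadata keys are mapped to predicates, and entities are spotted by looking up a list built from the other data sources. The dictionary lookup is a stand-in for real text mining, and the `quotes` predicate URI, the helper names and the sample data are assumptions made for the example.

```python
import re

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
QUOTES = "http://example.org/quotes"  # hypothetical predicate

# Metadata key -> predicate, following the slide's "author => dc:creator".
METADATA_MAP = {"author": DC_CREATOR}


def metadata_triples(doc_uri, metadata):
    """Map known metadata keys to predicates; ignore the rest."""
    return [
        (doc_uri, METADATA_MAP[key], value)
        for key, value in metadata.items()
        if key in METADATA_MAP
    ]


def entity_triples(doc_uri, text, gazetteer):
    """Emit one 'quotes' triple per gazetteer label found as a whole phrase."""
    lowered = text.lower()
    found = []
    for label, entity_uri in gazetteer.items():
        pattern = r"\b" + re.escape(label.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.append((doc_uri, QUOTES, entity_uri))
    return found


# Hypothetical entity list, as if exported from the directory and the CRM.
gazetteer = {
    "Elise Dupont": "http://example.org/employee/EMP0042",
    "Acme": "http://example.org/company/ACME",
    "Globex": "http://example.org/company/GLOBEX",
}

doc_uri = "http://example.org/doc/1"
doc_text = "Meeting notes: Elise Dupont visited Acme last week."
doc_meta = {"author": "J. Martin", "pages": 3}

doc_triples = metadata_triples(doc_uri, doc_meta) + entity_triples(
    doc_uri, doc_text, gazetteer
)
for triple in doc_triples:
    print(triple)
```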
How: semantize • Unstructured documents • Compare documents using various dedicated algorithms • is the same • is included • is similar • is related • Generate new triples • create triples like <docA> is_sub_version_of <docB>
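The talk does not say which comparison algorithms are used. One simple possibility, shown here purely as an assumption, is to compare sets of word shingles: equal sets mean "same", a proper subset means "included", and a high Jaccard overlap means "similar". The relation names and the 0.5 threshold are illustrative choices.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams of a text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def compare(text_a, text_b, similar_at=0.5):
    """Name the relation from A to B, or return None if there is none."""
    a, b = shingles(text_a), shingles(text_b)
    if not a or not b:
        return None
    if a == b:
        return "is_same_as"
    if a < b:  # every shingle of A also appears in B
        return "is_included_in"
    jaccard = len(a & b) / len(a | b)
    if jaccard >= similar_at:
        return "is_similar_to"
    return None


def comparison_triples(uri_a, text_a, uri_b, text_b):
    """Wrap the detected relation, if any, as a single triple."""
    relation = compare(text_a, text_b)
    return [(uri_a, relation, uri_b)] if relation else []


short = "the quarterly report covers sales in europe"
longer = "the quarterly report covers sales in europe and asia"
other = "budget meeting moved to friday"

print(comparison_triples("docA", short, "docB", longer))
print(comparison_triples("docA", short, "docC", other))
```

Pairwise comparison grows quadratically with the number of documents, so a production setup would need some form of candidate selection first.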
How: enrich • Enrich the graph • run specific algorithms to generate more links and triples (classifiers, topic detection, …) • insert external data gathered from the LOD or other external datasets or APIs
How: infer • Create new knowledge • add rules according to your needs • Example: IF a coworker is quoted in documents AND this coworker belongs to a business unit THEN the business unit is bound to the documents
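The example rule can be written as a small forward-chaining step over a set of triples. This is a sketch only: the predicate names (`ex:quotes`, `ex:memberOf`, `ex:boundTo`) and the sample graph are hypothetical, and a real deployment would more likely express the rule in the triplestore's own rule or query language.

```python
from collections import defaultdict

QUOTES = "ex:quotes"        # document quotes a person (hypothetical name)
MEMBER_OF = "ex:memberOf"   # person belongs to a business unit (hypothetical)
BOUND_TO = "ex:boundTo"     # business unit is bound to a document (hypothetical)


def infer_unit_bindings(graph):
    """Return the new triples produced by the rule, without mutating graph."""
    units_of = defaultdict(set)
    for subject, predicate, obj in graph:
        if predicate == MEMBER_OF:
            units_of[subject].add(obj)

    inferred = set()
    for doc, predicate, person in graph:
        if predicate != QUOTES:
            continue
        # IF the quoted person belongs to a unit THEN bind unit and document.
        for unit in units_of.get(person, ()):
            inferred.add((unit, BOUND_TO, doc))
    return inferred - graph


graph = {
    ("ex:doc/1", QUOTES, "ex:employee/EMP0042"),
    ("ex:doc/2", QUOTES, "ex:employee/EMP0042"),
    ("ex:doc/3", QUOTES, "ex:company/ACME"),
    ("ex:employee/EMP0042", MEMBER_OF, "ex:unit/SALES"),
}

new_triples = infer_unit_bindings(graph)
graph |= new_triples  # materialize the inferred knowledge
for triple in sorted(new_triples):
    print(triple)
```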
How: build • Build • select resources corresponding to object seeds (using SPARQL queries) • for each seed, follow links smartly in order to create basic objects
How: build • Finalize • decorate the new knowledge objects with data that was kept aside (not loaded in the triplestore) • now we have rich user-actionable objects
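The build and finalize steps can be sketched as follows. Two assumptions are made for brevity: seeds are selected with a plain Python filter instead of a SPARQL query, and "following links" is reduced to one hop over a chosen list of predicates. The `side_store` dictionary stands for the data kept outside the triplestore; all names and sample values are hypothetical.

```python
RDF_TYPE = "rdf:type"


def select_seeds(graph, kind):
    """Pick every resource of the given type (stand-in for a SPARQL SELECT)."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == kind)


def build_object(graph, seed, follow):
    """Collect the values reachable from the seed through chosen predicates."""
    obj = {"uri": seed}
    for predicate in follow:
        values = sorted(o for s, p, o in graph if s == seed and p == predicate)
        if values:
            obj[predicate] = values
    return obj


def finalize(obj, side_store):
    """Decorate the object with fields that never entered the graph."""
    return {**obj, **side_store.get(obj["uri"], {})}


graph = {
    ("ex:doc/1", RDF_TYPE, "ex:Document"),
    ("ex:doc/1", "ex:quotes", "ex:employee/EMP0042"),
    ("ex:unit/SALES", "ex:boundTo", "ex:doc/1"),
    ("ex:employee/EMP0042", RDF_TYPE, "ex:Employee"),
}

# Bulky or display-only fields kept aside during the semantize step.
side_store = {"ex:doc/1": {"title": "Q3 sales review", "size_kb": 412}}

objects = [
    finalize(build_object(graph, seed, follow=["ex:quotes"]), side_store)
    for seed in select_seeds(graph, "ex:Document")
]
print(objects)
```

Keeping titles, bodies and similar fields out of the graph is what lets the graph stay small and disposable, in line with the "trash the graph" step of the HowTo.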
How: expose • Make the new information available to users and to the entire IS • (Diagram labels: Harvest, Normalize, Semantize, Enrich, Classify, Annotate; Relational DB, RDF Triplestore (Linked Data), AFS search engine, Indexation)
Conclusion • It works! • The triples we create and the inference rules we add are dictated by the goal / application • usage and value oriented • We benefit from the lazy-flexible-dynamic modeling of RDF-RDFS-OWL • we are agile • What matters is the graph. But the graph is not the triplestore • storage independent
There’s an app for that • Antidot Information Factory • a software solution designed specifically to leverage structured and unstructured data • enable large-scale processing of existing data • automate publishing of enriched or newly created information • Steps: Harvest, Normalize, Semantize, Enrich, Build, Expose
The Giant Enterprise Graph • Now we have a path to let SemWeb enter the enterprise
Discuss Understand Learn Exchange www.antidot.net info@antidot.net Thanks for your attention QUESTIONS?