
Intelligent Information Management

Intelligent Information Management. Collaborative Project 2010-2014 in Information and Communication Technologies. Project No. 257943. Start Date 01/09/2010.


Presentation Transcript


  1. Intelligent Information Management • Collaborative Project 2010-2014 • in Information and Communication Technologies • Project No. 257943 • Start Date 01/09/2010

  2. The emerging Web of Data: achievements and challenges • Achievements • Extension of the Web with a data commons (currently amounting to 25 billion facts) • Vibrant, global RTD community • Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly) • Emerging governmental adoption in sight • Establishing Linked Data as a deployment path for the Semantic Web • Web: a global, distributed platform for data, information and knowledge integration • Exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF • Challenges • Coherence: relatively few, expensively maintained links • Quality: partly low-quality data and inconsistencies • Performance: still substantial penalties compared to relational databases • Data consumption: large-scale processing, schema mapping and data fusion still in their infancy • Usability: missing direct end-user tools and network effect • These issues are closely related, and addressing them should ultimately lead to an ecosystem of interlinked knowledge! • [Figure: snapshots dated July 2007, April 2008, September 2008 and July 2009]

  3. LOD2 in a Nutshell • Research focus • Very large RDF data management • Knowledge Enrichment & Interlinking • Fusion & Information Quality • Adaptive, semantic user interfaces • Use Cases • Media & Publishing • Enterprise Data Webs • Open Gov Data • Main Result • Integrated LOD2 Stack for Linked Data lifecycle management • Partners • Uni Leipzig, CWI, DERI Galway, FU Berlin, Semantic Web Company, OpenLink, TenForce, Exalead, Wolters Kluwer, OKFN

  4. LOD2 • EC-funded collaborative project that aims to utilize the Web as an integration platform for data and information • Linked Data • Linked Data provides the necessary basic technologies and standards to realize the goal of LOD2. • Linked Open Data • Publicly accessible data which is to be integrated into the Web and linked among one another and with non-public contents such as enterprise intranets • Project Highlights • Open Government Linked Data Initiative • Common European platform publicdata.eu • Leading Web 3.0 technologies are combined in the project into the coherent LOD2 stack (e.g. DBpedia, Virtuoso, Sindice, Silk)

  5. WP1: Requirements, Design & LOD2 Stack Prototype • Use Case High-Level Abstraction

  6. WP1: Use Case Objectives • MEDIA & PUBLISHING (Wolters Kluwer Germany) • Objective of WP7: Supporting content-related production workflows in the media & publishing industry. • ENTERPRISE APPLICATIONS (Exalead) • Objective of WP8: Applying Linked Data technologies in an enterprise stack to support Human Resources-related issues. • OPEN GOVERNMENT DATA (Open Knowledge Foundation) • Objective of WP9: Improving accessibility, findability and reusability of Open Government Data.

  7. WP2: Storing & Querying Very Large Knowledge Bases • Goal: • Enabling large-scale, feature-rich & enterprise-ready Linked Data management solutions • Database Partners in LOD2: • CWI - Leading open source analytics RDBMS • OpenLink - Leading Linked data deployment platform • Technological Excellence: • Creating and publishing metrics for choosing RDF solutions • Bringing Column Store Technology for Business Intelligence on RDF • Ground-breaking database innovations for RDF stores (Dynamic Query optimization, Adaptive Caching of Joins, Optimized Graph Processing, Cluster/Cloud scalability)

  8. WP2: Linked Open Data For Real In Your Apps • Business Advantages: • Enrich your application with (free & rich) Linked Open Data • RDF store technology has 10x lower deployment costs than relational for ragged data • Technological Flexibility: • Deliver Schema-Last Flexibility and Inference at Relational Data Warehouse Cost and Performance • Grow as you go: the LOD2 platform dynamically adapts to your usage patterns and structure of your data • Integrate, resolve, align anything: Schema, instance identity • Rich Features for complex Applications: • Advanced SPARQL and SQL query processing • SPARQL and SQL Federation • Full Text, Geospatial, Text Search • Scale-Out on Clusters, Replication
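The "schema-last" flexibility claimed on this slide can be illustrated with a minimal sketch. It is a toy in-memory triple store written for this transcript, not the LOD2 platform or any real RDF store; the class name `TinyTripleStore` and the `ex:` identifiers are hypothetical. The point it shows is that a record with an unforeseen property ("ragged" data) can be added without any schema change.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Toy in-memory store (hypothetical); real RDF stores index and optimise far more."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # No table definition is needed: any predicate may appear at any time.
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


store = TinyTripleStore()
store.add("ex:acme", "ex:name", "ACME GmbH")
store.add("ex:acme", "ex:city", "Leipzig")
# "Ragged" data: a later record brings a property nobody planned for.
store.add("ex:globex", "ex:name", "Globex")
store.add("ex:globex", "ex:stockSymbol", "GBX")

names = [o for _, _, o in store.match(p="ex:name")]
globex_props = sorted(p for _, p, _ in store.match(s="ex:globex"))
print(names)
print(globex_props)
```

In a relational design the `ex:stockSymbol` value would typically require a new column or table; here it is just one more triple, which is the trade-off the slide refers to.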

  9. WP3: Goals • General Goal: • Creation, improvement, repair of knowledge bases • Focus: • Very large knowledge bases, diverse knowledge, web data • Refine existing triplification approaches (Virtuoso Sponger, RDF Views, Triplify, D2R) • Improve schema of knowledge based on data • Fix problems in knowledge bases, e.g. inconsistencies • Techniques: • Semi-automatic machine learning, ontology debugging, NLP, shallow parsing etc.

  10. WP3: Knowledge Base Improvement Cycle • [Diagram: Mutual Refinement Cycle (with optional Extraction phase)] • Extraction (from structured, semi-structured and unstructured sources) • Enrichment (definitions, disjointness) • Repair (inconsistency, modelling problems, performance problems) • Linkage Validation

  11. WP3: Task Overview • Provenance-Aware Extraction of Linked Data from Existing Structured Formats • Relational databases, spreadsheets, CMS, logs, XML documents • Development of D2R, Triplify, Virtuoso Sponger • Provenance-Aware Extraction of Linked Data from Unstructured and Semi-Structured Sources • HTML, PDF and Office documents with meta-data, wiki code, plain text • Development of NLP2RDF and DBpedia • Knowledge Base Schema Enrichment • Learn axioms in knowledge bases, e.g. disjointness, definitions, super-classes • Development of ORE and DL-Learner • Knowledge Base Repair • Fix inconsistencies, modeling problems, reasoning performance problems • Development of ORE • Web Linkage Validator • Reports whether a knowledge base is suitable to be interlinked with others
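As an illustration of the first task (provenance-aware extraction from structured formats), the following sketch turns CSV rows into N-Triples-style statements and attaches a source statement to every generated resource. It is not how D2R, Triplify or Virtuoso Sponger work internally; the `http://example.org/` namespace, the function names and the choice of `dcterms:source` as the provenance property are assumptions made for this example.

```python
import csv
import io
from typing import List, Tuple
from urllib.parse import quote

BASE = "http://example.org/"                    # hypothetical namespace
DCT_SOURCE = "http://purl.org/dc/terms/source"  # assumed provenance property

Triple = Tuple[str, str, str]


def triplify(csv_text: str, table: str, key: str, source: str) -> List[Triple]:
    """Map each row to one subject URI and each non-empty cell to one triple."""
    triples: List[Triple] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{table}/{quote(row[key])}>"
        for column, value in row.items():
            if column == key or not value:
                continue  # the key is encoded in the URI; empty cells are skipped
            predicate = f"<{BASE}{table}#{quote(column)}>"
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append((subject, predicate, f'"{escaped}"'))
        # Provenance: record which dump the resource was extracted from.
        triples.append((subject, f"<{DCT_SOURCE}>", f"<{source}>"))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


CSV_TEXT = "id,name,city\n1,Alice,Leipzig\n2,Bob,\n"
triples = triplify(CSV_TEXT, "person", "id",
                   "http://example.org/dumps/person.csv")
print(to_ntriples(triples))
```

Keeping the source statement next to the extracted data is what later allows a repair or validation step to trace a questionable fact back to its origin.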

  12. WP4: Reuse, Interlinking and Knowledge Fusion (1) Goal: Provide open-source software components for link generation, schema mapping, data quality assessment and knowledge fusion.
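Link generation, the first component named in this goal, can be sketched as label matching with a similarity threshold. This is a deliberately simple stand-in using Python's standard `difflib`, not the algorithm of Silk or of any LOD2 component; the dataset contents, the `a:`/`b:` identifiers and the 0.9 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source: Dict[str, str], target: Dict[str, str],
                   threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """For each source resource keep its best target match if it is good enough."""
    if not target:
        return []
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = max(
            ((t_uri, similarity(s_label, t_label))
             for t_uri, t_label in target.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            links.append((s_uri, best_uri, round(best_score, 2)))
    return links


# Hypothetical labels from two datasets.
dataset_a = {"a:Leipzig": "Leipzig", "a:Berlin": "Berlin", "a:Galway": "Galway"}
dataset_b = {"b:leipzig": "leipzig", "b:berlin_de": "Berlin (Germany)",
             "b:vienna": "Vienna"}

links = discover_links(dataset_a, dataset_b)
for s_uri, t_uri, score in links:
    print(f"{s_uri} <{OWL_SAME_AS}> {t_uri} .  # score {score}")
```

The Berlin pair falls below the threshold and is not linked automatically; borderline candidates of that kind are what a link quality assessment step would present for review.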

  13. WP4: Reuse, Interlinking and Knowledge Fusion (2) • Technological Excellence: • Ease the creation of RDF links by using machine learning as well as a link quality assessment workbench • Provide for the flexible integration of Web data based on mappings discovered on the Web • Provide for assessing the quality of Web data and fusing high-quality data • Expected Outcomes: • Link discovery tools, linking assistant and workbench • Framework for publishing and discovering expressive mappings on the Web • Data quality assessment framework providing for a wide range of different quality assessment policies • Data fusion components providing various conflict resolution strategies
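The last expected outcome, data fusion with selectable conflict resolution strategies, can be sketched as follows. The two strategies shown (majority vote and highest-trust source) are common textbook choices picked for this example, not a description of the LOD2 fusion components; all values, source names and trust scores are made up.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# (value, source name, trust score of that source); all values are illustrative
Claim = Tuple[str, str, float]
Strategy = Callable[[List[Claim]], str]


def most_frequent(claims: List[Claim]) -> str:
    """Resolve a conflict by majority vote over the claimed values."""
    counts = Counter(value for value, _, _ in claims)
    return counts.most_common(1)[0][0]


def most_trusted(claims: List[Claim]) -> str:
    """Resolve a conflict by taking the value from the highest-rated source."""
    return max(claims, key=lambda claim: claim[2])[0]


def fuse(claims_by_property: Dict[str, List[Claim]],
         strategies: Dict[str, Strategy],
         default: Strategy = most_frequent) -> Dict[str, str]:
    """Apply the per-property strategy, falling back to the default."""
    return {prop: strategies.get(prop, default)(claims)
            for prop, claims in claims_by_property.items()}


claims = {
    "name": [("Leipzig", "srcA", 0.6), ("Leipzig", "srcB", 0.9),
             ("Lipsia", "srcC", 0.5)],
    "population": [("500000", "srcA", 0.6), ("520000", "srcB", 0.9),
                   ("500000", "srcC", 0.5)],
}
fused = fuse(claims, strategies={"population": most_trusted})
print(fused)
```

Choosing the strategy per property reflects that no single policy fits all data: a majority vote suits names, while a frequently updated figure may be better taken from the source judged most reliable.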

  14. WP5: Linked Data Visualization, Browsing and Authoring (1) • WP5 aims to build on and go beyond existing approaches for realizing adaptive Web user interfaces by: • Automatic content adaptation: a subset of the domain knowledge is identified, possibly through some reasoning mechanism, as relevant to the current user and context. • Adaptive browsing: faceted spatial semantic browsing, a reusable component for browsing spatial content in a faceted way (also for mobiles).

  15. WP5: Linked Data Visualization, Browsing and Authoring (2) Adaptive Semantic Authoring • semantic widget interface: allows the creation of small reusable interface components for domain-specific user interfaces (also for mobiles). • adaptive widget choreography: enables the automatic generation of user interfaces. • social networking interfaces: enable users to subscribe to arbitrary information adhering to certain semantically defined filter criteria.

  16. WP5: Technologies & Methods (1) • Semantic pipes: An engine and graphical environment for general Web Data transformations and mashups. • Sig.ma: A service and an end-user application to access the Web of Data as an integrated information space.

  17. WP5: Technologies & Methods (2) • Site Services: Site Search and Site Widgets • Widgets (right) provide relevant information, from Sindice, about the topic of the site. • Site search (below) provides a rich faceted-browsing functionality of the site’s widgets.

  18. WP6: Interfaces, Integration & LOD2 Stack (1) • This work package deploys the LOD2 stack, based on the requirements and prerequisites defined in work package 1. • The LOD2 stack will be made available as downloadable packages. • While leading work package 6, the following will be delivered: • Integrated user-interface components • Integrated LOD2 Stack API components • Evaluation, Documentation, Tutorials

  19. WP6: Interfaces, Integration & LOD2 Stack (2) • [Diagram] • Output of WP1 (requirements/prerequisites) feeds the SAF (Software Assembly Factory) • SAF starts on 09/2011 and generates open-source packages with integrated tools • Yearly releases of the LOD2 stack: 09/2012, 09/2013, 09/2014 • The stack is applied on the use cases, which return requirements/prerequisites: • WP7: Use Case Media & Publishing • WP8: Use Case Enterprise Data Web • WP9: Use Case Government Data

  20. WP7: LOD2 for Publishers • Wolters Kluwer Deutschland (WKD): "Semantic Technologies and Standards are an enabler for the media and publishing industry to create added-value for their customers with reasonable costs." • WKD is part of Wolters Kluwer B.V. • Customer orientation • Lawyers • Tax Accountants • Corporations and SMEs • Financial institutions • Health Providers • Public Sector • Worldwide reach • Europe • North America • Asia/Pacific • Economic success • Revenue 2009: EUR 3.4 billion • 18,000 employees • Listed on the Amsterdam stock exchange • WKD Legal & Regulatory • Companies/Brands • Carl Heymanns Verlag • Luchterhand • Werner Verlag • Carl Link • CW Haarfeld • Deutscher Wirtschaftsdienst • AnNoText • Trigon Data • Products (Examples) • IP, Administrative Law • Civil, Family, Labor Law • Construction Law • Publications for Schools/KiTas • Public Health Insurance • Magazine "Personalwirtschaft" (HR Management) • Software for Lawyers and Notaries • WKD Tax & Accounting • Companies/Brands • Akademische Arbeitsgemeinschaft Verlag • Addison Group • Schleupen Tax • Wago Curadata • Products (Examples) • Tax software for consumers • Software for tax accountants • Software for SMEs with a focus on controlling and accounting

  21. WP7: WKD as a Consumer of LOD Data • Content Supply Chain of Wolters Kluwer Deutschland (WKD): Content Acquisition, Editing, Composing, Bundling, Publishing, Interfacing, Sales, Customer Service, Customer • Content Acquisition • Acquisition of LOD governmental data • Laws & Regulations • Court cases • Administrative Rulings • Statistical information • Based on: • Adequate delivery format • Adequate metadata • Adequate Licensing and IPR • Content Enrichment • Enrichment of WKD data • Enrichment with additional metadata from the LOD cloud • Automatic interlinking within WKD data, but also into the LOD cloud • Based on: • Adequate delivery format • Adequate metadata • Adequate functionality • Adequate Licensing and IPR • Enterprise Applications • Data integration in enterprise and other customer applications • Integration of customer and WKD data with data from the LOD cloud • Development of new services, e.g. around metadata economics • Based on: • Adequate functionality • Adequate APIs • Adequate Licensing and IPR

  22. WP7: WKD as a Publisher of LOD Data • Content Supply Chain of Wolters Kluwer Deutschland (WKD): Content Acquisition, Editing, Composing, Bundling, Publishing, Interfacing, Sales, Customer Service, Customer • Cloud Publishing • Development of WKpedia • Publishing of enriched governmental information • Publishing of legal domain thesauri • Motivating contextualisation in the LOD cloud • Based on: • Adequate functionality • Adequate APIs • Adequate Licensing and IPR • Marketing measures • Integration in the overall marketing strategy of WKD • Dissemination of LOD2 in the media and publishing sector • Launching surveys • Permanent information of customers • Sponsoring of conferences • Based on: • Clear scope of the LOD2 project to support future publishing paradigms

  23. WP8: Towards Linked Enterprise Data Webs (1) • Linked Enterprise Intra Data Webs can fill the gap between Intra-/Extranets and ERP systems • They facilitate data integration along value chains within and across enterprises • The pragmatic, incremental, vocabulary-based Linked Data approach reduces data integration costs significantly • Objectives: • Promote openness and standards in enterprise data workflows and applications

  24. WP8: Linked Enterprise Data Use Case Scenario (2) • Wage policy EBI: • Build an application for surveying wage policy in a company, domain, sector, region, etc. • Scenarios: • A company wants to know if its wage policy is consistent with the market (in similar and related companies and sectors). • A job applicant would like to have an idea about his wage expectations according to his expertise, profile and education background • A governmental agency would like to survey the salaries in a particular region according to an economic branch and other parameters

  25. WP8: Linked Enterprise Data (3) • Targeted service: • A SaaS service with different levels of subscription • The service is a mashup of payroll and HR data of enterprises subscribing to the service to build an index store of data facts about wages. • Different consolidation parameters and key performance indicators (KPI) will be studied to provide relevant reports and visualisation interfaces. • Integration of external datasets in a particular survey: public datasets in the web cloud or private datasets of participating companies. • Privacy issues management: make private and nominative data anonymous.
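The privacy and KPI points above can be sketched together. The slide does not say how anonymisation is done; this example swaps in two simple, commonly used measures as assumptions: replacing names with a keyed hash (strictly speaking pseudonymisation, which is weaker than full anonymisation) and suppressing report groups that are too small. The key, the group-size limit and all records are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict
from statistics import median
from typing import Dict, List

SECRET_KEY = b"replace-me"  # hypothetical key, kept by the service operator
MIN_GROUP_SIZE = 3          # assumed threshold below which a group is hidden


def pseudonymise(name: str) -> str:
    """Replace a nominative value with a stable keyed hash."""
    digest = hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def anonymise(records: List[dict]) -> List[dict]:
    """Drop the name field and keep only what the wage survey needs."""
    return [{"person": pseudonymise(r["name"]),
             "sector": r["sector"], "wage": r["wage"]} for r in records]


def median_wage_by_sector(records: List[dict]) -> Dict[str, float]:
    """KPI: median wage per sector, omitting groups that are too small."""
    groups = defaultdict(list)
    for r in records:
        groups[r["sector"]].append(r["wage"])
    return {sector: median(wages)
            for sector, wages in sorted(groups.items())
            if len(wages) >= MIN_GROUP_SIZE}


payroll = [
    {"name": "Alice", "sector": "IT", "wage": 3000},
    {"name": "Bob", "sector": "IT", "wage": 3500},
    {"name": "Carol", "sector": "IT", "wage": 4000},
    {"name": "Dave", "sector": "Legal", "wage": 5000},
]
anonymised = anonymise(payroll)
report = median_wage_by_sector(anonymised)
print(report)
```

The single "Legal" record is left out of the report because a one-person group would reveal that person's wage even without a name attached.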

  26. WP8: Linked Enterprise Data (4) • Preliminary overview • [Architecture diagram; components shown: Payroll software • Employees and HR database • Data crawler • Data cleaning and uniformisation • Data LODification and anonymisation • Data consolidation • Data enrichment, annotation • Taxonomies • RDF store • SPARQL endpoint • Indexer • Full text index • Search and EBI interface]

  27. WP9: Open Government Data Use Case publicdata.eu - find and reuse datasets from local, regional and national public bodies across Europe from a single place

  28. WP9: Who is this for? • Data literate citizenry • Data journalists • Policy experts • Decision makers • Mobile and web developers • Academics / researchers • Public bodies • Companies • Civic society / NGOs • And so on...

  29. WP9: What will it involve? • Enabling exchange of metadata between different data catalogues • Aggregating datasets from existing data catalogues • Creating a European community of reusers to improve metadata • Creating mechanisms for capturing derived / related datasets • Bridging language and topical gaps to associate related information from all Member States
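The first two points (metadata exchange and aggregation across catalogues) can be sketched as mapping each catalogue's records onto one shared shape and merging entries that describe the same dataset. The catalogue names, field names and records below are invented for illustration and do not describe the actual publicdata.eu implementation; matching on a normalised URL is just one possible merge key.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


# Hypothetical per-catalogue mappings onto one shared record shape.
def from_catalogue_a(rec: Record) -> Record:
    return {"title": rec["title"], "publisher": rec["org"],
            "url": rec["link"], "language": rec["lang"]}


def from_catalogue_b(rec: Record) -> Record:
    return {"title": rec["titel"], "publisher": rec["herausgeber"],
            "url": rec["url"], "language": "de"}


def aggregate(sources: Dict[str, Tuple[Callable[[Record], Record],
                                       List[Record]]]) -> List[dict]:
    """Merge records from all catalogues, keyed by normalised dataset URL."""
    merged: Dict[str, dict] = {}
    for name, (convert, records) in sources.items():
        for rec in records:
            common = convert(rec)
            key = common["url"].rstrip("/").lower()
            # The first catalogue seen supplies the metadata; later ones are
            # only recorded as additional listings of the same dataset.
            entry = merged.setdefault(key, {**common, "catalogues": []})
            entry["catalogues"].append(name)
    return list(merged.values())


sources = {
    "catalogue_a": (from_catalogue_a, [
        {"title": "Air quality 2010", "org": "Example City",
         "link": "http://data.example.org/air/", "lang": "en"},
    ]),
    "catalogue_b": (from_catalogue_b, [
        {"titel": "Luftqualität 2010", "herausgeber": "Beispielstadt",
         "url": "http://data.example.org/air"},
        {"titel": "Haushalt 2011", "herausgeber": "Beispielstadt",
         "url": "http://data.example.org/budget"},
    ]),
}
datasets = aggregate(sources)
for dataset in datasets:
    print(dataset["title"], dataset["catalogues"])
```

The German and English entries for the same dataset end up as one record listed in both catalogues, which is a small instance of the language gap mentioned in the last point.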

  30. WP10: Training, Dissemination, Community Building & Fertilization • The general aim of this work package is to establish a worldwide focal point for academic and industry parties interested in contributing to or taking advantage of the novel Linked Data methodologies and components, which will emerge in the project. • In particular, our activities will be targeted at: • informing the community of the state-of-the-art developments taking place in the field, • disseminating the project results in order to foster community building and to create an impact on industry and research in Europe and worldwide, • providing training to interested audiences in the technologies developed throughout the project

  31. WP10: Tasks & Timeline • Task 10.1 Training • (M14 – M37) • Internal face-to-face training • External training • PhD programme • Task 10.2 Dissemination, Community Building & Cross-Fertilization • (M1 – M48) • Scientific dissemination • Industrial dissemination • Online marketing activities across all identified target groups

  32. WP10: LOD2 Dissemination Resources • Website: http://lod2.eu • Weblog: http://lod2.eu/BlogPost • Twitter: http://twitter.com/lod2project • SlideShare: http://www.slideshare.net/lod2project • PUBLINK: http://lod2.eu/Article/Publink.html • Remark: please use #lod2 on Twitter for your posts and connect with the account lod2project. Many thanks in advance!

  33. PubLink – LOD2’s Linked Open Data Starter Service • PubLink helps selected organizations with a focused consulting effort of 10-15 days to publish and make use of Linked Data • PubLink helps to evaluate the LOD2 technologies and to increase the wealth of Linked Data • Yearly application deadline in winter • 2011 PubLink participants include: • Umweltbundesamt GmbH, Austria • Greater London Authority • Deutsch Bibliographie, Historische Kommission • The Parliament of Finland • City of Vienna • Instituto Canario de Estadística (ISTAC) • See: http://lod2.eu/Article/Publink.html

  34. WP11: Exploitation and Standardization (1) • Objectives: • Realizing the vision of the LOD2 project and use case studies • Standardisation of the LOD2 architecture • Exploitation of knowledge and technical results • Exploitation: • Use case studies and the industrial and end-user community partners will drive the exploitation. • Tracking important technical and commercial developments in information retrieval and data management, including news and media. • Publishing an exploitation plan identifying opportunities, benefits and impact of the LOD2 consortium.

  35. WP11: Exploitation and Standardization (2) • Intellectual Property Rights (IPR): • Core components of the LOD2 stack will be published under an open-source license. • Domain adaptations of the LOD2 stack are considered on a case-by-case basis to protect IPR. • The strategy ensures that all components of LOD2 are royalty-free. • Standardization: • Actively participating in appropriate standards bodies. • Establishing a W3C Linked Data interest group. • Orchestration with other projects: • Encourage take-up of LOD2 technologies by other projects. • Foster input from other EU projects relevant for the development of LOD2

  36. WP12: Fact Sheet • 10 Partners from 7 European Countries • Project • Instrument: Large-scale Integrating Project • Objective: Intelligent Information Management • Call: FP7-ICT-2009-5 • Duration: 09/2010 – 08/2014 • Means • Total Budget: 8.58 M€ • Total Funding: 6.45 M€ • Total Resources: 844 PM • Consortium • Universität Leipzig (Coordinator) • Centrum Wiskunde & Informatica • National University of Ireland in Galway • Freie Universität Berlin • OpenLink Software • Semantic Web Company • TenForce • Exalead • Wolters Kluwer Deutschland • Open Knowledge Foundation

  37. Dr Sören Auer • Scientific Project Leader • Phone: +49 (341) 97-32367 • Fax: +49 (341) 97-32329 • Email: auer@uni-leipzig.de • http://www.informatik.uni-leipzig.de/~auer • Nadine Jänicke • Project Manager • Phone: +49 (341) 97-32310 • Fax: +49 (341) 97-32329 • Email: jaenicke@uni-leipzig.de • http://bis.informatik.uni-leipzig.de/NadineJaenicke
