
An Ontology-Based Knowledge Portal for Language Technology

An Ontology-Based Knowledge Portal for Language Technology. Hans Uszkoreit, Brigitte Jörg, Gregor Erbach. Project COLLATE.

Presentation Transcript


  1. An Ontology-Based Knowledge Portal for Language Technology. Hans Uszkoreit, Brigitte Jörg, Gregor Erbach

  2. Project COLLATE • Theme: Computational Linguistics and Language Technology for Real World Applications • Partners: DFKI Saarbrücken, Saarland University • Support: A grant by the German Federal Ministry for Education and Research for RTD strengthening the position of Saarbrücken as a Competence Center for Language Technology • PIs: Hans Uszkoreit, Manfred Pinkal and Wolfgang Wahlster • Duration: Spring 2001 to end of 2003

  3. Information Center: LT World • Information Service about Language Technology • www.lt-world.org • Ontology-based • XML Import and Export Formats • Visual and Structural Design
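The slide mentions XML import and export formats but does not show their schema. The following minimal Python sketch assumes a hypothetical flat entry element with a type attribute and one child element per field. The element and field names are illustrative and are not LT World's actual format. It uses only the standard-library xml.etree.ElementTree module to export an entry and read it back.

```python
import xml.etree.ElementTree as ET


def export_entry(kind: str, fields: dict[str, str]) -> str:
    """Serialize one entry to XML; the <entry> layout is hypothetical."""
    root = ET.Element("entry", {"type": kind})
    for name, value in fields.items():
        child = ET.SubElement(root, name)  # one child element per field
        child.text = value
    return ET.tostring(root, encoding="unicode")


def import_entry(xml_text: str) -> tuple[str, dict[str, str]]:
    """Parse an entry produced by export_entry back into (type, fields)."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "") for child in root}
    return root.get("type", ""), fields


if __name__ == "__main__":
    xml_text = export_entry("event", {"name": "Summer School", "location": "Saarbrücken"})
    print(xml_text)
    print(import_entry(xml_text))
```

Because the format is round-trippable, records contributed from outside could in principle be imported by the same code that produces the export.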

  4. Objectives • is a distributed information service • combines and offers the best available content for each aspect of LT • exploits hypermedia technology to include useful content • is flexible and scalable enough to support the evolution of the discipline • exhibits a structure that is transparent to both experts and visitors from outside the field • increasingly uses language and knowledge technologies to improve the management and presentation of the information • is open to data exchange with other information services • has potential for interoperability with future knowledge services • is suited to the sophisticated metadata schemes of the envisaged Semantic Web

  5. LT World - Levels and Tasks
      Conceptual Level | Specification Level | Technical Realization Level | Content Level
      underlying logical structure | ontology specifications | concrete architecture | selection of sources
      organization of collection/production, data maintenance structure | XML specifications | DBs, XML pages, HTML pages | content in DBs, documents, links
      presentational structure | generic design, CI | actual design of pages | presented contents

  6. User View: Four Top Level Areas • Information and Knowledge • Players and Teams • Resources and Results • Communication/Interaction

  7. Information and Knowledge • Basic knowledge about all areas of LT; source: Survey of the State of the Art in Human Language Technology (1997, new edition in preparation) • Pointers to specialized knowledge (links to literature, projects, systems, products, people, resources, standards, ...); source: link collection by DFKI • Glossary of the field; source: DFKI, with input from the HLT Survey

  8. Players and Teams • DB of all researchers in LT: names, affiliations, links to homepages; number of entries: 2235 • DB of projects; number of entries: 659 • DB of research organisations, companies and funding agencies; number of entries: 1561
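The slide gives the three databases and their sizes but not their structure. The sketch below is a hypothetical relational layout in Python's built-in sqlite3. All table and column names are assumptions. It shows how people, organisations and projects can be linked so that a query like "who works on project X, and where are they affiliated?" becomes a single join.

```python
import sqlite3

# Hypothetical schema: the slide names the three databases but not their columns.
SCHEMA = """
CREATE TABLE organization (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT CHECK (kind IN ('research', 'company', 'funding'))
);
CREATE TABLE person (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    homepage TEXT,
    org_id   INTEGER REFERENCES organization(id)  -- affiliation
);
CREATE TABLE project (
    id      INTEGER PRIMARY KEY,
    acronym TEXT NOT NULL
);
CREATE TABLE project_member (  -- many-to-many: people can work on several projects
    project_id INTEGER REFERENCES project(id),
    person_id  INTEGER REFERENCES person(id),
    PRIMARY KEY (project_id, person_id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with one illustrative row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO organization VALUES (1, 'DFKI', 'research')")
    conn.execute("INSERT INTO person VALUES (1, 'A. Researcher', NULL, 1)")  # placeholder person
    conn.execute("INSERT INTO project VALUES (1, 'COLLATE')")
    conn.execute("INSERT INTO project_member VALUES (1, 1)")
    return conn


def members_of(conn: sqlite3.Connection, acronym: str) -> list[tuple[str, str]]:
    """Return (person, affiliation) pairs for the project with this acronym."""
    rows = conn.execute(
        """
        SELECT p.name, o.name
        FROM project_member m
        JOIN project pr ON pr.id = m.project_id
        JOIN person p ON p.id = m.person_id
        LEFT JOIN organization o ON o.id = p.org_id
        WHERE pr.acronym = ?
        ORDER BY p.name
        """,
        (acronym,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(members_of(build_demo(), "COLLATE"))
```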

  9. Resources and Results • DB of prototypes, research systems and products; source: ACL Software Registry (operated by DFKI) • Links to resource initiatives: ELRA, LDC • For resources: link to the OLAC search service

  10. Communication/Interaction • News about technologies, people, products, centers, etc.; source: collection by DFKI and contributions by users; number of entries: 370 • List of events (conferences, workshops, summer schools, etc.); source: collection by DFKI and contributions by users; number of entries: 251 • Links to topic-centered mailing lists; source: collection of existing lists

  11. Usage of LT World

  12. Systematics of the Discipline • Mature scientific or engineering disciplines have developed a systematics of the subject • Younger disciplines have outgrown their first systematics • LT or CL does not yet have a systematics or a classification scheme

  13. Logical Structuring: Two Options • Tree-Structured Classification: libraries; encyclopedias and handbooks • Multidimensional Structuring: multiple-inheritance hierarchies; and-or hierarchies
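The difference between the two options can be stated in data-structure terms. In a tree, every node has at most one parent, so an item has exactly one place. In a multiple-inheritance hierarchy, a node may have several parents, so its ancestors form a directed acyclic graph. The category names below are hypothetical examples chosen only to illustrate this point.

```python
# Hypothetical category graph; each key maps to its list of parent categories.
PARENTS: dict[str, list[str]] = {
    "Language Technology": [],
    "Speech Technology": ["Language Technology"],
    "Text Technology": ["Language Technology"],
    "Machine Translation": ["Text Technology"],
    # Multiple inheritance: one node, two parents. A tree cannot express this.
    "Speech Translation": ["Speech Technology", "Machine Translation"],
}


def ancestors(node: str) -> set[str]:
    """All categories reachable upward from node (transitive closure over parents)."""
    seen: set[str] = set()
    stack = list(PARENTS[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def is_tree(parents: dict[str, list[str]]) -> bool:
    """Tree-structured here means: no node has more than one parent."""
    return all(len(ps) <= 1 for ps in parents.values())


if __name__ == "__main__":
    print(sorted(ancestors("Speech Translation")))
    print(is_tree(PARENTS))
```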

  14. Means for Ordering • Terminology • Thesaurus • Classification vs. Systematics • Taxonomy = Classification + Nomenclature • Ontology • formal ontology • relational ontology

  15. Our Setup • Immediately visible structure: easy and transparent • Some multidimensional structuring through chapter structure of the Survey • For internal storage and DB search: complex multidimensional structure • Underlying systematics: multilayered and multidimensional ontology

  16. Ontologies • Theoretical Ontologies • Epistemological reasons • Phenomenological systematics • Practical Ontologies • Support of processes • Data Maintenance • Information Services

  17. Systematics/Ontologies • Generic Core: Dublin Core • Special Ontologies underlying exchange formats for special information types such as • OLAC (for linguistic resources) • BibTeX (for scientific literature) • Languages (for language codes) • Generic ontologies for the scientific discipline and technology sector • General Multidimensional Classification for CL and LT
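Dublin Core acts as the generic core. The RDF export on slide 30 gives the core properties a "dc." prefix, for example LT:dc.creator and LT:dc.language. This sketch assumes that the prefix maps one-to-one onto the fifteen Dublin Core 1.1 elements, which is an assumption made here. Under that assumption, a record can be projected onto plain Dublin Core by keeping only those properties. The wrapper element name "record" is hypothetical. The namespace IRI is the standard Dublin Core 1.1 one.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core Metadata Element Set, version 1.1
ET.register_namespace("dc", DC)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_dublin_core(record: dict[str, str]) -> ET.Element:
    """Keep only 'dc.<element>' properties and emit them as dc: elements.

    Treating the export's 'dc.' prefix as a direct mapping onto Dublin Core
    is an assumption of this sketch.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for key, value in record.items():
        prefix, _, name = key.partition(".")
        if prefix == "dc" and name in DC_ELEMENTS:
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root


# Values taken from the Babel export; dc.language is simplified to a literal here,
# whereas the export links it to an instance (&LT;English).
babel = {
    "resource.name": "Babel",
    "dc.identifier": "http://www.dfki.de/~stefan/Babel/e_babel.html",
    "dc.coverage": "66123 Saarbruecken",
    "dc.language": "English",
    "lt.linguality": "monolingual",  # LT-specific property: not a Dublin Core element
}

if __name__ == "__main__":
    print(ET.tostring(to_dublin_core(babel), encoding="unicode"))
```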

  18. (Figure) Classes and their slots:
      • Science: Actor, Subject, NewKnowledge, (Scientific) Means
      • Research: Actors, Subject, ResearchGoals, Means
      • ResearchProject: Actors, Subject, ResearchGoals, Means, Duration
      • Applied Science: Actor, Subject, NewKnowledge, Means, Applications
      • Applied Research: Actors, Subject, ResearchGoals, Methods, Applications
      • Applied ResearchProject: Actors, Subject, ResearchGoals, Methods, Duration, Applications
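The figure shows slots accumulating down the hierarchy. ResearchProject adds Duration to Research, and each applied class adds Applications. The Python sketch below uses a frame-style representation in which each class lists its parents and the slots it adds. This is loosely in the spirit of Protégé frames but is not Protégé's API. The sketch derives Applied ResearchProject, as an assumption, by multiple inheritance from ResearchProject and AppliedResearch, because the slide does not say how that class is derived. For simplicity it uses "Means" throughout, where the slide's applied branch says "Methods".

```python
# Hypothetical frame definitions: class -> (parent classes, slots the class adds).
CLASSES: dict[str, tuple[list[str], list[str]]] = {
    "Research": ([], ["Actors", "Subject", "ResearchGoals", "Means"]),
    "ResearchProject": (["Research"], ["Duration"]),
    "AppliedResearch": (["Research"], ["Applications"]),
    # Assumed: inherits Duration via ResearchProject, Applications via AppliedResearch.
    "AppliedResearchProject": (["ResearchProject", "AppliedResearch"], []),
}


def all_slots(cls: str) -> list[str]:
    """Inherited slots first (depth-first over parents), then own slots, no duplicates."""
    parents, own = CLASSES[cls]
    result: list[str] = []
    for parent in parents:
        for slot in all_slots(parent):
            if slot not in result:
                result.append(slot)
    for slot in own:
        if slot not in result:
            result.append(slot)
    return result


if __name__ == "__main__":
    for name in CLASSES:
        print(name, all_slots(name))
```

Deduplication matters here because Research's slots reach AppliedResearchProject along two paths.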

  19. Funded Research Project • Name • Acronym • Full Name • Actors • Organizations • PI • Other Roles • Researchers • Subject • Discipline/Area • Objectives • Goals • Means • Program • Duration • StartDate • EndDate • Funding • Agency • Program • Funding Number
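The slot list on this slide translates naturally into a typed record. The sketch below is a hypothetical Python dataclass rendering. The field names are derived from the slide, the nesting of the sub-slots is a guess, and the date check is an example of the kind of constraint a data-maintenance form might enforce rather than something the slide specifies. The usage data follows slide 2. Day and month for "Spring 2001" are assumed, and the unknown program and funding number are left as "n/a".

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Funding:
    agency: str
    program: str
    number: str  # "Funding Number" on the slide


@dataclass
class FundedResearchProject:
    acronym: str        # Name: Acronym
    full_name: str      # Name: Full Name
    start_date: date    # Duration: StartDate
    end_date: date      # Duration: EndDate
    funding: Funding
    organizations: list[str] = field(default_factory=list)
    pi: list[str] = field(default_factory=list)
    researchers: list[str] = field(default_factory=list)
    goals: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Example integrity check: reject a project that ends before it starts.
        if self.end_date < self.start_date:
            raise ValueError("EndDate precedes StartDate")

    def duration_months(self) -> int:
        """Difference in calendar months between start and end dates (days ignored)."""
        return (self.end_date.year - self.start_date.year) * 12 + (
            self.end_date.month - self.start_date.month
        )


if __name__ == "__main__":
    project = FundedResearchProject(
        acronym="COLLATE",
        full_name="(illustrative)",
        start_date=date(2001, 4, 1),  # "Spring 2001": day and month assumed
        end_date=date(2003, 12, 31),
        funding=Funding("German Federal Ministry for Education and Research", "n/a", "n/a"),
        organizations=["DFKI", "Saarland University"],
        pi=["Hans Uszkoreit", "Manfred Pinkal", "Wolfgang Wahlster"],
    )
    print(project.duration_months())
```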

  20. (Figure; labels: Education, Scientific Education, Science, Research, Applied Research, Technology, Technical Product, Search, Production, Scientific Purpose, Extra-Scientific Purpose)

  21. Multidimensional Classification for CL and LT
      Dimensions
      • Generic:
          - Type of Resource (web page, metaindex, publication, person, product, patent, project, ...)
          - People
          - Geolocation
          - Date/Comments
      • Discipline-Specific (not all may apply to a given resource):
          - Application (grammar checking, text translation, IR)
          - Linguality (monolingual, bilingual, multilingual, translingual, language-independent)
          - Languages/Language Pairs (Romanian, Thai, <en-fr>, ...)
          - Technologies (HMM, FSA, EBT, linear programming, ...)
          - Linguistic Area (morphology, syntax, pragmatics, ...)
          - Linguistic Approach (Two-Level Morphology, systemic functional grammar, DRT)
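A multidimensional classification lends itself to faceted search. Each resource carries a set of values per dimension, and a dimension may be absent because not all dimensions apply. A query combines dimensions with AND and values within one dimension with OR. The Python sketch below is a hypothetical illustration of this idea. The Babel entry reuses values from the RDF export on slide 30, and "HypotheticalTagger" is an invented entry.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    resource_type: str  # generic dimension "Type of Resource"
    # Discipline-specific dimensions; a missing key means "does not apply".
    facets: dict[str, set[str]] = field(default_factory=dict)


def matches(resource: Resource, query: dict[str, set[str]]) -> bool:
    """AND across dimensions, OR within a dimension; absent dimensions never match."""
    return all(resource.facets.get(dim, set()) & wanted for dim, wanted in query.items())


catalogue = [
    Resource("Babel", "system", {
        "Linguality": {"monolingual"},
        "Languages": {"German"},
        "Linguistic Area": {"syntax"},
        "Linguistic Approach": {"HPSG"},
    }),
    Resource("HypotheticalTagger", "system", {  # invented example entry
        "Linguality": {"monolingual"},
        "Languages": {"Thai"},
        "Technologies": {"HMM"},
        "Linguistic Area": {"morphology"},
    }),
]

if __name__ == "__main__":
    query = {"Linguality": {"monolingual"}, "Linguistic Area": {"syntax", "morphology"}}
    print([r.name for r in catalogue if matches(r, query)])
```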

  22. Excerpt from the Ontology (figure; nodes shown: Technology, Language Technology, Dublin Core, Languages, OLAC, BibTex, LT World, Communication & Events, Teams & Players, Systems & Resources, Information & Knowledge, Publications)

  23. Area Nodes
      Example of the shallow hierarchy for technologies:
      • Text Technologies ...
          • Text Summarization ...
          • Information Extraction
              • Named Entity Recognition
              • Terminology Extraction
              • Relation Extraction
              • Answer Extraction ...
          • Text Generation ...
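A shallow area hierarchy can be held as nested mappings. A path lookup then yields the chain of area nodes above a subject, for example to build a breadcrumb, and a depth check confirms that the hierarchy stays shallow. The Python sketch below encodes only the nodes listed on the slide. The children elided by "..." are omitted, and the helper names are hypothetical.

```python
from typing import Optional

# The technology hierarchy from the slide; children elided by "..." are left out.
AREAS: dict = {
    "Text Technologies": {
        "Text Summarization": {},
        "Information Extraction": {
            "Named Entity Recognition": {},
            "Terminology Extraction": {},
            "Relation Extraction": {},
            "Answer Extraction": {},
        },
        "Text Generation": {},
    }
}


def path_to(tree: dict, target: str, prefix: tuple = ()) -> Optional[tuple]:
    """Depth-first search returning the node names from the root to target, or None."""
    for name, children in tree.items():
        here = prefix + (name,)
        if name == target:
            return here
        found = path_to(children, target, here)
        if found is not None:
            return found
    return None


def depth(tree: dict) -> int:
    """Number of levels in the hierarchy (0 for an empty tree)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


if __name__ == "__main__":
    print(" > ".join(path_to(AREAS, "Relation Extraction")))
    print(depth(AREAS))
```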

  24. Main Info for Each Subject Area • Name • Acronyms • AKAs, Term Translations • Short Definition • Explanation • Topic Websites • R&D Prototypes/Products • Projects • People • Literature

  25. Ontology Modelling and Interchange Formats • Ontologies maintained with Protégé 2000 • Ontology Modelling with Protégé • Export / Interchange Formats

  26. Protégé: Class View

  27. Protégé: Slot View

  28. Protégé: Form View (Input-Configuration)

  29. Protégé: Instance View (Input-Interface)

  30. Protégé: RDF-Export, Instance of the Babel system
      <LT:System rdf:about="&LT;LT_00398"
          LT:applications="Structure Building"
          LT:dc.coverage="66123 Saarbruecken"
          LT:dc.identifier="http://www.dfki.de/~stefan/Babel/e_babel.html"
          LT:lt.linguality="monolingual"
          LT:lt.linguistic_approach="HPSG"
          LT:lt.linguistic_area="syntax"
          LT:olac.type.functionality="Written Language"
          LT:olac.type.linguistic="HPSG"
          LT:resource.contact="Stefan.Mueller@dfki.de"
          LT:resource.homepage_url="http://www.dfki.de/~stefan/Babel/e_babel.html"
          LT:resource.name="Babel"
          LT:resource.type="system"
          LT:technological_method="Written Language"
          LT:type="system"
          rdfs:label="Babel">
        <LT:resource.description>Babel is a Prolog System with Web-Interface in Perl and Java. Its main purpose is the test of an HPSG grammar for German.</LT:resource.description>
        <LT:dc.language rdf:resource="&LT;English"/>
        <LT:lt.languages rdf:resource="&LT;German"/>
        <LT:dc.creator rdf:resource="&LT;LT_00399"/>
        <LT:developed-by rdf:resource="&LT;LT_00399"/>
        <LT:dc.rights rdf:resource="&LT;ont_051002_00178"/>
        <LT:developed-by rdf:resource="&LT;ont_051002_00209"/>
        <LT:olac.format.os>Windows 95</LT:olac.format.os>
        <LT:olac.format.os>Windows NT</LT:olac.format.os>
      </LT:System>
      (Slide callouts: Attributes, Relations)
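The export separates the two callouts structurally. Literal-valued properties appear as XML attributes or as text-only elements, such as LT:olac.format.os. Relations to other instances appear as elements carrying rdf:resource. The Python sketch below splits an instance along that line using only xml.etree.ElementTree, not a full RDF parser. The &LT; entity is declared elsewhere in the real export and is expanded here to a hypothetical namespace IRI, http://example.org/lt#. The sample is abridged.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LT = "http://example.org/lt#"  # hypothetical: the real LT namespace IRI is not shown on the slide

# Abridged copy of the Babel export, with the &LT; entity expanded to the LT IRI.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:LT="{LT}">
  <LT:System rdf:about="{LT}LT_00398" LT:resource.name="Babel" LT:lt.linguality="monolingual">
    <LT:dc.language rdf:resource="{LT}English"/>
    <LT:lt.languages rdf:resource="{LT}German"/>
    <LT:developed-by rdf:resource="{LT}LT_00399"/>
    <LT:olac.format.os>Windows 95</LT:olac.format.os>
    <LT:olac.format.os>Windows NT</LT:olac.format.os>
  </LT:System>
</rdf:RDF>"""


def local(name: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on qualified names."""
    return name.split("}", 1)[-1]


def local_id(iri: str) -> str:
    """Drop the LT namespace from an instance IRI, e.g. '...#German' -> 'German'."""
    return iri[len(LT):] if iri.startswith(LT) else iri


def split_instance(system: ET.Element) -> tuple[dict, list]:
    """Return (attributes, relations): literal values vs. links to other instances."""
    attributes: dict[str, list[str]] = {}
    relations: list[tuple[str, str]] = []
    for key, value in system.attrib.items():
        if key != f"{{{RDF}}}about":  # the instance's own identifier is not a property
            attributes.setdefault(local(key), []).append(value)
    for child in system:
        target = child.get(f"{{{RDF}}}resource")
        if target is not None:
            relations.append((local(child.tag), local_id(target)))
        else:  # repeated literal properties such as olac.format.os are collected in a list
            attributes.setdefault(local(child.tag), []).append(child.text or "")
    return attributes, relations


if __name__ == "__main__":
    system = ET.fromstring(SAMPLE).find(f"{{{LT}}}System")
    attrs, rels = split_instance(system)
    print(attrs)
    print(rels)
```

A dedicated RDF library would be more robust for the full export, because RDF/XML permits several equivalent serializations that this element-level view does not normalize.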

  31. Protégé: RDF-Export, Instance of the Babel system (same export as slide 30; slide callout: Relations)

  32. Organizational Issues: Division of Labour • In the beginning, all contents and references were collected and maintained by DFKI • Input from the authors/area specialists of the Survey, for distributed authoring and content maintenance • Input from the LT community via HTML forms and an XML import format • News and conferences maintained and updated by DFKI

  33. Relationships to External Resources • Included but autonomous resources: ACL NL Software Registry, Language Technology Survey • Systematically cross-linked and cross-searchable resources: all OLAC resources (e.g., LDC, SIL, ACL SR, OLAC Home) • Systematically cross-linked resources: HLT Central, ELSNET, EACL ACL NLP Universe • Linked resources: all other resources relevant to LT
