
Improving communication in e-democracy using NLP and semantic tools


Presentation Transcript


  1. Improving communication in e-democracy using NLP and semantic tools. Michele Carenini. ICT FOR SAFE DIGITAL CITIES – Inclusive e-services – Bologna, 29.06.07

  2. Summary • Where does Natural Language Belong? • Natural Language (Processing) in very few words • Why Putting Semantics into the Web and… • … how to do it • EDEN: the Gap between Us and Them • Good and Bad Lessons • What now?

  3. Where Does Natural Language Belong? Natural Language vs. Artificial Language

  4. NL In NLP. NL is the theoretical set of well-formed phrases/sentences of human languages.
     • NLP deals with the possibility of making computers process NL;
     • By definition, computers can process only computable objects;
     • There are at least two main features of NL that are (or theoretically can be) computable: morphology and syntax;
     • Well-formedness is a prerequisite on which (morphology and) syntax may be computed.
     [Figure: "Evolution in NLP" against "Complexity"; labels: morphology, syntax, semantics, pragmatics]

  5. Well-Formedness. ((α → β) → (¬β → ¬α)) vs. *((α → β) → (ββ))α)). John eats the cake vs. *John are eaten one cakes
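
The contrast on this slide can be made concrete with a small recognizer. The sketch below is not from the talk: it assumes a toy grammar (an atom is a single letter, `¬` prefixes a formula, and every implication is fully parenthesised) and the function name `is_well_formed` is hypothetical. It only decides whether a string belongs to that formal language, which is the kind of yes/no well-formedness test the slide illustrates.

```python
def is_well_formed(text: str) -> bool:
    """Recognise formulas of an assumed toy grammar:
    formula := letter | '¬' formula | '(' formula '→' formula ')'
    """
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0  # index of the next unread token

    def parse_formula() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            return False
        tok = tokens[pos]
        if tok == "¬":  # negation applies to the formula that follows
            pos += 1
            return parse_formula()
        if tok == "(":  # parenthesised implication
            pos += 1
            if not parse_formula():
                return False
            if pos >= len(tokens) or tokens[pos] != "→":
                return False
            pos += 1
            if not parse_formula():
                return False
            if pos >= len(tokens) or tokens[pos] != ")":
                return False
            pos += 1
            return True
        if tok.isalpha():  # atomic proposition such as α or β
            pos += 1
            return True
        return False

    # The whole input must be consumed by exactly one formula.
    return parse_formula() and pos == len(tokens)


print(is_well_formed("((α → β) → (¬β → ¬α))"))  # True
print(is_well_formed("((α → β) → (ββ))α))"))    # False
```

No comparably small recognizer exists for the English pair on the slide, which is part of what makes natural language harder than artificial language.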

  6. Putting Some Semantics Into The Web. The Web: a system of interlinked hypertext documents accessed via the Internet. With a Web browser, a user views Web pages that may contain text, images, and other multimedia, and navigates between them using hyperlinks.

  7. Putting Some Semantics Into The Web WHY:

  8. Putting Some Semantics Into The Web. WHY: 42,600,000?!? HOW: Intelligent search [diagram labels: TRUCK, CAR, DRIVING, MOVING, AND/OR]

  9. Putting Some Semantics Into The Web: Web 2.0
     • The transition of web sites from isolated information silos to sources of content and functionality
     • A social phenomenon embracing an approach to generating and distributing Web content itself, characterized by open communication, decentralization of authority, freedom to share and re-use
     • Enhanced organization and categorization of content, emphasizing deep linking

  10. Putting Some Semantics Into The Web: Semantic Web. Some elements of the semantic web are expressed in formal specifications, including:
     • Resource Description Framework (RDF)
     • Data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples)
     • Notations such as RDF Schema (RDFS)
     • The Web Ontology Language (OWL)
     All of these are intended to formally describe concepts, terms, and relationships within a given knowledge domain.
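
To show what one of these interchange formats looks like, here is a minimal sketch that writes statements as N-Triples lines (subject, predicate, object, terminated by a dot). Everything specific in it is an assumption made for illustration: the `example.org` IRIs, the `vocab#topic` property and the helper names `IRI`, `Literal`, `term` and `to_ntriples` are hypothetical, and only plain string literals with basic escaping are handled.

```python
from typing import Iterable, NamedTuple, Tuple, Union


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


Term = Union[IRI, Literal]


def term(t: Term) -> str:
    """Render one term in N-Triples syntax."""
    if isinstance(t, IRI):
        return f"<{t.value}>"
    # Escape backslashes first, then quotes and newlines.
    escaped = (
        t.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_ntriples(statements: Iterable[Tuple[IRI, IRI, Term]]) -> str:
    """Serialise (subject, predicate, object) statements, one per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in statements)


# Hypothetical data: a citizen's request tagged with its topic.
request = IRI("http://example.org/request/1")
topic = IRI("http://example.org/vocab#topic")
print(to_ntriples([(request, topic, Literal("garage"))]))
# <http://example.org/request/1> <http://example.org/vocab#topic> "garage" .
```

In practice an RDF library would handle datatypes, language tags and blank nodes; the sketch only illustrates the statement-per-line shape of the format.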

  11. Putting Some Semantics Into The Web: Web 3.0
     • Ubiquitous Connectivity, broadband adoption, mobile Internet access and mobile devices
     • Network computing, software-as-a-service business models, Web services interoperability, distributed computing, grid computing and cloud computing
     • Open technologies, Open APIs and protocols, open data formats, open-source software platforms and open data (e.g. Creative Commons, Open Data License)
     • Open identity, OpenID, open reputation, roaming portable identity and personal data
     • The intelligent web, Semantic web technologies such as RDF, OWL, SWRL, SPARQL, Semantic application platforms, and statement-based datastores
     • Distributed databases, the "World Wide Database"
     • Intelligent applications, natural language processing, machine learning, machine reasoning, autonomous agents

  12. Putting Some Semantics Into The Web: Web 3.0 (same list as slide 11): UNDER CONSTRUCTION

  13. EDEN: Where It All Began (at least some of it)

  14. EDEN: The Gap Between Us And Them. Us: the technicians. Them: the PAs. End-Users: the Citizens.

  15. General Objective Of The NLP Tools (in the eDemocracy framework): interacting with (CHI) or through (CMI) an artificial system in order to get information that makes participation in the decision-making process more effective.

  16. One Overall Problem. Technicians: it is difficult to deal with a less-than-perfect NL definition (Lost on NLP Planet). Users: it is difficult to deal with the very notion of Natural Language (Lost on Bad-Language World). Main problem: the mutual understanding of different fields of interest and expertise.

  17. ☺ Good Lessons…

  18. ☹ … And Bad Ones

  19. ☺ Good Lessons (1) 1. Linguistic Resource Re-use: the main purpose of the grammar(s) developed within EDEN is information extraction, not (full) linguistic analysis. Major effort was therefore devoted to covering most “information-bearing” constituents, such as (complex) Noun Phrases and the main Verb-Noun and Verb-Adjective relations. -> Easy replication to different (Western) languages:
     • the four linguistic analysers made available to the project (Dutch, English, German and Italian) were built with the same development tool (Yap4NL);
     • consequently, they all share the same approach to linguistic analysis (rule-based, full-path parsing with a post-parsing procedure, which simulates a shallow parser);
     • finally, from the point of view of software design, no major changes or significant integration work were necessary to localise the modules.

  20. ☺ Good Lessons (2) 2. Fast Prototyping: development of the Dutch Grammar was carried out completely from scratch in less than one person-year. Fast prototyping was mainly made possible by:
     • the availability of an advanced dedicated tool for grammar development;
     • the simplicity of the approach to linguistic processing.
     Interesting outcome: output format in terms of flat (no structure, no hierarchy, no explicit internal link) lists of “triples”.
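
A flat list of triples can be pictured with the sketch below. The talk does not specify the EDEN triple layout, so the `(relation, head, dependent)` shape, the tag set (`N`, `V`, `ADJ`, `DET`) and the function `extract_triples` are all assumptions chosen for illustration. The sketch pairs each verb with the first noun or adjective that follows it, skipping determiners, and emits the relations as an unstructured list.

```python
from typing import List, Tuple

Token = Tuple[str, str]         # (word, part-of-speech tag), assumed tag set
Triple = Tuple[str, str, str]   # (relation, head, dependent), assumed layout


def extract_triples(tokens: List[Token]) -> List[Triple]:
    """Emit a flat list of Verb-Noun and Verb-Adjective relations."""
    triples: List[Triple] = []
    for i, (word, tag) in enumerate(tokens):
        if tag != "V":
            continue
        # Look at what follows the verb, ignoring determiners.
        for next_word, next_tag in tokens[i + 1:]:
            if next_tag == "DET":
                continue
            if next_tag == "N":
                triples.append(("verb-noun", word, next_word))
            elif next_tag == "ADJ":
                triples.append(("verb-adjective", word, next_word))
            break  # only the first non-determiner token is considered
    return triples


sentence = [("John", "N"), ("eats", "V"), ("the", "DET"), ("cake", "N")]
print(extract_triples(sentence))
# [('verb-noun', 'eats', 'cake')]
```

Because the output has no nesting and no links between items, downstream modules can consume it without knowing anything about the parse tree that produced it.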

  21. ☺ Good Lessons (3) 3. Grammar Re-usability: in the grammar format used in EDEN, each grammar rule has a syntagmatic part, which corresponds to the reduction rule, and a set of “actions” which independently build the feature structure of each syntactic phrasal constituent. This led to two interesting outcomes:
     • the same linguistic analyser has been embedded in several different modules; and
     • an interesting experiment in grammar re-use (from Dutch to German) has been carried out, with encouraging results.
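
The split between a reduction rule and its actions can be sketched as follows. This is not the Yap4NL grammar format, which the slides do not show; the `Rule` class, the dictionary-based feature structures and the two example actions are hypothetical and only illustrate the idea that the syntagmatic part decides whether a reduction applies while separate actions fill in the features of the new constituent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Features = Dict[str, str]
Action = Callable[[List[Features], Features], None]


@dataclass
class Rule:
    lhs: str                    # category produced by the reduction
    rhs: List[str]              # syntagmatic part: categories to reduce
    actions: List[Action] = field(default_factory=list)

    def reduce(self, children: List[Features]) -> Optional[Features]:
        """Apply the rule if the children match the right-hand side."""
        if [child["cat"] for child in children] != self.rhs:
            return None
        parent: Features = {"cat": self.lhs}
        # Each action contributes to the feature structure on its own.
        for action in self.actions:
            action(children, parent)
        return parent


def copy_head(children: List[Features], parent: Features) -> None:
    parent["head"] = children[-1]["lemma"]


def copy_number(children: List[Features], parent: Features) -> None:
    parent["number"] = children[-1]["number"]


np_rule = Rule("NP", ["DET", "N"], [copy_head, copy_number])
det = {"cat": "DET", "lemma": "the"}
noun = {"cat": "N", "lemma": "cake", "number": "sg"}
print(np_rule.reduce([det, noun]))
# {'cat': 'NP', 'head': 'cake', 'number': 'sg'}
```

Under this kind of design, porting a grammar to a related language could keep many actions unchanged and adjust mainly the right-hand sides, which is one plausible reading of why the Dutch-to-German experiment was encouraging.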

  22. ☹ Bad Lessons (1) 1. NLP Exploitation in a Specific Domain: the first problem concerned the very notion of Natural Language:
     • Traditional NLP definition: “Natural Language is the theoretical set of all well-formed sentences used by humans to communicate”. For instance, John eats the cake is a sentence belonging to NL, while *John are eaten one cakes is not.
     • -> Reason: there is a “minimum threshold” that must be respected for an artificial system to behave properly (i.e., to assign a structure).
     • First EDEN definition by users: “Natural Language is whatever string is expressed by citizens, possibly including misspellings, non-existent words and bad syntactic structures”. Therefore, any juxtaposition of strings, once it has been typed in by a citizen, belongs to NL.
     • -> Reason: in communication (and especially in e-mail communication) a lot of mistakes occur; the system must be able to deal with them too.
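
One way a system can tolerate misspelled input is to map unknown words to the closest entry of a known lexicon before parsing. The sketch below uses Python's standard `difflib.get_close_matches` for this; it is an illustration, not the EDEN approach, and the tiny `LEXICON`, the `cutoff` value and the function `normalise` are assumptions.

```python
import difflib
from typing import List

# Hypothetical mini-lexicon; a real system would use a full word list.
LEXICON = ["garage", "car", "train", "rome", "find", "need"]


def normalise(text: str, lexicon: List[str] = LEXICON, cutoff: float = 0.8) -> List[str]:
    """Lower-case the input and replace near-misses with lexicon entries."""
    result: List[str] = []
    for raw in text.lower().split():
        word = raw.strip(".,;:?!")
        if not word:
            continue
        if word in lexicon:
            result.append(word)
            continue
        # Closest lexicon entry above the similarity cutoff, if any.
        matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
        result.append(matches[0] if matches else word)
    return result


print(normalise("I need a garadge for my car"))
# ['i', 'need', 'a', 'garage', 'for', 'my', 'car']
```

Words with no close match are passed through unchanged, so the later stages still have to cope with non-existent words and broken syntax.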

  23. ☹ Bad Lessons (2) 2. End Users’ Expectations: EDEN modules are (of course) aimed at manipulating symbols in order to make some information accessible. Instead, citizens sometimes expected the system to “understand” what they typed in.
     • They expected the system to be able to handle trans-phrasal phenomena (such as pronoun resolution: “I need a garage for my car; where can I find one?”);
     • they even expected the system to manage pragmatic phenomena (like plan inference, over-answering, etc.: “What time is the train leaving for Rome?”).

  24. What We Learned (1) • Dealing with DNLP (“Dirty NLP”): must be well accepted by the system

  25. What We Learned (2) • Hiding technology: “Bringing technology to the people” must become “Bringing technology to the people without letting them know”

  26. What Now? The Future
     • Standard data interchange formats
     • Interoperability
     • Grid Computing
     • Distributed systems
     • Standard Notation Schemes
     • Standard Ontologies Accessible from Different Perspectives
     • Adaptive Filtering
     • Advanced Multimodal Interfaces
     • Remotely Accessible Applications
     • Privacy and Security Standards and Tools
     • Real AI (Knowledge Representation, Decision Support Systems, Machine Learning, Autonomous Agents, NLP)
