ESSLLI 2006 Summer school Malaga, Spain 31 July – 11 August
General Comments • PLUS: • Courses on time • Proceedings of all courses • Workshops • Student sessions • Internet connection • MINUS: • Not well organized • Site not updated on time • Lunch tickets
Courses • Counting Words: An Introduction to Lexical Statistics • Formal Ontology for Communicating Agents (Workshop) • Word Sense Disambiguation • Introduction to Corpus Resources, Annotation & Access • An Empirical View on Semantic Roles Within and Across Languages • Approximate Reasoning for the Semantic Web
Counting Words (Marco Baroni and Stefan Evert) • Contents • Introduction • Distributions • Zipf’s Law • The zipfR package • Practical Consequences and Conclusion
Introduction • The frequency of words plays an important role in corpus linguistics. • The study of word frequency distributions is called Lexical Statistics. • It seems that word frequency distributions are more of interest to theoretical physicists than to theoretical linguists. • This course introduces some of the empirical phenomena pertaining to word frequency distributions and the classic models that have been proposed to capture them.
Distributions: Basic Terminology • Types: distinct words • Tokens: individual occurrences (instances) of words • Corpus size (N): number of tokens in the corpus • Vocabulary size (V): number of types • Frequency list: list that reports the number of tokens of each type in the corpus • Rank/frequency profile: a frequency list in which the types are replaced by their frequency ranks • Frequency spectrum: a list reporting how many types in a frequency list have a certain frequency
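Distributions: Example • Sample: a b b c a a b a d • N = 9, V = 4 • Frequency list: a 4, b 3, c 1, d 1 • Rank/frequency profile: rank 1 → 4, rank 2 → 3, rank 3 → 1, rank 4 → 1 • Frequency spectrum: frequency 1 → 2 types, frequency 3 → 1 type, frequency 4 → 1 type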
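The three representations of the sample can be computed in a few lines. The following is a minimal Python sketch; the function name `frequency_profiles` is an assumption made for illustration and is not part of the course material.

```python
from collections import Counter

def frequency_profiles(tokens):
    """Return (frequency list, rank/frequency profile, frequency spectrum)."""
    # Frequency list: type -> number of tokens of that type.
    freq_list = Counter(tokens)
    # Rank/frequency profile: frequencies sorted from most to least frequent,
    # paired with ranks 1..V (tied types receive consecutive ranks).
    ranked = sorted(freq_list.values(), reverse=True)
    rank_freq = list(enumerate(ranked, start=1))
    # Frequency spectrum: frequency m -> number of types that occur m times.
    spectrum = dict(sorted(Counter(freq_list.values()).items()))
    return freq_list, rank_freq, spectrum

sample = "a b b c a a b a d".split()
freq_list, rank_freq, spectrum = frequency_profiles(sample)
N = len(sample)      # corpus size (number of tokens)
V = len(freq_list)   # vocabulary size (number of types)
```

Summing the frequencies in the frequency list gives N, and summing the type counts in the spectrum gives V, which is a quick consistency check on any of these tables.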
Distributions: Typical frequency patterns • Top ranks are occupied by function words (the, of, and, ...) • Frequency decreases quite rapidly • The lowest frequency elements are content words
Zipf’s Law • Frequency is a non-linear decreasing function of rank. • Zipf’s model: $f(w) = C / r(w)^{a}$, where $r(w)$ is the frequency rank of word $w$ and $C$ and $a$ are constants • The model predicts a very rapid decrease in frequency among the most frequent words, which becomes slower as the rank grows. • Mathematical property: $\log f(w) = \log C - a \log r(w)$, i.e. log frequency is a linear function of log rank
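Because the model is linear in log-log space, its two constants can be estimated with ordinary least squares on (log rank, log frequency) pairs. The sketch below assumes this simple estimation method for illustration; it is not the fitting procedure of the zipfR package, and the name `fit_zipf` is hypothetical.

```python
import math

def fit_zipf(frequencies):
    """Least-squares fit of log f = log C - a * log r; returns (C, a).

    Expects at least two strictly positive frequencies.
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals -a
    intercept = mean_y - slope * mean_x   # equals log C
    return math.exp(intercept), -slope

# Synthetic frequencies that follow f = 1000 / r exactly (C = 1000, a = 1).
C, a = fit_zipf([1000 / r for r in range(1, 51)])
```

On real corpora the fit is usually poorer at the very top and the very bottom of the ranking, so the estimated values should be read as rough descriptors only.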
Zipf’s Law: Applications and explanations • Zipfian distributions are encountered in various phenomena: • City populations • Incomes in economics • Frequency of citations of scientific papers • Visits to web sites • “Least effort principle”
zipfR Package • Statistical package for modeling lexical distributions. • URL: http://www.purl.org/stefan.evert/zipfR • Dependency: the R statistical environment • URL: http://www.r-project.org • Binaries available for Windows and Mac OS. • Source available for Linux. • Open-source project under a GNU license.
Practical Consequences and Conclusion • The Zipfian nature of word frequency distributions causes data sparseness problems. • Because V keeps growing with corpus size, it cannot serve as a measure of lexical richness when comparing corpora of different sizes. • Interested readers should proceed to Baayen (2001) for a thorough introduction to word frequency distributions with an emphasis on statistical modeling.
References • Abney, Steven (1996), Statistical methods and linguistics. In Klavans, J. & Resnik, P. (eds) The balancing act: Combining symbolic and statistical approaches to language. Cambridge MA: MIT Press, 1-23 • Baayen, Harald (2001), Word frequency distributions. Dordrecht: Kluwer • Baldi, Pierre/Frasconi, Paolo/Smyth, Padhraic (2003), Modeling the internet and the web. Chichester: Wiley • Biber, Douglas/Conrad, Susan/Reppen, Randi (1998), Corpus linguistics. Cambridge: Cambridge University Press • Creutz, Mathias (2003), Unsupervised segmentation of words using prior distributions of morph length and frequency. In Proceedings of ACL 03, 280-287 • Dalgaard, Peter (2002), Introductory statistics with R. New York: Springer • Evert, Stefan (2004), The statistics of word co-occurrences: Word pairs and collocations. PhD thesis, University of Stuttgart/IMS
References • Evert, Stefan/Baroni, Marco (2006), Testing the extrapolation quality of word frequency models. In Proceedings of Corpus Linguistics 2005, available from http://www.corpus.bham.ac.uk./PCLC • Li, Wentian (2002), Zipf’s Law everywhere. In Glottometrics 5, 14-21 • Manning, Christopher/Schütze, Hinrich (1999), Foundations of statistical natural language processing. Cambridge MA: MIT Press • McEnery, Tony/Wilson, Andrew (2001), Corpus Linguistics, 2nd edition. Edinburgh: Edinburgh University Press • Oakes, Michael (1998), Statistics for corpus linguistics. Edinburgh: Edinburgh University Press • Sampson, Geoffrey (2002), Review of Harald Baayen: Word frequency distributions. In Computational Linguistics 28, 565-569 • Zipf, George Kingsley (1949), Human behavior and the principle of least effort. Cambridge MA: Addison-Wesley • Zipf, George Kingsley (1965), The psycho-biology of language. Cambridge MA: MIT Press
Formal Ontology for Communicating Agents (FOCA) Workshop • Contents • Introduction • Communicative acts • The missing ontological link • Semantic Coordination • A Communication Acts Ontology for Software Agents Interoperability • OWL DL as a FIPA ACL content Language
Introduction • Purpose of the workshop: • To gather contributions that: • Take seriously into account the ontological aspects of communication and interaction • Use formal ontologies for achieving a better semantic coordination between interacting and communicating agents
Introduction: Communicative acts • According to Austin, three kinds of acts can be performed simultaneously through a single utterance: • Locutionary act: producing noises that conform to a system • Illocutionary act: what is performed in saying something • Perlocutionary act: what is performed by saying something • An important issue is the distinction between the last two acts.
Introduction: The missing ontological link • Ontological ingredients: • Events, states, actions, speech acts, relations, plans, propositions, arguments, facts, commitments, ... • Top-level ontologies focus on the sub-domain of concrete entities, like time, space, ... • There is a need to integrate the large body of philosophical work on other domains, such as that of abstract entities.
Introduction: Semantic Coordination • An important aspect of interaction and communication involves the management of ontologies. • Scenarios identified with respect to semantic coordination: • With a shared pre-existing ontology • With different ontologies but linked to a pre-existing common upper level ontology • With different ontologies but mapped directly onto each other • When agents are involved: • Keep static ontologies but manage a shared dynamic one • Create new static ontologies through a negotiation phase • Modify their ontology during the interaction while maintaining some kind of negotiation meaning
A Communication Acts Ontology for Software Agents Interoperability • Each ACL defines different classes of communication acts. • An agreed ontology opens the possibility of real agent interoperation, based on wide agreement on some classes of communication acts that serve as a bridge among different ACL “islands” • Main design criterion: follow speech act theory and also embed an approach for expressing the semantics of the communication acts • Use the OWL DL language
A Communication Acts Ontology for Software Agents Interoperability • Upper layer • CommunicationAct ⊑ ∀hasSender.Actor ⊓ =1 hasSender ⊓ ∀hasReceiver.Actor ⊓ ∀hasContent.Content • Assertive ≡ CommunicationAct ⊓ ∃hasContent.Proposition ⊓ ∃hasCommit.AssertiveCommitment • Directive ≡ CommunicationAct ⊓ ∃hasContent.Action ⊓ ∃hasCommit.DirectiveCommitment • Commissive ≡ CommunicationAct ⊓ ∃hasContent.Action ⊓ ∀hasCondition.Proposition ⊓ ∃hasCommit.CommissiveCommitment • Expressive ≡ CommunicationAct ⊓ ∃hasContent.Proposition ⊓ ∃hasState.PsyState ⊓ ∃hasCommit.ExpressiveCommitment • Declarative ⊑ CommunicationAct ⊓ ∃hasContent.Proposition
A Communication Acts Ontology for Software Agents Interoperability • The Standards Layer extends the Upper Layer with terms representing classes of communication acts of general-purpose ACLs, like FIPA-ACL. • The Applications Layer is the most specific. It defines communication act classes for a specific application. • In conclusion: Classes in the upper layer are considered the framework agreement for general communication. Classes in the standards layer reflect the classes of communication acts that different standard ACLs define. Classes in the applications layer concern the particular communication acts used by each agent system committing to the ontology.
References • J. L. Austin. How to Do Things With Words. Oxford University Press. Oxford, 1962 • J. R. Searle. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press. New York, 1969 • M. P. Singh. Agent Communication Languages: Rethinking the Principles. IEEE Computer, vol.31, num.12, pp.40-47, 1998 • M. Wooldridge. Semantic Issues in the Verification of Agent Communication Languages. Journal of Autonomous Agents and Multi-Agent Systems, vol.3, num.1, pp.9-31, 2000 • Y. Labrou, T. Finin, Y. Peng. Agent Communication Languages: the Current Landscape. IEEE Intelligent Systems, vol.14, num.2, pp.45-52, 1999 • M. P. Singh. A Social Semantics for Agent Communication Languages. Issues in Agent Communication, pp.31-45. Springer-Verlag, 2000 • FIPA Communicative Act Library Specification. Foundation For Intelligent Physical Agents, 2005. http://www.fipa.org/specs/fipa00037/SC00037J.html
References • N. Asher and A. Lascarides. Logics of Conversation. Cambridge University Press, 2003 • S. Levinson. Pragmatics. Cambridge University Press, 1983 • J. R. Searle and D. Vanderveken. Foundations of Illocutionary Logic. Cambridge University Press, 1985 • J. R. Searle. The Construction of Social Reality. Free Press, New York, 1995 • R. Stalnaker. Assertion. Syntax and Semantics, 9:315-332, 1978 • J. Ginzburg. Dynamics and the Semantics of Dialogue. CSLI: Stanford, 1996 • H. H. Clark. Using Language. Cambridge University Press, 1996 • S. Carberry. Plan Recognition in Natural Language Dialogue. MIT Press, 1990
OWL DL as a FIPA ACL Content Language • The FIPA-SL content language is in general undecidable. • Use OWL DL to enable semantic validation of the content of the ACL message and to separate speech act semantics from content semantics. • The authors' ontology defines some of the FIPA specifications (message structure, ontology service, content language, communicative act library)
OWL DL as a FIPA ACL Content Language • Advantages • Application ontologies are domain independent. They can be applied to a multi-agent system (MAS) in different domains. • Various application ontologies in OWL DL are available. This shows a great potential for reusing already formulated ontologies. • W3C suggests the use of OWL within agents.
References • Eric Miller et al. Web Ontology Language (OWL), 2004 • RACER Systems GmbH. The features of RacerPro version 1.9, 2005 • Foundation for Intelligent Physical Agents. FIPA ACL Message Structure Specification, 2002 • Foundation for Intelligent Physical Agents. FIPA Ontology Service Specification, 2001 • Foundation for Intelligent Physical Agents. FIPA SL Content Language Specification, 2002 • Foundation for Intelligent Physical Agents. FIPA Communicative Act Library Specification, 2002 • Web Ontology Working Group. OWL Web Ontology Language: Use Cases and Requirements, 2004 • Giovanni Caire. JADE Introduction AAMAS 2005, 2005
Introduction to Corpus Resources, Annotation & Access (Sabine Schulte im Walde and Heike Zinsmeister) • Contents • Basic definitions • Corpora • Annotation • Tokenization & Morpho-Syntactic Annotation
Introduction to Corpus Resources, Annotation & Access • Basic Definitions • Linguistics: Characterization and explanation of linguistic observations • Corpus: Any collection of more than one text • Annotation: The practice of adding interpretative, linguistic information to an electronic corpus of spoken and/or written language
Corpora • Corpora give only a partial description of a language • They are incomplete • (e.g. Brown corpus does not include vocabulary related to WWW and e-mail) • They are biased • They include ungrammatical sentences • (e.g. typos, copy-and-paste errors, conversion errors) • We have to sample a corpus according to some design criteria such that it is balanced and representative for a specific purpose
Annotation • Levels: • POS tags • Lemmata • Senses • Semantic roles • Named entities • Topic • Coreference • Principles: • The raw corpus should be recoverable • Annotation should be extricable from the corpus • Easy access to documentation: • Annotation scheme • How, where, and by whom the annotation was applied
Tokenization and Morpho-Syntactic Annotation • Tokenization: divides the raw input character sequence of a text into sentences and the sentences into tokens • Problems: • Language dependent task • Sentence boundaries • Numbers • Abbreviations • Capitalization • Hyphenation • Multiword expressions • Clitics • So we need to apply disambiguation methods
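Several of the listed problems show up even in a toy tokenizer. The following Python sketch is a rule-based illustration under simplifying assumptions: the abbreviation list `ABBREVIATIONS` and the pattern `TOKEN_RE` are hypothetical, English-only, and far from complete, and no sentence splitting is attempted.

```python
import re

# Hypothetical abbreviation list; a real tokenizer needs a much larger,
# language-specific resource.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

TOKEN_RE = re.compile(
    r"\d+(?:[.,]\d+)*(?!\w)"   # numbers such as 3.50 or 1,000
    r"|\w+(?:[-']\w+)*"        # words, hyphenated words, words with clitics
    r"|[^\w\s]"                # any other single punctuation mark
)

def tokenize(text):
    """Split text into tokens, keeping known abbreviations intact."""
    tokens = []
    for chunk in text.split():
        if chunk.lower() in ABBREVIATIONS:
            tokens.append(chunk)   # the period belongs to the abbreviation
        else:
            tokens.extend(TOKEN_RE.findall(chunk))
    return tokens

tokens = tokenize("Dr. Smith didn't pay 3.50 euros, e.g. for a well-known book.")
```

The final period is split off as its own token, while the periods in "Dr." and "e.g." are kept, which is exactly the kind of decision that requires disambiguation in a realistic system.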
Tokenization and Morpho-Syntactic Annotation • Part-of-speech tagging (POS tagging): the task of labeling each word in a sequence of words with its appropriate part of speech. • Performs a limited syntactic disambiguation • Context helps to disambiguate tags • Tagset: a set of part-of-speech tags • Classical 8 classes: noun, verb, article, participle, pronoun, preposition, adverb, conjunction
Tokenization and Morpho-Syntactic Annotation • Morphology: concerned with the inner structure of words and the formation of words from smaller units. • Root: the core morpheme of a word • Stemming: a process that strips off affixes and leaves the stem. • Lemmatization: a process that gives the lemma of a word. Includes disambiguation at the level of lexemes, depending on the part of speech. • Coreference: the reference in one expression to the same referent in another expression • Anaphora: coreference of one expression with its antecedent
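The difference between stemming and lemmatization can be illustrated with a short Python sketch. Both the suffix list `SUFFIXES` and the lexicon `LEMMAS` are hypothetical toy resources invented for this example; real stemmers and lemmatizers rely on far richer rules and dictionaries.

```python
# Hypothetical, deliberately tiny suffix list, ordered longest first.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma lexicon keyed by (word form, part of speech).
LEMMAS = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("better", "ADJ"): "good",
}

def lemmatize(word, pos):
    """Look up the lemma for a word form; fall back to the form itself."""
    return LEMMAS.get((word, pos), word)
```

Here `stem("running")` yields `"runn"`, which is not a word, whereas `lemmatize("saw", "VERB")` and `lemmatize("saw", "NOUN")` return different lemmas for the same form: stemming is purely string based, while lemmatization depends on the part of speech.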
References • Tony McEnery (2003). Corpus Linguistics. In The Oxford Handbook of Computational Linguistics, pp. 448-463. Oxford University Press • Tony McEnery and Andrew Wilson (2001). Corpus Linguistics. 2nd edition. Edinburgh University Press, chapter 1 • Sue Atkins, Jeremy Clear and Nicholas Ostler (1992). Corpus Design Criteria. In Literary and Linguistic Computing, 7(1):1-16 • Nancy Ide (2004). Preparation and Analysis of Linguistic Corpora. In Schreibman, S., Siemens, R., Unsworth, J., eds. A Companion to Digital Humanities. Blackwell • Geoffrey Leech (1997). Introducing Corpus Annotation. In Richard Garside, Geoffrey Leech and Tony McEnery, eds. Corpus Annotation. Longman, pp. 1-18 • Geoffrey Leech (2005). Adding Linguistic Annotation. In Developing Linguistic Corpora: A Guide to Good Practice, ed. M. Wynne. Oxford: Oxbow Books, pp. 17-29. Available online from http://ahds.ac.uk./linguistic-corpora/ • Gregory Grefenstette and Pasi Tapanainen (1994). What is a word, what is a sentence? Problems of tokenization. In Proceedings of the 3rd Conference on Computational Lexicography and Text Research
References • Andrei Mikheev (2003). Text segmentation. In Ruslan Mitkov, ed. The Oxford Handbook of Computational Linguistics, pp. 376-394. Oxford University Press • Helmut Schmid (2007?). Tokenizing. In Anke Lüdeling and Merja Kytö, eds. Corpus Linguistics. An International Handbook. Mouton de Gruyter, Berlin • Christopher D. Manning and Hinrich Schütze (1999). Foundations of Statistical Natural Language Processing, chapter 10. MIT Press • Atro Voutilainen (2003). Part-of-speech tagging. In Ruslan Mitkov, ed. The Oxford Handbook of Computational Linguistics, pp. 219-232. Oxford University Press • John Carroll, Guido Minnen, and Ted Briscoe (1999). Corpus annotation for parser evaluation. In Proceedings of LINC. Bergen • Ruslan Mitkov, Richard Evans, Constantin Orasan, Catalina Barbu, Lisa Jones, and Violeta Sotirova (2000). Coreference and anaphora: developing annotating tools, annotated resources and annotation strategies. In Proceedings of the Discourse, Anaphora and Reference Resolution Conference, pp. 49-58 • Eva Hajicová, Jarmila Panevová, and Petr Sgall (2000). Coreference in annotating a large corpus. In Proceedings of the 2nd International Conference on Language Resources and Evaluation, pp. 497-500
Approximate Reasoning for the Semantic Web (Frank van Harmelen, Pascal Hitzler and Holger Wache) • Contents • Semantic Web – the Vision • Ontologies • XML • W3C Stack • Beyond RDF: OWL • Why Approximate Reasoning • Reduction of use-cases to reasoning methods
Semantic Web – the Vision • Semantic Web = Web of Data • Set of open, stable W3C standards • “Intelligent” things we can’t do today • Search engines: concepts, not keywords • Personalization • Web Services: need semantic characterizations to find them, to combine them • Requirement: Machine Accessible Meaning
Ontologies • Ontologies ARE shared models of the world constructed to facilitate communication • Ontologies ARE NOT definitive descriptions of what exists in the world (this is philosophy) • What’s inside an ontology? • Classes • Instances • Values • Inheritance • Restrictions • Relations • Properties • We need a machine representation
What was XML again? • <country name="Greece"> <capital name="Athens"> <areacode>210</areacode> </capital> </country> • Why not use XML? No agreement on: • Structure: Is country an object, a class, an attribute, or a relation? What does nesting mean? • Vocabulary: Is country the same as nation? • [Figure: tree view of the XML document: country (name: Greece) → capital (name: Athens) → areacode (210)]
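The point that XML provides surface syntax only can be seen by parsing the example. The Python sketch below uses the standard library's `xml.etree.ElementTree`; the helper name `outline` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

doc = (
    '<country name="Greece">'
    '<capital name="Athens"><areacode>210</areacode></capital>'
    '</country>'
)

def outline(element, depth=0):
    """List (depth, tag, attributes, text) for each element in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, dict(element.attrib), text)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

rows = outline(ET.fromstring(doc))
```

The parser recovers the nesting, the tag names, and the attribute values, but nothing in the result says whether `country` is a class, an instance, or a relation, or what the nesting of `capital` inside `country` is supposed to mean.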
W3C Stack • XML: surface syntax, no semantics • XML Schema: describes the structure of XML documents • RDF: data model for “relations” between “things” • RDF Schema: RDF Vocabulary Definition Language • OWL: a more expressive Vocabulary Definition Language
Beyond RDF: OWL • OWL extends RDF Schema to a full-fledged ontology representation language. • Domain / range • Cardinality • Quantifiers • Enumeration • Equality • Boolean algebra • Union, complement • OWL DL is essentially the description logic SHOIN(D) with an RDF/XML syntax. • Three flavors: OWL Lite, OWL DL, OWL Full
Why Approximate Reasoning • Current inference is exact: • “yes” or “no” • This was OK, because until now ontologies were clean: • Hand-crafted, well-designed, carefully populated, well maintained, ... • BUT, ontologies will be sloppy: • Made by machines • (e.g. almost subClassOf) • Mapping ontologies is almost always messy • (e.g. almost equal)
Reduction of use-cases to reasoning methods • Realization (“member of”) • Subsumption (“subclass-relation”) • Mapping (“similar to”) • Retrieval (“has member”) • Classification (“locate in hierarchy”) • GOAL: • Find approximation methods for the reasoning methods • Many reasoning methods can be reduced to satisfiability • GOAL: find approximation methods for satisfiability
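The reduction to satisfiability can be illustrated with subsumption: C is subsumed by D exactly when "C and not D" is unsatisfiable with respect to the background axioms. The Python sketch below is a propositional analogue checked by brute force, not a description logic reasoner; the atom names and the background axiom are hypothetical.

```python
from itertools import product

def satisfiable(formula, atoms):
    """Brute force: is `formula` true under at least one truth assignment?"""
    return any(
        formula(dict(zip(atoms, values)))
        for values in product([False, True], repeat=len(atoms))
    )

def subsumed_by(c, d, tbox, atoms):
    """C is subsumed by D iff (tbox and C and not D) is unsatisfiable."""
    return not satisfiable(lambda v: tbox(v) and c(v) and not d(v), atoms)

atoms = ["Student", "Person", "Employee"]

# Hypothetical background axiom: every Student is a Person.
def tbox(v):
    return (not v["Student"]) or v["Person"]

def working_student(v):
    return v["Student"] and v["Employee"]

def employee(v):
    return v["Employee"]

def person(v):
    return v["Person"]

is_subclass = subsumed_by(working_student, person, tbox, atoms)
is_not_subclass = subsumed_by(employee, person, tbox, atoms)
```

An approximation method for satisfiability therefore carries over to subsumption, which is why the slide states the second goal in terms of satisfiability alone.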
References • [Cadoli and Schaerf, 1995] Marco Cadoli and Marco Schaerf. Approximate inference in default reasoning and circumscription. Fundamenta Informaticae, 23:123–143, 1995. • [Cadoli et al., 1994] Marco Cadoli, Francesco M. Donini, and Marco Schaerf. Is intractability of non-monotonic reasoning a real drawback? In National Conference on Artificial Intelligence, pages 946–951, 1994. • [Dalal, 1996a] M. Dalal. Semantics of an anytime family of reasoners. In W. Wahlster, editor, Proceedings of ECAI-96, pages 360–364, Budapest, Hungary, August 1996. John Wiley & Sons LTD. • [Motik, 2006] B. Motik. Reasoning in Description Logics using Resolution and Deductive Databases. PhD thesis, Universität Karlsruhe, 2006. • [Schaerf and Cadoli, 1995] Marco Schaerf and Marco Cadoli. Tractable reasoning via approximation. Artificial Intelligence, 74:249–310, 1995. • [Zilberstein, 1993] S. Zilberstein. Operational rationality through compilation of anytime algorithms. PhD thesis, Computer Science Division, University of California at Berkeley, 1993. • [Zilberstein, 1996] S. Zilberstein. Using anytime algorithms in intelligent systems. Artificial Intelligence Magazine, fall:73–83, 1996.
Word Sense Disambiguation (Rada Mihalcea) • Outline: • Some Definitions • Basic Approaches – Intro • Basic Approaches – In more Detail • Some Examples
Word Sense Disambiguation • Word Sense Disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities (Sense Inventory). • Sense Inventory usually comes from a dictionary • Word Sense Discrimination is the problem of dividing the usages of a word into different meanings, without regard to existing predefined possibilities.
Word Sense Disambiguation • Knowledge-Based Disambiguation • Machine Readable Dictionaries (e.g. WordNet) • Raw corpora (not manually annotated) • Supervised Disambiguation • Manually annotated corpora • Input of the learning system: 1. a training set of feature-encoded inputs 2. their appropriate sense labels • Unsupervised Disambiguation • Unlabelled corpora • Input of the learning system: 1. a training set of feature-encoded inputs 2. NO sense labels
Word Sense Disambiguation • Knowledge-Based Disambiguation • Examples: • Algorithms based on Machine Readable Dictionaries (e.g. the Lesk algorithm) • Semantic similarity metrics • Rely on semantic networks, like ontologies, e.g. $Sim(a,b) = -\log(Path(a,b) / (2 \cdot D))$ • May use an information content metric, e.g. $Sim(a,b) = IC(LCS(a,b))$ with $IC(a) = -\log P(a)$ • Heuristic-based methods, e.g. identify the most often used meaning and use it by default.
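The two similarity metrics are easy to state as functions. The Python sketch below assumes the path-based formula is read as -log(Path(a,b) / (2·D)), with Path(a,b) the path length between the two concepts and D the maximum depth of the semantic network; the function names are hypothetical, and the inputs (path length, depth, probability) are taken as given rather than computed from a real network.

```python
import math

def path_similarity(path_length, max_depth):
    """Sim(a, b) = -log(Path(a, b) / (2 * D)); shorter paths score higher."""
    return -math.log(path_length / (2 * max_depth))

def information_content(probability):
    """IC(a) = -log P(a); rarer concepts carry more information."""
    return -math.log(probability)

def ic_similarity(lcs_probability):
    """Sim(a, b) = IC(LCS(a, b)), given the probability of the LCS concept."""
    return information_content(lcs_probability)

# Two concepts that are 2 steps apart score higher than two that are 8 apart.
close = path_similarity(2, 16)
far = path_similarity(8, 16)
```

In the information content variant, a very general least common subsumer has a probability close to 1 and therefore contributes a similarity close to 0.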
Word Sense Disambiguation • Knowledge-Based Disambiguation • Examples: • Disambiguate “plant” in “plant with flower” • #1. plant, works, industrial plant • #2. plant, flora, plant life • Sim(plant#1, flower) = 1.0 • Sim(plant#2, flower) = 1.5 • Winner: sense #2
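The selection step in this example amounts to picking the sense with the highest similarity to the context. A minimal Python sketch follows, assuming that scores for several context words are simply summed (the slide uses a single context word, so this choice does not affect the result); the names `disambiguate` and `SLIDE_SCORES` are hypothetical, and the scores are the two values given on the slide.

```python
def disambiguate(senses, context_words, sim):
    """Return the sense with the highest total similarity to the context."""
    return max(senses, key=lambda s: sum(sim(s, w) for w in context_words))

# Similarity values copied from the slide; no other pairs are defined here.
SLIDE_SCORES = {
    ("plant#1", "flower"): 1.0,
    ("plant#2", "flower"): 1.5,
}

def slide_sim(sense, word):
    return SLIDE_SCORES[(sense, word)]

best = disambiguate(["plant#1", "plant#2"], ["flower"], slide_sim)
```

With the slide's values, `best` is `"plant#2"`, the flora sense.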