
Detection of Links between Words in the Task of Syntactic-Semantic Analysis of Russian Texts.

Detection of Links between Words in the Task of Syntactic-Semantic Analysis of Russian Texts. Dmitry V. Merkuryev Saint-Petersburg State University, Russia Mathematics and Mechanics Faculty Department of Computer Science Petrozavodsk, May 21st, 2008. Content.


Presentation Transcript


  1. Detection of Links between Words in the Task of Syntactic-Semantic Analysis of Russian Texts. Dmitry V. Merkuryev, Saint-Petersburg State University, Russia, Mathematics and Mechanics Faculty, Department of Computer Science. Petrozavodsk, May 21st, 2008.

  2. Content
     • 1. Introduction. The task of Syntactic-Semantic Analysis of Russian Texts.
     • 2. Syntactic and semantic analyzers.
     • 3. Main principles of V.A. Tuzov's theory.
     • 4. Sentence analysis.
     • 5. The detection of links between words.
     • 6. Examples.
     • 7. Conclusions.

  3. 1. Introduction. The task of Syntactic-Semantic Analysis of Russian Texts. Natural Language Processing (NLP) is one of the most pressing tasks of modern computer science. Professor V.A. Tuzov's functional model [1], [2] offers an adequate approach to formalizing natural language. The syntactic-semantic analyzer is the only working system based on this theory. It produces a syntactic structure for a Russian sentence that matches the sentence's semantic structure. The analyzer can solve the word sense disambiguation problem for most sentences of Russian journalistic and even literary texts. The detection of links between words is one of the most significant operations of the syntactic-semantic analyzer: it makes it possible to select the correct semantic alternative of a word in the context of a sentence.

  4. 2. Syntactic and semantic analyzers. Some current NLP parsers:
     • DictaScope (syntactic parser for Russian) [3]. The program automatically builds a word subordination tree and determines the grammatical values of the words in a sentence.
     • AOT (automatic text processing, Russian) [4]. The program builds a semantic graph and performs an initial semantic analysis of a text.
     • Link Grammar Parser (syntactic parser for English) [5]. The system assigns to a sentence a syntactic structure that consists of a set of labeled links connecting pairs of words.
     All of these parsers are limited by the word sense disambiguation problem. In this respect, Professor Tuzov's syntactic-semantic analyzer is unique.

  5. 3. Main principles of V.A. Tuzov's theory.
     Thesis 1. A language is an algebraic system {f1, f2, ..., fn, M}, where each fi is a basic function and M is the data structure (the basic concepts) of the given language.
     Thesis 2. Every word of the language is the name of a function, which makes it possible to evaluate the semantics of that word. Each sentence is a superposition of these functions.
     Thesis 3. Grammar is linked with the semantics of the language and is represented by the semantic dictionary.

  6. A function that corresponds to a word has semantic arguments and semantic-grammar types. Both consist of semantic classes and prepositional-case forms.
     Examples:
       $16~!Вин  ($16~! “Accusative”)
       $15~!Где  ($15~! “Where”)
     Notation:
       $<number>: a semantic class
       !Вин, !наВин (“on Accusative”), !Дат (“Dative”), etc.: prepositional-case forms
       !Куда (“Where to”), !Где, !Кому (“Whom”), etc.: generalized grammar types
     Semantic-grammar types define the links through which this word attaches to other words as their argument. Semantic arguments determine the links through which this word takes other words as its arguments (matched by their semantic-grammar types).
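The notation on this slide can be split mechanically into its parts. The following is a minimal sketch, not the analyzer's own parser: the names `ArgSpec` and `parse_spec` are hypothetical, and the pattern assumes a single specification of the shape `[CONCEPT][$class][~][!form]`, as in `$16~!Вин` or `НЕЧТО$1~!Куда`. Backslash-separated alternatives such as `!Тв\!наПред` are not split.

```python
import re
from typing import NamedTuple, Optional


class ArgSpec(NamedTuple):
    """Hypothetical container for one semantic-argument specification."""
    concept: Optional[str]    # optional concept name, e.g. "НЕЧТО"
    sem_class: Optional[str]  # semantic class code, e.g. "$16"
    form: Optional[str]       # "!"-prefixed form or generalized type, e.g. "!Вин"


# Assumed shape: [CONCEPT][$digits-and-slashes][~][!form]
_SPEC = re.compile(
    r"^(?P<concept>[^$~!]*)"      # concept name (may be empty)
    r"(?P<sem_class>\$[0-9/]+)?"  # "$" followed by digits and slashes
    r"~?(?P<form>!.+)?$"          # optional "~", then a "!"-prefixed form
)


def parse_spec(text: str) -> ArgSpec:
    """Split one specification string into concept, class, and form."""
    match = _SPEC.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized notation: {text!r}")
    return ArgSpec(
        match.group("concept") or None,
        match.group("sem_class"),
        match.group("form"),
    )


print(parse_spec("$16~!Вин"))
print(parse_spec("НЕЧТО$1~!Куда"))
print(parse_spec("!заВин"))
```

Keeping the class and the form as separate fields mirrors the slide's distinction between semantic classes and prepositional-case forms, so each part can be compared on its own.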

  7. Example (results from the analyzer): Он едет в город (“He is going into the city”).
     Syntax tree of the sentence:
       @Глагол едет<X002.002> (
         @Им Он<X001.002><+МестГлаг3/2/.Шаг=1+>,
         @Куда в<X003.202><+ГлагОбст1/2/.Шаг=3+> (
           @Вин город<X004.001><+ПредлСуществ5/2/.Шаг=2+>
         )
       )
     Semantic values of each word and links between them:
     Он (“He”)
       ** <X001.002> ОН {Мест._Муж @ОНЪ$17@Им} $17()
       semantics: <X001.002> ОН () \\ <2>
       links: Z1: @ОНЪ$17 <= <X002.002>

  8. едет (“is going”)
       ** <X002.002> ЕХАТЬ {Глагол. $15402~@Глагол} N%~ПОЕЗДКА$15402(Z1: !ОНЪ$17\!ОНА$17\!ОНО$17, Z2: ПРИЧИНА$1/37/05\ПРИКАЗ$1526031~!Почему, Z3: НЕЧТО$1~!поДат, Z4: НЕЧТО$1~!Откуда, Z5: НЕЧТО$1~!Куда, Z6: ТРАНСПОРТ$121324~!Тв\!наПред)
       semantics: <X002.002> ЕХАТЬ Oper01(Z1,ПОЕЗДКА$15402(ПОЧЕМУ:Z2,ПОДАТ:Z3,ОТКУДА:Z4,КУДА:Z5,ТВ:НАПРЕД:Z6)) \\ <2>
       links: Z1: @ОНЪ$17 => <X001.002>
              Z5: $1~@Куда => <X003.202>
     в (“into”)
       ** <X003.202> В {Предлог. $12314~@Куда} (Z0:y> @Куда, Z1: ПОСЕЛЕНИЕ$123~!Вин)
       semantics: <X003.202> В Y1>Direkt(Y1:,ВНУТРИ$12/313/05(ВИН:Z1)) \\ <200>
       links: Z1: $123~@Вин => <X004.001>
              Z5: $1~@Куда <= <X002.002>

  9. город (“the city”)
       ** <X004.001> ГОРОД {Сущв._Муж_Неодуш $12314~@ОНЪ$17@Вин} $12314(Z1: СТРАНА$1231~!Род)
       semantics: <X004.001> ГОРОД (РОД:Z1) \\ <1>
       links: Z1: $123~@Вин <= <X003.202>
     Classifier of basic concepts. A basic concept is a word whose meaning cannot be expressed through simpler concepts. The semantic dictionary contains more than 20000 basic concepts (nouns and adjectives). More than 90000 other words (derived words) are expressed as superpositions of basic concepts and basic functions. Basic concepts are organized in a hierarchical tree (the classifier). Main rules: all words of a class inherit the same semantic properties from the parent class, and the words of a class also have their own specific characteristics. The name of the root class is НЕЧТО (“SOMETHING”). There are more than 1500 classes.
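The inheritance rule can be illustrated with a short sketch. It rests on an assumption that the transcript suggests but does not state: a class lies below another class when its code starts with the other's code. For example, ГОРОД `$12314` fills the argument ПОСЕЛЕНИЕ `$123`, and every code starts with the root `$1`. The function names and the small `CLASSIFIER` table are hypothetical. The codes are taken from the slides, but the English labels are shorthand.

```python
# A handful of class codes from the presentation; labels are shorthand only.
CLASSIFIER = {
    "$1": "НЕЧТО (root)",
    "$12": "Physical Object",
    "$122": "Nature",
    "$122/1": "Weather",
    "$12211": "Trees",
}


def is_subclass(code: str, ancestor: str) -> bool:
    """Assumed rule: `code` is `ancestor` or lies below it if it extends its code."""
    return code.startswith(ancestor)


def ancestry(code: str, classifier=CLASSIFIER) -> list:
    """Known classes from which `code` would inherit, most general first."""
    return sorted((c for c in classifier if is_subclass(code, c)), key=len)


print(ancestry("$12211"))             # root, Physical Object, Nature, Trees
print(is_subclass("$12314", "$123"))  # ГОРОД against ПОСЕЛЕНИЕ
print(is_subclass("$12211", "$122/1"))
```

Under this assumption, checking whether a word can fill an argument slot reduces to a string prefix test, which is cheap enough to run for every pair of alternatives.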

  10. Examples:
        $1        Noun                                НЕЧТО (“SOMETHING”), СУЩЕСТВИТЕЛЬНОЕ (“NOUN”), …
        $110      Noun AO (Abstract Object) Idea      ПОНЯТИЕ (“CONCEPT”), …
        $1100/01  Noun AO Idea => Abstract-Concrete   АБСТРАКТНЫЙ (“ABSTRACT”), КОНКРЕТНЫЙ (“CONCRETE”), …
        $12       Noun PO (Physical Object)           МАТЕРИЯ (“SUBSTANCE”), ПРОСТРАНСТВО (“SPACE”), ТЕЛО (“BODY”), …
        $122      Noun PO Nature                      ПРИРОДА (“NATURE”), …
        $122/1    Noun PO Nature Weather              ПОГОДА (“WEATHER”), …
        $12211    Noun PO Nature Plants Trees         ДЕРЕВО (“TREE”), ДУБ (“OAK”), СОСНА (“PINE”), …
      Basic functions. Basic functions describe the relationships between their arguments. The formal meaning of each derived word can be expressed as a superposition of basic concepts and basic functions.

  11. Examples:
        And(x,y)      x and y
        Caus(x,y)     x causes y
        Cont(x)       x continues
        Content(x,y)  x contains y
        Control(x,y)  x controls y
        Func(x)       x occurs
        Hab(x,y)      x has y
        Incep(x)      x starts
        Lab(x,y)      x exposes y
        Loc(x,y)      x is situated in y
        Magn(x)       x is above the norm
        Mult(x)       multiset of x
        Ne(x)         negation of x
        Oper(x,y)     x performs y
        Rel(x,y)      x has a relation to y
        etc.
      ЛЕСНОЙ  A1>Rel(A1:НЕЧТО$1,ЛЕС$122412)  (“forest”, adjective: “something has a relation to a forest”)
      КОНСТРУИРОВАТЬ  Caus(Z1,IncepFunc(КОНСТРУКЦИЯ$1/422(ВИН:Z2)))  (“construct”, verb: “Z1 causes the appearance of a construction”)
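A superposition of basic functions and concepts is naturally a nested expression. The sketch below is only an illustration under stated assumptions: the `Term` class and the helpers `render` and `functions_used` are hypothetical, compound names such as `IncepFunc` are treated as opaque function names, and heads containing `$` are taken to be concepts, not functions.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    """One application: a function or concept name with its arguments."""
    head: str
    args: Tuple["Node", ...] = ()


Node = Union[Term, str]  # a leaf is a plain string such as "Z1" or "ВИН:Z2"


def render(node: Node) -> str:
    """Write the expression back in the dictionary's notation."""
    if isinstance(node, str):
        return node
    if not node.args:
        return node.head
    return f"{node.head}({','.join(render(arg) for arg in node.args)})"


def functions_used(node: Node) -> list:
    """Names of the functions in the expression (heads without a '$' class code)."""
    if isinstance(node, str):
        return []
    own = [] if "$" in node.head else [node.head]
    return own + [name for arg in node.args for name in functions_used(arg)]


# КОНСТРУИРОВАТЬ ("construct") from the slide above.
construct = Term("Caus", ("Z1", Term("IncepFunc", (Term("КОНСТРУКЦИЯ$1/422", ("ВИН:Z2",)),))))
# The body of ЛЕСНОЙ ("forest", adjective), without its "A1>" prefix.
forest_adj = Term("Rel", ("A1:НЕЧТО$1", "ЛЕС$122412"))

print(render(construct))
print(render(forest_adj))
print(functions_used(construct))
```

Because every node has the same shape, one recursive traversal is enough to print, inspect, or transform any dictionary formula.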

  12. Semantic dictionary. It contains more than 100000 Russian words. The dictionary can be divided into 2 main parts: syntactic and semantic.
      Examples:
      ПОЛУЧИТЬ (“get”, verb)
        Syntactic: ПОЛУЧИТЬ N%~ПОЛУЧЕНИЕ$15310/0/04({Z1: НЕЧТО$1~!Им, Z2: НЕЧТО$1~!Откуда\!Изо\!Ото\!сРод, Z3: !заВин, Z4: ПИЩА$101/0\НЕЧТО$1~!Вин})
        Semantic: ПОЛУЧИТЬ N%~ПОЛУЧЕНИЕ $15310/0/04 (PerfCaus(Oper01(Uzor(Z1,ОТКУДА:Z2),Z3),Hab(Z1,РОД:Z4))) \\ <4>
      НАГРАДА (“reward”, noun)
        Syntactic: НАГРАДА $1241/131/03({Z1: !Дат\!Род, Z2: !Тв, Z3: !заВин, Z4: !наВин})
        Semantic: НАГРАДА $1241/131/03(ДАТ:РОД:Z1,ТВ:Z2,ЗАВИН:Z3,НАВИН:Z4) \\ <1>

  13. 4. Sentence analysis. The processing of natural language texts includes morphological, word-by-word, and syntactic-semantic analysis. The syntactic-semantic analyzer solves 2 main problems:
      - selecting the correct semantic alternative of each word;
      - binding the selected alternatives into an integrated construction.
      The system is implemented as a set of recursive functions. Each function handles a specific part of speech: verb, noun, preposition, adjective, etc.
      5. The detection of links between words. The detection of links is the main operation of the analyzer. It binds words or already assembled constructions. There are 2 main types of interaction between two constructions:
      - the semantic arguments of the incorporating construction interact with the semantic-grammar types of the attached construction (control link, e.g., verb and noun);
      - the semantic-grammar types of one construction interact with the semantic-grammar types of another (agreement link, e.g., adjective and noun).

  14. Examples of links:
      - by case: pronoun and noun: его успех (“his success”)
        links: @Им (“nominative”), @Вин (“accusative”)
      - by semantic class, case, gender, and number: adjective and noun: красивый лес (“beautiful forest”)
        links: $1~@Онъ$17@Им, $1~@Онъ$17@Вин
      Further examples are given in section 6 of the presentation.

  15. After the first steps of text processing, the dictionary articles of two neighboring words have the following structure:
        <word1> <semantic alternative 1> <semantic alternative 2> <semantic alternative 3> ... <semantic alternative n1>
        <word2> <semantic alternative 1> <semantic alternative 2> <semantic alternative 3> ... <semantic alternative n2>
        <semantic alternative> ::= < {morphologic information, semantic-grammar types} (syntactic-semantic information, semantic arguments) <<additional arguments>> >
      The link detection procedure checks all arguments of all semantic alternatives of word1 for matches against all arguments of all semantic alternatives of word2. The procedure can be optimized considerably by using more complex data structures (this optimization is the subject of ongoing work).
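The exhaustive check described on this slide can be sketched for the control-link case as follows. This is a simplified illustration, not the analyzer's implementation: `Alternative` and `control_links` are hypothetical names, and a match is assumed to require equal grammar forms plus a dependent class code that starts with the class code of the argument. The alternatives for едет, в, and город use values from the example on slides 7 to 9. The second alternative of в is invented to show how a non-matching reading is rejected.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass
class Alternative:
    """One semantic alternative of a word (heavily simplified)."""
    ident: str
    types: List[Tuple[str, str]]      # (semantic class, grammar form) the word offers
    args: List[Tuple[str, str, str]]  # (slot, semantic class, grammar form) it expects


def control_links(head_alts, dep_alts):
    """Compare every argument of every head alternative with every type of
    every dependent alternative, as in the unoptimized procedure."""
    found = []
    for head, dep in product(head_alts, dep_alts):
        for slot, arg_class, arg_form in head.args:
            for type_class, type_form in dep.types:
                # Assumed matching rule: same form, and the dependent's class
                # lies at or below the class required by the argument.
                if arg_form == type_form and type_class.startswith(arg_class):
                    found.append((head.ident, slot, dep.ident))
    return found


edet = [Alternative("X002.002", [("$15402", "Глагол")],
                    [("Z3", "$1", "поДат"), ("Z4", "$1", "Откуда"),
                     ("Z5", "$1", "Куда"), ("Z6", "$121324", "Тв")])]
v = [Alternative("X003.202", [("$12314", "Куда")], [("Z1", "$123", "Вин")]),
     # Hypothetical competing alternative, not taken from the dictionary.
     Alternative("hypothetical", [("$12314", "Где")], [("Z1", "$123", "Пред")])]
gorod = [Alternative("X004.001", [("$12314", "Вин")], [("Z1", "$1231", "Род")])]

print(control_links(edet, v))
print(control_links(v, gorod))
```

In this sketch only the `X003.202` reading of в links both to едет and to город, which is how link detection narrows down the semantic alternatives. The nested loops also make the cost visible, since it grows with the product of the number of alternatives and arguments on both sides.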

  16. 6. An example of an analyzed sentence. Люди любят отдыхать на природе (“People like to rest in nature”).
      Syntax tree of the sentence:
        @Глагол любят<X002.003> (
          @Им Люди<X001.001><+СущГлаг3/2/.Шаг=2+>,
          @Инфин отдыхать<X003.001><+ГлагИнфин6/2/.Шаг=1+> (
            @Где на<X004.090><+ГлагОбст1/2/.Шаг=4+> (
              @Пред природе<X005.001><+ПредлСуществ5/2/.Шаг=3+>
            )
          )
        )
      Semantic values of each word and links between them:
      Люди (“People”)
        ** <X001.001> ЧЕЛОВЕК {Сущв._Муж_Одуш $1241~@ОНИ$17@Им} $1241(Z1: ВРЕМЯ$16\ЧЕЛОВЕК$1241\ПЛАНЕТА$12271~!Род)
        semantics: <X001.001> ЧЕЛОВЕК (РОД:Z1) \\ <1>
        links: Z2: $124~@ОНИ$17 <= <X002.003>

  17. любят (“like”)
        ** <X002.003> ЛЮБИТЬ {Глагол. $1241/40113/05~@Глагол} N%~ЛЮБОВЬ$1241/40113/05(Z1: !Инфин, Z2: ЖИВОЙ$124~!ОНИ$17, Z3: !заВин)
        semantics: <X002.003> ЛЮБИТЬ Caus(ИНФИН:Z1,Oper02(ИМ:Z2,ПРИЯТНОСТЬ$1241/40012/03(ЗАВИН:Z3))) \\ <3>
        links: Z1: @Инфин => <X003.001>
               Z2: $124~@ОНИ$17 => <X001.001>
      отдыхать (“to rest”)
        ** <X003.001> ОТДЫХАТЬ {Глагол. $15308~@Инфин} N%~ОТДЫХ$15308(Imperf Z1: #, Z2: !Ото, Z3: НЕЧТО$1~!Где)
        semantics: <X003.001> ОТДЫХАТЬ Oper01(Z1,ОТДЫХ$15308(ОТО:Z2,ГДЕ:Z3)) \\ <1>
        links: Z3: $1~@Где => <X004.090>
               Z1: @Инфин <= <X002.003>

  18. на (“in”)
        ** <X004.090> НА {Предлог. $122~@Где} (Z0:y> @Где, Z1: ПРИРОДА$122\ГРАНИЦА$12/15/16\РАССТОЯНИЕ$12/32\ПЛОЩАДЬ$12316~!Пред)
        semantics: <X004.090> НА Y1>Loc(Y1:,ПРЕД:Z1) \\ <105>
        links: Z1: $122~@Пред => <X005.001>
               Z3: $1~@Где <= <X003.001>
      природе (“nature”)
        ** <X005.001> ПРИРОДА {Сущв._Жен_Неодуш $122~@ОНА$17@Пред} $122(Z1: !Род)
        semantics: <X005.001> ПРИРОДА (РОД:Z1) \\ <1>
        links: Z1: $122~@Пред <= <X004.090>
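The `links:` records of this example are mirrored: each link appears once on the governing word with `=>` and once on the attached word with `<=`. The sketch below checks that symmetry and recovers the root of the tree. It assumes that `=>` points from a word to the word filling one of its argument slots, which matches the syntax tree on slide 16 but is not stated explicitly in the presentation. The table `LINKS` and the helper names are hypothetical.

```python
# Link records copied from slides 16-18: word id -> (slot, arrow, other word id).
LINKS = {
    "X001.001": [("Z2", "<=", "X002.003")],                           # Люди
    "X002.003": [("Z1", "=>", "X003.001"), ("Z2", "=>", "X001.001")],  # любят
    "X003.001": [("Z3", "=>", "X004.090"), ("Z1", "<=", "X002.003")],  # отдыхать
    "X004.090": [("Z1", "=>", "X005.001"), ("Z3", "<=", "X003.001")],  # на
    "X005.001": [("Z1", "<=", "X004.090")],                           # природе
}


def edges(links, arrow):
    """Collect (head, slot, dependent) triples as seen from one arrow direction."""
    result = set()
    for word, records in links.items():
        for slot, direction, other in records:
            if direction == arrow:
                head, dep = (word, other) if arrow == "=>" else (other, word)
                result.add((head, slot, dep))
    return result


def roots(links):
    """Words that never appear as a dependent; a well-formed tree has exactly one."""
    dependents = {dep for _, _, dep in edges(links, "=>")}
    return [word for word in links if word not in dependents]


print(edges(LINKS, "=>") == edges(LINKS, "<="))  # both directions describe the same links
print(roots(LINKS))
```

A consistency check of this kind is a simple way to confirm that a transcribed or generated analysis forms a single tree before it is processed further.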

  19. 7. Conclusions. The syntactic-semantic analyzer based on V.A. Tuzov's theory is a unique system. The detection of links between words makes it possible to select the correct semantic alternative of a word in a sentence. The accuracy of text processing exceeds 95%.

  20. Bibliography, internet resources:
      [1] Tuzov V.A. Mathematical Model of Language. Saint-Petersburg State University Publishing House, 1984, 176 p. (in Russian).
      [2] Tuzov V.A. Computer Semantics of Russian Language. Saint-Petersburg State University Publishing House, 2004, 400 p. (in Russian).
      [3] http://www.dictum.ru/
      [4] http://www.aot.ru/
      [5] http://www.link.cs.cmu.edu/link/
