Synonymous Paraphrasing Using WordNet and Internet

Igor A. Bolshakov & Alexander Gelbukh, Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico. {igor,gelbukh}@cic.ipn.mx


  1. Synonymous Paraphrasing Using WordNet and Internet Igor A. Bolshakov & Alexander Gelbukh, Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico {igor,gelbukh}@cic.ipn.mx

  2. Contents • Synopsis • Absolute and Relative Synonyms • Collocations • Evaluations of Collocations via Internet • Types of Synonymous Paraphrasing • Algorithm of Interactive Paraphrasing • An Experiment on Text Paraphrasing • Another Application: Style Evaluation • Yet Another Application: Linguistic Steganography

  3. Synopsis – 1 We propose a method of synonymous paraphrasing of a text based on • WordNet synonymy data and • Internet statistics of stable word combinations (collocations). Given a text, we look for words or word sequences in it for which WordNet provides synonyms, and substitute them with such synonyms only if the latter form valid collocations with the surrounding words according to the statistics gathered from Google

  4. Synopsis – 2 We present two important applications of local synonymous paraphrasing: • Style checking and correction: automatic evaluation and computer-aided improvement of writing style with regard to various aspects • Steganography: hiding additional information in a given text by special selection of collocationally verified synonyms

  6. Absolute and Relative Synonymy in general • Text variations that preserve the meaning of the whole text are called synonymous paraphrasings • There are global and local types of synonymous paraphrasing • Local paraphrasing only replaces separate words (those that have synonyms), preserving the word order and the number of words • Synonyms are words or multiwords that can replace each other in some class of contexts with an insignificant change to the meaning of the whole text • A synonymy dictionary consists of groups of words considered synonyms of each other • WordNet contains a type of synonymy dictionary • There are absolute and relative synonyms

  7. Absolute and Relative Synonyms Examples • Relative synonyms - {(to) schedule, plan, design, map out, project, lay on, scheme} - {rollercoaster, big dipper, Russian mountains} • Absolute synonyms - {sofa, settee} - {United States of America, United States, USA, US} - {former president, ex-president}

  8. Synonymy Dictionary we need • A synonymy dictionary such as the one in WordNet or EuroWordNet • A specially compiled dictionary of absolute synonyms that contains all the abovementioned types of English equivalents Our algorithms first look up the absolute synonymy subdictionary

  10. Collocations in general • A collocation is a syntactically connected and semantically compatible pair of content (i.e. non-functional) words • Syntactic connectedness is understood as in dependency grammars (I. Melčuk) • Examples of English collocations are: full-length dress, well expressed, to briefly expose, to pick up the knife, to listen to the radio, energy field, to promise to marry, to flatly reject • Collocation components are connected to each other directly or through auxiliary words

  11. Collocation Databases For English, collocation databases exist only in printed form. The best is: Oxford Collocations Dictionary for Students of English. Oxford University Press, 2003. In this paper we treat the Google search engine as a collocation database.

  13. Evaluations of Collocations via Google in general • Google reports the occurrence statistics of a word or word sequence as the number of web pages that contain it, regardless of how many times it occurs on each page • There are only two ways to evaluate the occurrence count of a collocation by giving its components: • in quotation marks (underestimation) • without them (overestimation) • It is necessary to propose a heuristic measure that lies between these two • It is also necessary to introduce a threshold, to exclude marginal cases

  14. Evaluations of Collocations via Google Statistics on synonymous collocations with project

  15. Evaluations of Collocations via Google Collocations with synonyms of departments: departments 42%, offices 15%, services 43%
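
The evaluation scheme on slides 13 to 15 can be sketched as follows. This is only an illustration under stated assumptions, not the authors' formula: it assumes the in-between measure is the geometric mean of the quoted (underestimating) and unquoted (overestimating) page counts, and that the appropriateness of a synonym is its share of the total over the group. The function names, the threshold value, and the page counts are all hypothetical.

```python
from __future__ import annotations

import math


def between_measure(quoted_hits: int, unquoted_hits: int) -> float:
    """Assumed heuristic: geometric mean of the two page counts,
    which always lies between the under- and the overestimate."""
    return math.sqrt(quoted_hits * unquoted_hits)


def appropriateness(counts: dict[str, tuple[int, int]],
                    threshold: float = 0.05) -> dict[str, float]:
    """Map each synonym to its share of the group total and
    drop the synonyms whose share falls below the threshold."""
    measures = {w: between_measure(q, u) for w, (q, u) in counts.items()}
    total = sum(measures.values())
    if total == 0:
        return {}
    shares = {w: m / total for w, m in measures.items()}
    return {w: s for w, s in shares.items() if s >= threshold}


# Made-up (quoted, unquoted) page counts, for illustration only.
example_counts = {
    "departments": (400, 10000),
    "offices": (100, 8100),
    "services": (1, 100),
}
shares = appropriateness(example_counts)
```

With these invented counts, "services" falls below the threshold and is excluded, while the other two synonyms keep their shares.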

  17. Types of Synonymous Paraphrasing • Text compression - the shortest synonyms are taken • Text canonization - the most frequently used synonyms are taken • Text simplification - synonyms more intelligible for language-impaired persons are taken (special marks of colloquialism are needed) • Conformistic variations - synonyms are taken at random according to their Internet distribution • Individualistic variations - nearly marginal synonyms within the Internet distribution are taken

  19. Algorithm of Interactive Paraphrasing
      Ask mode in {compression, canonization, simplification, conformistic, individualistic}
      Ask the marginality threshold in (0,1) and the sensitivity threshold in (0,1)
      For each content word or multiword w which is a member of a synset
        Let S = union of all relevant synsets for w
        For each word v in S
          If its appropriateness a(v) < marginality threshold then set score(v) = 0
          else
            If mode = compression then set score(v) = 1 / length(v)
            If mode = canonization then set score(v) = a(v)
            If mode = simplification then set score(v) as described in S. 5
            If mode = conformistic then set score(v) = random from 0 to a(v)
            If mode = individualistic then set score(v) = 1 / a(v)
        If score(w) / (maximum of score(v) over v in S) < sensitivity threshold
          then suggest to the user all variants v in S with score(v) ≠ 0, in the order of score(v)
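
The scoring loop of slide 19 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names and the dictionary `a` of appropriateness values are hypothetical, the suggestions are assumed to be ordered from the highest score down, and the simplification mode is left out because the slides defer its definition to "S. 5", which is not part of this transcript.

```python
import random


def score(v, a, mode, marginality, rng=random):
    """Score one synonym v; a maps each synonym to its appropriateness."""
    if a[v] < marginality:
        return 0.0
    if mode == "compression":
        return 1.0 / len(v)
    if mode == "canonization":
        return a[v]
    if mode == "conformistic":
        return rng.uniform(0.0, a[v])
    if mode == "individualistic":
        return 1.0 / a[v]
    raise ValueError(f"unsupported mode: {mode}")


def suggest(w, synonyms, a, mode, marginality, sensitivity):
    """Return the variants to offer for word w (highest score first),
    or an empty list when w already scores well enough."""
    scores = {v: score(v, a, mode, marginality) for v in synonyms}
    best = max(scores.values(), default=0.0)
    if best == 0.0:
        return []
    if scores.get(w, 0.0) / best < sensitivity:
        candidates = [v for v in scores if scores[v] > 0]
        return sorted(candidates, key=scores.get, reverse=True)
    return []


# Appropriateness values taken from the "departments" example on slide 15.
a = {"departments": 0.42, "offices": 0.15, "services": 0.43}
canon = suggest("offices", list(a), a, "canonization", 0.1, 0.5)
short = suggest("departments", list(a), a, "compression", 0.1, 0.7)
```

Here `canon` lists "services" first because it has the highest appropriateness, while `short` lists "offices" first because it is the shortest synonym. The thresholds 0.1, 0.5 and 0.7 are arbitrary illustration values.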

  21. An Experiment on Text Paraphrasing The source text with possible replacements The Georgian foreign minister (foreign office head) is scheduled (planned, designed, mapped out, projected, laid on, schemed) to meet (have a meeting, rendezvous) with the heads (chiefs, top executives) of various (different, diverse) Russian departments (offices, services) and with a deputy of Russian foreign minister (foreign office head). “Issues (problems, questions, items) concerning (pertaining, touching, regarding) the future (coming, prospective) contacts at the higher (high-rank) level will be discussed (considered, debated, parleyed, ventilated, reasoned, negotiated, talked about) in the course of the meeting (receptions, buzz sessions, interviews),” said Georgian ambassador to Russia Zurab Abashidze. The Georgian foreign minister (foreign office head) will be in (visit) Moscow on a private (privy) visit (trip), the Russian Foreign Ministry reported (communicated, informed, conveyed, announced).

  22. An Experiment on Text Paraphrasing The text with conformistic variations The Georgian foreign office head is planned to have a meeting with the heads of diverse Russian offices and with a deputy of Russian foreign office head. “Questions touching the future contacts at the high-rank level will be debated in the course of the interviews,” said Georgian ambassador to Russia Zurab Abashidze. The Georgian foreign minister will visit Moscow on a private trip, the Russian Foreign Ministry informed.

  24. Style Evaluation: for Compressibility
      Set Compressibility to 0
      For each content word w in the text
        Set S = union of all relevant synsets containing w
        Remove from S the members v below the marginality threshold
        Let v0 be the shortest word in S
        Increase Compressibility by length(w) - length(v0)
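
The compressibility measure of slide 24 can be sketched as follows. This is a minimal sketch under assumptions: the function name, the `synonyms_of` lookup table, and the appropriateness values for the "various" group are hypothetical, and words for which no synonym survives the marginality threshold are simply skipped.

```python
def compressibility(words, synonyms_of, a, marginality):
    """Sum, over the content words, of how many characters would be
    saved by switching to the shortest acceptable synonym."""
    total = 0
    for w in words:
        # S = synonyms of w (including w) that pass the marginality threshold.
        candidates = [v for v in synonyms_of.get(w, [])
                      if a.get(v, 0.0) >= marginality]
        if not candidates:
            continue
        shortest = min(candidates, key=len)
        total += len(w) - len(shortest)
    return total


# Synonym groups from slide 21; values for "various" are invented.
synonyms_of = {
    "departments": ["departments", "offices", "services"],
    "various": ["various", "different", "diverse"],
}
a = {"departments": 0.42, "offices": 0.15, "services": 0.43,
     "various": 0.5, "different": 0.3, "diverse": 0.2}

loose = compressibility(["departments", "various"], synonyms_of, a, 0.1)
strict = compressibility(["departments", "various"], synonyms_of, a, 0.2)
```

With the looser threshold, "departments" can shrink to "offices" (4 characters saved). With the stricter one, "offices" is excluded as marginal and only "services" remains (3 characters saved).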

  26. Linguistic Steganography Two Inputs: • The information I to be hidden, merely as a bit sequence • Any natural language text of the minimal length of approximately 250 per bit of I. The text is orthographically correct and semantically “common” (not a sequence of proper names, numbers, rhymes, etc.)

  27. Linguistic Steganography Algorithm:
      Search of synonyms - single words or multiwords that have their own synsets
      Formation of synonymy groups - search for unions of all relevant synsets
      Collocational verification of synonyms - each member of the current group containing relative synonyms is tested as a potential collocation together with its context words by Google statistics, discarding all inappropriate options
      Enciphering - the current group is cut in length to the nearest power of 2, with exponent p - the next p bits of I are taken as a number s - the s-th synonym replaces the source synonym
      Reagreement
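
The enciphering step of slide 27 can be sketched as follows. This is only an illustration, not the published algorithm: it assumes each synonymy group has already been collocationally verified and put in an order shared by sender and receiver, it reads "cut to the nearest power of 2" as keeping the first 2^p members for the largest p that fits, and it keeps the source word when fewer than p bits remain. The function names are hypothetical, and the reagreement step is not modeled.

```python
def embed_bits(groups, bits):
    """groups: list of (source_word, ordered_synonym_list) pairs.
    bits: string of '0'/'1'. Returns (chosen_words, bits_embedded)."""
    chosen, pos = [], 0
    for source, group in groups:
        p = len(group).bit_length() - 1  # largest p with 2**p <= len(group)
        if p < 1 or len(bits) - pos < p:
            chosen.append(source)        # nothing can be hidden here
            continue
        s = int(bits[pos:pos + p], 2)    # next p bits as an index
        chosen.append(group[:2 ** p][s])
        pos += p
    return chosen, pos


def extract_bits(groups, chosen, n_bits):
    """Recover n_bits hidden bits, mirroring the choices of embed_bits."""
    bits = ""
    for (_, group), word in zip(groups, chosen):
        p = len(group).bit_length() - 1
        if p < 1 or n_bits - len(bits) < p:
            continue
        bits += format(group[:2 ** p].index(word), f"0{p}b")
    return bits


# Synonymy groups taken from the example text on slide 21.
groups = [
    ("departments", ["departments", "offices", "services"]),
    ("various", ["various", "different", "diverse"]),
    ("discussed", ["discussed", "considered", "debated", "parleyed",
                   "ventilated", "reasoned", "negotiated", "talked about"]),
]
chosen, used = embed_bits(groups, "10110")
recovered = extract_bits(groups, chosen, used)
```

In this example the first two groups carry one bit each and the third carries three, so the five bits select "offices", "various" and "negotiated". The receiver needs the same groups and the number of embedded bits to read them back.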

  28. Linguistic Steganography More detail in the paper: Bolshakov, I.A. A Method of Linguistic Steganography Based on Collocation-Proven Synonymy. In: Proceedings of International Information Hiding Workshop IH2004, Toronto, Canada, May 2004. Lecture Notes in Computer Science, Springer, 2004 (now available only in the preprint form)

  29. Thank you! Igor A. Bolshakov igor@cic.ipn.mx
