
Solving Italian Crossword Using the Web

Solving Italian Crossword Using the Web. Giovanni Angelini, Marco Ernandes and Marco Gori DII, University of Siena http://airgroup.dii.unisi.it http://webcrow.dii.unisi.it. Solving Crosswords. Crossword puzzles are probably the most popular language game played.


Presentation Transcript


  1. Solving Italian Crossword Using the Web Giovanni Angelini, Marco Ernandes and Marco Gori DII, University of Siena http://airgroup.dii.unisi.it http://webcrow.dii.unisi.it

  2. Solving Crosswords Crossword puzzles are probably the most popular language game played. Solving them requires a combination of skills: • a good comprehension of the clues, which are sometimes expressed in an ambiguous or tricky way. • a large knowledge base. • a clever heuristic to fill the puzzle correctly. Problems like solving crosswords from clues are considered AI-complete.

  3. The idea Attack crosswords (within competition time limits) by making use of the Web, the richest self-updating repository of human knowledge. Try to attach semantics to real-life concepts using: • The Web. • Search engines. • Information retrieval and machine learning techniques.

  4. Webcrow’s architecture

  5. Two main sub-problems • Clue-Answering: the aim is to associate each clue with the correct answer word. For each clue a ranked list of candidate solutions has to be generated. • Grid Filling: a Constraint Satisfaction Problem. From each clue list a candidate has to be chosen and inserted in the crossword puzzle, trying to satisfy the intrinsic constraints.

  6. Clue Answering Clue-answering differs from Question Answering in several ways: • There is no standard interrogative form. • There is an intrinsic and deliberate ambiguity. • The topic of the question can be both factoid and non-factoid. • There is a unique and precise correct answer: a single or a compound word. • Very high recall is required: missing the target could lead to disaster in grid-filling.

  7. Generating the candidate lists Modules for generating the candidate lists: • The Web Search Module: finds answers by exploiting the Web and search engines. • The Data-Based Module: returns possible candidates by exact and partial matching on the clues of solved crosswords. • The Rule-Based Module: deals with clues whose answers have no semantic relation to the clue, but are cryptically hidden inside the clues themselves. • The Dictionary Module: is used to increase the global coverage of the clue-answering.
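
A minimal sketch of the kind of lookup the Data-Based Module describes, under stated assumptions: the slides do not say how "partial matching" is computed, so this example uses token overlap (Jaccard similarity) as a stand-in. The function names `tokens` and `match_clue`, the `min_overlap` threshold, and the sample clues are all hypothetical.

```python
import re


def tokens(clue):
    """Lower-cased set of word tokens in a clue."""
    return set(re.findall(r"\w+", clue.lower()))


def match_clue(clue, solved, length, min_overlap=0.5):
    """Return (answer, score) pairs from previously solved clues.

    solved: list of (clue, answer) pairs (hypothetical database layout).
    An exact match scores 1.0; a partial match scores the Jaccard overlap
    of the token sets (an assumption, not the method from the slides).
    """
    query = tokens(clue)
    scored = {}
    for old_clue, answer in solved:
        if len(answer) != length:  # the answer must fit the slot
            continue
        old = tokens(old_clue)
        union = query | old
        overlap = len(query & old) / len(union) if union else 0.0
        if overlap >= min_overlap:
            scored[answer] = max(scored.get(answer, 0.0), overlap)
    # Best-scoring candidates first.
    return sorted(scored.items(), key=lambda kv: -kv[1])


solved = [
    ("Capitale della Francia", "parigi"),
    ("La capitale francese", "parigi"),
    ("Fiume di Parigi", "senna"),
]
print(match_clue("La capitale della Francia", solved, 6))
```

With these sample clues, the new clue shares three of four tokens with the first stored clue, so "parigi" is returned with score 0.75; the third entry is skipped because its answer has the wrong length.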

  8. The Web Search Module There are four tasks: • The retrieval of useful web documents. • The extraction of the answer candidates from these documents. • The scoring/filtering of the candidate lists. • The estimation of the list confidence.

  9. Retrieving useful documents • Each clue C = t1 t2 … tn generates a maximum of 2 queries: Q1 = <t1 and t2 and … tn>, Q2 = <t1 or t2 or … tn>. Non-informative words are removed from the queries. • The first n ranked documents are downloaded (time consuming).
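
The query construction on this slide can be sketched as follows. This is an illustration under assumptions: the stop-word list, the `AND`/`OR` query syntax, and the function name `build_queries` are hypothetical, since the slides name neither the search engine nor the list of non-informative words.

```python
import re

# Hypothetical list of non-informative Italian words; the slides do not
# specify which words are removed.
STOP_WORDS = {
    "il", "lo", "la", "i", "gli", "le", "un", "una", "di", "del",
    "della", "a", "da", "in", "con", "su", "per", "e", "o",
}


def build_queries(clue, stop_words=STOP_WORDS):
    """Build at most two queries from a clue: all terms, then any term."""
    terms = [t for t in re.findall(r"\w+", clue.lower()) if t not in stop_words]
    if not terms:
        return []
    q1 = " AND ".join(terms)  # Q1 = <t1 and t2 and ... tn>
    q2 = " OR ".join(terms)   # Q2 = <t1 or t2 or ... tn>
    # With a single term Q1 and Q2 coincide, so only one query is issued
    # (an assumption consistent with "a maximum of 2 queries").
    return [q1] if len(terms) == 1 else [q1, q2]


print(build_queries("La capitale della Francia"))
# ['capitale AND francia', 'capitale OR francia']
```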

  10. Extracting and ranking the candidates • The documents are analysed by a parser which produces plain ASCII text as output. • This text is passed to a list generator that extracts the words (or compound words) of the correct length. • The list is then passed to two submodules: a statistical filter, based on IR techniques, and a morphological filter, based on machine learning and NLP techniques. Finally, the score-probability for each candidate w is:
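
A rough sketch of the list-generation step only (not the statistical or morphological filters, whose formulas are not preserved in this transcript). It assumes that a compound-word candidate is formed by joining two adjacent words whose combined length fits the slot; that rule and the name `candidates_of_length` are hypothetical.

```python
import re
from collections import Counter


def candidates_of_length(text, length):
    """Count single words and adjacent word pairs that fit a slot of `length`."""
    # Runs of letters only (Unicode-aware, so accented letters are kept).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for w in words:
        if len(w) == length:
            counts[w] += 1
    # Compound candidates: two adjacent words written without the space.
    for a, b in zip(words, words[1:]):
        if len(a) + len(b) == length:
            counts[a + b] += 1
    return counts


text = "Parigi è la capitale della Francia. Parigi sorge sulla Senna."
print(candidates_of_length(text, 6))
# Counter({'parigi': 2})
```

The raw occurrence counts are only a starting point; in the pipeline described on the slide, the two filters would then score each candidate.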

  11. The Statistical filter

  12. The Statistical filter Score of the candidate w inside Di retrieved with query Qn: The distance between word wk and query Qn inside Di:

  13. The Morphological filter

  14. The Clue Classifier

  15. The filtering performances

  16. The modules performances

  17. Target position The frequency of the target in the first n positions in relation to its length, with and without the Web Search Module (WSM).

  18. Merging the lists and filling the grid • Merging: all the lists regarding a slot are merged into a unique list. • Grid filling: The goal is to assign a word to each slot in order to maximize the similarity between the final configuration and the target solution. We adopted the maximum probability function: Due to the time restrictions and to the complexity of the problem we chose as a solving algorithm a CSP version of WA*.
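
The grid-filling step can be illustrated with a small weighted A* search over slot assignments. This is a sketch under assumptions, not WebCrow's implementation: the objective formula is not preserved in this transcript, so the example assumes the probability of a filled grid is the product of the probabilities of the chosen words. The data layout (`candidates`, `crossings`), the function name `fill_grid`, and the default `weight` are hypothetical.

```python
import heapq
import itertools
import math


def fill_grid(candidates, crossings, weight=1.5):
    """Weighted A* over partial grid assignments.

    candidates: {slot: [(word, prob), ...]} with prob > 0, lists non-empty.
    crossings:  [(slot_a, i, slot_b, j), ...] meaning letter i of slot_a
                must equal letter j of slot_b.
    Cost of an assignment is the sum of -log(prob), so a lower cost
    corresponds to a higher product of probabilities.
    """
    order = sorted(candidates, key=lambda s: len(candidates[s]))
    best = {s: min(-math.log(p) for _, p in candidates[s]) for s in order}
    # suffix[k]: optimistic cost of filling the slots order[k:].
    suffix = [0.0] * (len(order) + 1)
    for k in range(len(order) - 1, -1, -1):
        suffix[k] = suffix[k + 1] + best[order[k]]

    def consistent(assign, slot, word):
        for a, i, b, j in crossings:
            if a == slot and b in assign and assign[b][j] != word[i]:
                return False
            if b == slot and a in assign and assign[a][i] != word[j]:
                return False
        return True

    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(weight * suffix[0], next(tie), 0.0, {})]
    while heap:
        _, _, g, assign = heapq.heappop(heap)
        k = len(assign)
        if k == len(order):
            return assign, math.exp(-g)
        slot = order[k]
        for word, p in candidates[slot]:
            if consistent(assign, slot, word):
                g2 = g - math.log(p)
                child = {**assign, slot: word}
                f = g2 + weight * suffix[k + 1]  # f = g + w * h
                heapq.heappush(heap, (f, next(tie), g2, child))
    return None, 0.0


candidates = {
    "1A": [("oro", 0.6), ("ara", 0.4)],
    "1D": [("ape", 0.7), ("ora", 0.3)],
}
crossings = [("1A", 0, "1D", 0)]  # both words share their first letter
print(fill_grid(candidates, crossings))
```

In this toy grid the individually most probable words ("oro" and "ape") conflict on the shared letter, so the search returns "ara" and "ape", the consistent pair with the highest joint probability (about 0.28). With `weight=1.0` the heuristic is admissible and the result is optimal; larger weights trade that guarantee for speed, which matches the time-limited setting the slide describes.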

  19. The data set The crossword collection is partitioned into five subsets: • T1ord: examples of ordinary difficulty from La Settimana Enigmistica. • T1dif: crosswords designed for skilled cruciverbalists, from La Settimana Enigmistica. • T2new: crosswords published in 2004 in La Repubblica. • T2old: crosswords published in 2001-2003 in La Repubblica. • T3: a miscellany of examples from crossword-specialized web sites.

  20. Experimental results The performance over the full test set is 68.8% correct words and 79.9% correct letters. Extending the time limit to 45 min increases performance by 7% on average.
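
For reference, the two reported measures can be computed along these lines. This is a simplified sketch: the function name `score_solution` and the slot-to-word dictionaries are hypothetical, and letters are counted per slot, so a cell shared by two crossing words is counted twice. The slides do not say whether the original evaluation counts letters per cell instead.

```python
def score_solution(proposed, target):
    """Return (% correct words, % correct letters) for a filled grid.

    proposed, target: {slot: word}. A slot missing from `proposed`
    counts as entirely wrong.
    """
    total_words = len(target)
    correct_words = sum(1 for s, w in target.items() if proposed.get(s) == w)
    total_letters = sum(len(w) for w in target.values())
    correct_letters = sum(
        sum(1 for a, b in zip(proposed.get(s, ""), w) if a == b)
        for s, w in target.items()
    )
    return (
        100.0 * correct_words / total_words,
        100.0 * correct_letters / total_letters,
    )


target = {"1A": "ara", "1D": "ape"}
proposed = {"1A": "ara", "1D": "apo"}
words_pct, letters_pct = score_solution(proposed, target)
print(f"{words_pct:.1f}% words, {letters_pct:.1f}% letters")
# 50.0% words, 83.3% letters
```

The example also shows why the letter percentage is typically higher than the word percentage: a single wrong letter makes the whole word wrong.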

  21. Webcrow vs Students

  22. Conclusions • Promising results: the version of WebCrow that is discussed here is basic but it has already given very promising results. • Web-search approach: the web-search approach has proved to be very consistent. • Many interesting problems: we believe the approach could suit all problems in which semantics and interpretation play an important role. • Expert modules: WebCrow's overall architecture makes it possible to plug in several expert modules in order to increase the system's performance.

  23. Our Objectives • WebCrow vs humans: one of our main objectives is to build a system competitive with human experts in solving crosswords, and hopefully to challenge masters in a real competition. • Crosswords in different languages: we aim for a system capable of solving crosswords in different languages by exploiting language-independent and data-driven techniques, such as machine learning, avoiding (or limiting) the pre-compiled rules that question answering systems usually rely on.

  24. Solving Italian Crossword Using the Web Giovanni Angelini, Marco Ernandes and Marco Gori DII, University of Siena http://airgroup.dii.unisi.it http://webcrow.dii.unisi.it
