
Cross Language Information Retrieval (CLIR)

Cross Language Information Retrieval Based on Query Keyword Translation: An Internet Search Application. Atsushi Fujii and Tetsuya Ishikawa, International Journal of Computer Processing of Oriental Languages, 2000, 13:1, 1-13.


Presentation Transcript


  1. Cross Language Information Retrieval Based on Query Keyword Translation: An Internet Search Application. Atsushi Fujii and Tetsuya Ishikawa, International Journal of Computer Processing of Oriental Languages, 2000, 13:1, 1-13.

  2. Cross Language Information Retrieval (CLIR) • The user presents queries in one language to retrieve documents in another language.

  3. Previous Research • In 1970, Salton showed that retrieval with a hand-crafted bilingual thesaurus was comparable in performance to monolingual information retrieval. • (The experiments used documents stored on a machine that had already been identified and classified.)

  4. Information and the Internet • The Internet is a large repository of documents and web pages. • We can search these pages to retrieve information.

  5. CLIR • Through the 1990s, CLIR systems tried to access multilingual web pages. • By 2000, the performance (precision) of CLIR systems was only 50 to 75 percent of that of monolingual systems.

  6. What is involved in CLIR over the Internet • A CLIR system needs a translation process along with a multilingual retrieval process. • Usually, bilingual dictionaries, corpora, thesauri, and Machine Translation (MT) systems are used to translate queries and/or documents.

  7. Retrieval Methods • Query translation approach • Document translation approach • Interlingual representation approach

  8. Query translation approach • Queries are translated into the document language prior to the retrieval process. • Three methods of translation are: • Dictionary-based • Corpus-based • Hybrid – corpora are used to resolve translation ambiguity in dictionaries.
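The three translation methods on this slide can be sketched as follows. This is a minimal illustration, not the authors' system: the dictionary entries, the co-occurrence counts, and the function names `dictionary_translate` and `hybrid_translate` are all hypothetical. The dictionary-based step lists every candidate translation, and the hybrid step uses target-language corpus counts to pick one combination.

```python
from itertools import product

# Hypothetical toy bilingual dictionary (romanized Japanese -> English candidates).
DICTIONARY = {
    "jouhou": ["information", "intelligence"],
    "kensaku": ["retrieval", "search"],
}

# Hypothetical co-occurrence counts taken from a target-language corpus.
COOCCURRENCE = {
    ("information", "retrieval"): 120,
    ("information", "search"): 45,
    ("intelligence", "retrieval"): 2,
    ("intelligence", "search"): 5,
}


def dictionary_translate(query_terms):
    """Dictionary-based step: list every candidate translation per term.

    Terms missing from the dictionary are passed through unchanged.
    """
    return [DICTIONARY.get(term, [term]) for term in query_terms]


def hybrid_translate(query_terms):
    """Hybrid step: resolve ambiguity with corpus co-occurrence counts."""
    candidates = dictionary_translate(query_terms)

    def score(combination):
        # Sum the counts of adjacent word pairs in the candidate translation.
        pairs = zip(combination, combination[1:])
        return sum(COOCCURRENCE.get(pair, 0) for pair in pairs)

    return list(max(product(*candidates), key=score))


print(dictionary_translate(["jouhou", "kensaku"]))
print(hybrid_translate(["jouhou", "kensaku"]))
```

The translated query would then be passed to an ordinary monolingual retrieval engine in the document language.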

  9. Document translation approach • Translates the documents into the query language prior to retrieval. • Methods • Use MT systems to translate documents in full • Translate only the index terms, using a dictionary-based translation method • Oard and Hackett (1997) showed empirically that full document translation outperformed query translation, but it is expensive.

  10. Interlingual representation approach • Projects both queries and documents into language-independent representations • Thesaurus classes • Vector Space Models • However, these methods require manual alignment of bilingual thesauri/corpora. • Carbonell et al. (1997) showed that corpus-based query translation outperformed language-independent vector space models.
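As a rough illustration of projecting queries and documents into a shared, language-independent space, the sketch below maps words from both languages to thesaurus class IDs and compares the resulting vectors by cosine similarity. The class table `CONCEPT_OF` and the helper names are hypothetical, and the hand-built table stands in for the manually aligned bilingual thesaurus the slide mentions.

```python
import math
from collections import Counter

# Hypothetical manually aligned thesaurus: words from both languages
# share a language-independent class ID.
CONCEPT_OF = {
    "information": "C1", "jouhou": "C1",
    "retrieval": "C2", "kensaku": "C2",
    "translation": "C3", "honyaku": "C3",
}


def to_concept_vector(terms):
    """Project a bag of words (in either language) onto thesaurus classes."""
    return Counter(CONCEPT_OF[t] for t in terms if t in CONCEPT_OF)


def cosine(a, b):
    """Cosine similarity between two sparse class-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = to_concept_vector(["jouhou", "kensaku"])
document = to_concept_vector(["information", "retrieval", "translation"])
print(cosine(query, document))
```

No translation step is needed at query time here, because the query and the document meet in the class space.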

  11. System

  12. Probabilistic Translation • The authors use statistical models to select the best translation for each word.
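The slide does not spell out the statistical model. One common way to select the best translation, assumed here purely for illustration, is to score each candidate word sequence by a translation probability times a target-language bigram probability and keep the highest-scoring sequence. All probabilities, the `FLOOR` smoothing value, and the function names below are hypothetical.

```python
import math
from itertools import product

# Hypothetical translation probabilities P(source word | target word).
TRANSLATION_PROB = {
    ("jouhou", "information"): 0.7,
    ("jouhou", "intelligence"): 0.3,
    ("kensaku", "retrieval"): 0.6,
    ("kensaku", "search"): 0.4,
}

# Hypothetical target-language bigram probabilities P(next word | previous word).
BIGRAM_PROB = {
    ("information", "retrieval"): 0.20,
    ("information", "search"): 0.05,
    ("intelligence", "retrieval"): 0.001,
    ("intelligence", "search"): 0.01,
}

FLOOR = 1e-6  # assumed probability for unseen events


def candidates_for(source_word):
    """Collect every target word listed as a translation of source_word."""
    return [t for (s, t) in TRANSLATION_PROB if s == source_word]


def log_score(source_words, target_words):
    """Log of (translation model probability * bigram language model probability)."""
    score = sum(
        math.log(TRANSLATION_PROB.get((s, t), FLOOR))
        for s, t in zip(source_words, target_words)
    )
    score += sum(
        math.log(BIGRAM_PROB.get(pair, FLOOR))
        for pair in zip(target_words, target_words[1:])
    )
    return score


def best_translation(source_words):
    """Return the candidate word sequence with the highest score."""
    # Words with no known translation are passed through unchanged.
    options = [candidates_for(w) or [w] for w in source_words]
    return list(max(product(*options), key=lambda t: log_score(source_words, t)))


print(best_translation(["jouhou", "kensaku"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied for longer compound words.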

  13. Conclusion • The proposed method improves on baseline CLIR systems through the use of a compound translation system.
