
Enhancing Multilingual Text Retrieval with Terminological Information and Phrasal Browsing

This presentation describes an approach to interactive multilingual text retrieval based on automatic terminology extraction and phrasal browsing. To bridge the gap between users' vocabulary and the terminology of the collection, including across languages, the system integrates NLP resources such as a tokeniser, a morphological analyser, a POS tagger, a shallow parser and the EuroWordNet semantic network. It combines a phrase indexing level with query expansion and translation, and the evaluation of logged sessions suggests that users rely on phrasal information when selecting documents. Terminological phrases serve as an intermediate option between free searching and thesaurus-guided searching, without requiring thesaurus construction.


Presentation Transcript


  1. Joint Conference on Digital Libraries 2001, Roanoke, VA
     Browsing by phrases: terminological information in interactive multilingual text retrieval
     Anselmo Peñas, Julio Gonzalo and Felisa Verdejo
     NLP Group, Dpto. Lenguajes y Sistemas Informáticos, Distance Learning University of Spain (UNED)

  2. Goals
     • To bridge the gap between users’ vocabulary and collection terminology
       • even across languages
       • without the need for thesaurus construction
     • Robust and efficient integration of NLP resources and tools
       • Semantic network: EuroWordNet
       • Tokeniser
       • Morphological analyser
       • POS tagger
       • Shallow parser

  3. Approach
     Perform Automatic Terminology Extraction to provide:
     • At indexing time: criteria for adding a controlled set of phrases to the index
     • At query time: term browsing, to navigate through the terminology and access the documents from complex terms

  4. Approach
     [Diagram labels: Lemma, Document, Phrase]
     The task: to retrieve terminology
     • Lexical compounds are retrieved from mono-lexical terms
     Requires
     • A phrase indexing level
     • Query expansion
     • Query translation
     Phrasal information is used to reduce noise when expanding and translating (co-occurrence of words in the same phrase)
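The phrase indexing level named on this slide can be pictured as two lookups: from a lemma to the phrases that contain it, and from a phrase to the documents where it was extracted. The following is a minimal sketch under that assumption; the class `PhraseIndex`, its method names and the toy data are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict


class PhraseIndex:
    """Toy phrase-level index (hypothetical): lemma -> phrases -> documents."""

    def __init__(self):
        self.lemma_to_phrases = defaultdict(set)
        self.phrase_to_docs = defaultdict(set)

    def add(self, doc_id, phrase_lemmas):
        """Register one extracted phrase (a sequence of lemmas) for a document."""
        phrase = tuple(phrase_lemmas)
        self.phrase_to_docs[phrase].add(doc_id)
        for lemma in phrase:
            self.lemma_to_phrases[lemma].add(phrase)

    def phrases_for(self, lemma):
        """Lexical compounds that contain the given mono-lexical term."""
        return sorted(self.lemma_to_phrases.get(lemma, set()))

    def documents_for(self, phrase_lemmas):
        """Documents in which the given phrase was extracted."""
        return sorted(self.phrase_to_docs.get(tuple(phrase_lemmas), set()))


# Toy data, invented for illustration only.
index = PhraseIndex()
index.add("d1", ["nuclear", "test"])
index.add("d2", ["nuclear", "test"])
index.add("d2", ["test", "ban", "treaty"])

print(index.phrases_for("test"))
print(index.documents_for(["nuclear", "test"]))
```

Starting from the single term "test", a user would first see the compounds that contain it and could then move from a chosen compound to its documents.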

  5. Terminology Extraction and Indexing
     Processing
     • Tokenising, lemmatising, tagging
     • Shallow parsing (syntactic pattern recognition)
     Patterns for Spanish and Catalan
     • N N
     • N A
     • N [A] Prep N [A]
     • N [A] Prep Art N [A]
     • N [A] Prep V
     • N [A] Prep V N [A]
     Patterns for English
     • A N [N]
     • N N [N]
     • A A N
     • N A N
     • N Prep N
     Results
     Terminological phrases for each language
     • Term frequency
     • Document frequency
     • Component lemmas
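Syntactic pattern recognition over POS tags can be sketched as matching short tag sequences against the listed patterns. The example below is an illustration only: it assumes the text has already been tokenised, lemmatised and tagged into (lemma, tag) pairs using the slide's tag names, it covers the English patterns only, and the function name `extract_phrases` and the regular-expression encoding are assumptions rather than the authors' implementation.

```python
import re

# English patterns from the slide, written over a space-separated tag string.
# A bracketed element such as "[N]" is optional, encoded here as "( N)?".
ENGLISH_PATTERNS = [
    r"A N( N)?",   # A N [N]
    r"N N( N)?",   # N N [N]
    r"A A N",
    r"N A N",
    r"N Prep N",
]


def extract_phrases(tagged, patterns=ENGLISH_PATTERNS, max_len=3):
    """Return every lemma sequence whose tag sequence matches a pattern.

    `tagged` is a list of (lemma, tag) pairs, assumed to come from an
    upstream tokeniser, lemmatiser and POS tagger that is not shown here.
    """
    compiled = [re.compile(p) for p in patterns]
    tags = [tag for _, tag in tagged]
    found = []
    for start in range(len(tags)):
        for length in range(2, max_len + 1):
            end = start + length
            if end > len(tags):
                break
            window = " ".join(tags[start:end])
            if any(rx.fullmatch(window) for rx in compiled):
                found.append(" ".join(lemma for lemma, _ in tagged[start:end]))
    return found


# Hand-tagged toy input, invented for illustration.
tagged = [("nuclear", "A"), ("test", "N"), ("ban", "N"), ("treaty", "N")]
print(extract_phrases(tagged))
```

Counting how often each extracted phrase occurs, and in how many documents, would give the term frequency and document frequency listed under Results.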

  6. Query Expansion and Translation
     Query: Tratados de Prohibición de Pruebas Nucleares
     • Tratados
       Expansion: acuerdo, capitulación, concertación, convenio, cuidar, pacto, manejar, procesar
       Translation: accord, discourse, handle, manage, pact, process, treat, treatise, treaty
     • Prohibición
       Expansion: embargo, entredicho, interdicción, interdicto, proscripción
       Translation: ban, interdiction, prohibition, proscription
     • Pruebas
       Expansion: cata, catadura, degustación, ensayo, escandallo, experimento, gustación, muestreo, tanteo
       Translation: demonstrate, establish, exhibit, experiment, experimentation, fall, fitting, indicate, point, present, proof, prove, run, sample, sampling, shew, show, taste, test, trial, try
     • Nucleares
       Expansion: nuclear
       Translation: nuclear
     Nuclear fitting interdiction manage?
     Nuclear taste proscription process?
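The questionable combinations on this slide arise because every candidate for one query word can pair with every candidate for another. One way to picture the noise reduction through co-occurrence in the same phrase is the sketch below, which keeps a candidate only if it appears together with a candidate for a different query word in some indexed phrase. The function `filter_by_cooccurrence`, the shortened candidate lists and the two collection phrases are assumptions made for illustration, not the system's actual procedure or data.

```python
from itertools import combinations, product


def filter_by_cooccurrence(candidates, phrases):
    """Keep only candidates that co-occur with a candidate of another
    query word inside at least one indexed phrase.

    candidates: dict mapping each query word to its expansion or
                translation candidates.
    phrases:    iterable of phrases, each a tuple of lemmas.
    """
    phrase_sets = [set(p) for p in phrases]
    kept = {word: set() for word in candidates}
    for w1, w2 in combinations(candidates, 2):
        for c1, c2 in product(candidates[w1], candidates[w2]):
            if c1 != c2 and any(c1 in ps and c2 in ps for ps in phrase_sets):
                kept[w1].add(c1)
                kept[w2].add(c2)
    return {word: sorted(cands) for word, cands in kept.items()}


# Candidate subsets taken from the slide; the phrases are hypothetical.
candidates = {
    "tratados": ["treaty", "process", "manage"],
    "prohibición": ["ban", "interdiction", "proscription"],
    "pruebas": ["test", "taste", "fitting", "trial"],
    "nucleares": ["nuclear"],
}
phrases = [("nuclear", "test"), ("test", "ban", "treaty")]

print(filter_by_cooccurrence(candidates, phrases))
```

With these toy phrases, candidates such as "fitting", "interdiction" and "manage" are dropped because no indexed phrase pairs them with a candidate for another query word.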

  7. [Figure labels: Query in Spanish; Hierarchy of terms; Ranking of documents; English, Spanish, Catalan]

  8. [Diagram of actions: QUERY; EXPLORE DOCUMENT; EXPLORE PHRASE; RECONSULT WITH PHRASE]

  9. Evaluation
     • 1523 sessions with interaction
     • An average of 5.11 actions per session
     • Explore phrase is used in 65.13%
     First action after QUERY
                    All queries   1 word queries   >1 word queries
       DOC          40.70%        45.49%           37.30%
       PHRASE       51.14%        45.65%           55.05%
       RECONSULT    8.141%        8.846%           7.640%
     Last action before finishing the session with explore DOC
                    All queries   1 word queries   >1 word queries
       QUERY        48.74%        53.38%           45.15%
       PHRASE       42.95%        40.85%           44.57%
       RECONSULT    8.306%        5.764%           10.27%

  10. Conclusions
     • Development of a search engine based on terminology extraction
     • Terminological phrases offer an intermediate way between free searching and thesaurus-guided searching
       • Without the need for thesaurus construction
       • Bridging the distance between the terms used in the query and the terminology used in the collection (even in different languages)
     • Users appreciate phrasal information for document selection
       • Phrases give higher expectations of relevance than Google’s ranking
       • WTB phrasal information can substantially complement the document ranking provided by search engines
