This presentation from the Workshop on Challenges in Information Retrieval and Language Modeling, held in Amherst, Massachusetts, on September 11-12, 2002, covers key issues in cross-language information retrieval (IR). It surveys opportunities on the web, in the research literature, and in intranet applications, along with needs that became pressing after 9/11, such as high-volume intelligence analysis. The main challenges discussed are the user's role in query translation, document selection, and query reformulation, and the relationship between machine translation (MT) and IR. The presentation points toward integrated models that go beyond bag-of-words approaches to improve translingual search.
Workshop on Challenges in Information Retrieval and Language Modeling
Amherst, Massachusetts, September 11-12, 2002
Cross Language IR
Philip Resnik, Salim Roukos
Global Internet User Population
[Figure: internet user population by language, 2000 vs. 2005; labeled segments include English and Chinese. Source: Global Reach]
If cross-language IR is “solved”, where is it???
Opportunities
• World Wide Web
• Research literature
• Intranet applications
• Necessities in a post-9/11 world
• High volume intelligence analysis
• Replacing current Boolean engines (or worse!)
• Dealing with the on-paper legacy
Challenge: Role of the User
• Query formulation for multilingual doc sets
• Key idea: user needed in the query translation loop
• Extracting examples from aligned parallel text
• Document selection
• Key idea: full MT isn’t good enough
• Presenting phrases and entities (not “crummy MT”)
• Query reformulation
• Key idea: user’s understanding of the collection
• Largely unexplored: different objective fn for MT
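One way to picture "extracting examples from aligned parallel text" is the minimal sketch below. It is an illustration only, not the presenters' system: the function name `translation_examples`, the toy English-Spanish sentence pairs, and the candidate translation list are all assumptions made here, and whitespace tokenization stands in for real word alignment. For each candidate translation of a query term, it collects aligned sentence pairs that a user could inspect before choosing a translation.

```python
from collections import defaultdict


def translation_examples(parallel, query_term, candidates, max_examples=2):
    """Collect aligned sentence pairs illustrating each candidate translation.

    parallel   -- list of (source_sentence, target_sentence) pairs (hypothetical data)
    query_term -- source-language term the user wants to translate
    candidates -- candidate target-language translations (assumed to be given)
    """
    examples = defaultdict(list)
    term = query_term.lower()
    for src, tgt in parallel:
        # Naive whitespace tokenization; a real system would use word alignments.
        if term not in src.lower().split():
            continue
        tgt_tokens = tgt.lower().split()
        for cand in candidates:
            if cand.lower() in tgt_tokens and len(examples[cand]) < max_examples:
                examples[cand].append((src, tgt))
    return dict(examples)


# Toy, hand-made sentence pairs used only for illustration.
PARALLEL = [
    ("The bank approved the loan", "El banco aprobó el préstamo"),
    ("We sat on the river bank", "Nos sentamos en la orilla del río"),
    ("The bank raised its rates", "El banco subió sus tasas"),
]

examples = translation_examples(PARALLEL, "bank", ["banco", "orilla"])
for cand, pairs in examples.items():
    print(cand)
    for src, tgt in pairs:
        print("   ", src, "|", tgt)
```

Showing source-language context next to each candidate lets a user who cannot read the target language still pick the intended sense, which is the point of keeping the user in the query translation loop.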
Challenge: Relating MT and IR
• It is typical to think of MT and IR as two different processes
• Weighting developed with monolingual mindset
• Steps toward factoring in translation ambiguity
• Toward integrated models
• Beyond bags of words (or bags of n-grams)
• Translingual search process (> 2 languages)
• Use of context introduced by the search process
• Document-level analysis, use of document context
• Collection-level analysis
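As a hedged illustration of "factoring in translation ambiguity" in term weighting, the sketch below scores a target-language document against a source-language query by summing over every candidate translation, weighted by a translation probability, rather than committing to a single translation first. This is one common formulation (a smoothed query-likelihood model with a translation table), chosen here as an assumption; the slides do not specify a model. The function name `score`, the translation table `TRANS`, the smoothing weight `lam`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def score(query_terms, doc_tokens, collection_tokens, trans_probs, lam=0.5):
    """Log-likelihood of a source-language query given a target-language document.

    For each query term q, sums t(q|f) * P(f|doc) over its candidate
    translations f, with linear interpolation against collection statistics.
    """
    doc_tf = Counter(doc_tokens)
    col_tf = Counter(collection_tokens)
    doc_len = len(doc_tokens)
    col_len = len(collection_tokens)
    log_score = 0.0
    for q in query_terms:
        p = 0.0
        for f, t in trans_probs.get(q, {}).items():
            p_doc = doc_tf[f] / doc_len if doc_len else 0.0
            p_col = col_tf[f] / col_len if col_len else 0.0
            p += t * (lam * p_doc + (1 - lam) * p_col)
        # A query term with no translation evidence anywhere rules the document out.
        log_score += math.log(p) if p > 0 else float("-inf")
    return log_score


# Toy data: hypothetical translation probabilities and two tiny documents.
TRANS = {
    "bank": {"banco": 0.7, "orilla": 0.3},
    "loan": {"préstamo": 1.0},
}
d1 = "el banco aprobó el préstamo".split()
d2 = "nos sentamos en la orilla del río".split()
collection = d1 + d2

s1 = score(["bank", "loan"], d1, collection, TRANS)
s2 = score(["bank", "loan"], d2, collection, TRANS)
print("d1 ranks above d2:", s1 > s2)
```

Because the ambiguity of "bank" is carried into the score instead of being resolved up front, the co-occurring term "loan" ends up favoring the financial-sense document, a small step toward treating MT and IR as one process.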