Workshop on Challenges in Information Retrieval and Language Modeling
Amherst, Massachusetts, September 11-12, 2002
Cross-Language IR
Philip Resnik, Salim Roukos
[Figure: Global Internet User Population, 2000 and 2005, by language (English, Chinese). Source: Global Reach]
If cross-language IR is “solved”, where is it?
Opportunities
• World Wide Web
• Research literature
• Intranet applications
• Necessities in a post-9/11 world
• High volume intelligence analysis
• Replacing current Boolean engines (or worse!)
• Dealing with the on-paper legacy
Challenge: Role of the User
• Query formulation for multilingual document sets
  • Key idea: the user is needed in the query translation loop
  • Extracting examples from aligned parallel text
• Document selection
  • Key idea: full MT isn’t good enough
  • Presenting phrases and entities (not “crummy MT”)
• Query reformulation
  • Key idea: the user’s understanding of the collection
  • Largely unexplored: a different objective function for MT
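To make the “user in the query translation loop” idea concrete, here is a minimal sketch, assuming a small sentence-aligned English-French corpus: for each candidate translation of a query term, it pulls aligned sentence pairs in which the term and that candidate co-occur, so a user can judge which sense is meant. The function names (`tokens`, `examples_by_candidate`) and the toy corpus are hypothetical, and sentence-level co-occurrence is only a crude stand-in for real word alignment.

```python
import re
from typing import Iterable


def tokens(text: str) -> set[str]:
    """Crude lowercase word tokenizer (placeholder for a real one)."""
    return set(re.findall(r"\w+", text.lower()))


def examples_by_candidate(
    pairs: Iterable[tuple[str, str]],
    query_term: str,
    candidates: list[str],
    limit: int = 3,
) -> dict[str, list[tuple[str, str]]]:
    """For each candidate translation of query_term, collect up to `limit`
    aligned sentence pairs whose source side contains the query term and
    whose target side contains that candidate."""
    q = query_term.lower()
    result: dict[str, list[tuple[str, str]]] = {c: [] for c in candidates}
    for src, tgt in pairs:
        if q not in tokens(src):
            continue  # query term absent from the source sentence
        tgt_tokens = tokens(tgt)
        for c in candidates:
            if c.lower() in tgt_tokens and len(result[c]) < limit:
                result[c].append((src, tgt))
    return result


# Toy English-French sentence pairs, made up for illustration only.
PAIRS = [
    ("The bank raised interest rates.", "La banque a relevé ses taux d'intérêt."),
    ("They walked along the river bank.", "Ils se sont promenés le long de la rive."),
    ("The bank opened a new branch.", "La banque a ouvert une nouvelle agence."),
]

if __name__ == "__main__":
    # Show the user one list of example pairs per candidate translation.
    for cand, examples in examples_by_candidate(PAIRS, "bank", ["banque", "rive"]).items():
        print(cand)
        for src, tgt in examples:
            print(f"  {src}  |  {tgt}")
```

Showing short aligned examples rather than a full MT of each candidate keeps the choice in the user's hands, which is the point of putting the user in the loop.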
Challenge: Relating MT and IR
• MT and IR are typically treated as two separate processes
  • Term weighting was developed with a monolingual mindset
  • Steps toward factoring in translation ambiguity
• Toward integrated models
  • Beyond bags of words (or bags of n-grams)
  • Translingual search process (more than 2 languages)
• Use of context introduced by the search process
  • Document-level analysis, use of document context
  • Collection-level analysis
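One way to factor translation ambiguity into term weighting, rather than committing to a single MT output, is to map document-language counts back into query-term space using translation probabilities. This is in the spirit of what the CLIR literature calls probabilistic structured queries; it is not necessarily what the slide's authors had in mind. The minimal sketch below assumes a hypothetical translation table `TRANS` holding p(document term | query term) and computes an expected term frequency tf'(q, d) = Σ_t p(t|q)·tf(t, d) and an expected document frequency df'(q) = Σ_t p(t|q)·df(t). It plugs them into a plain tf·idf score with a smoothed idf of log(1 + N/df'), a swapped-in choice; BM25 would be more typical in practice.

```python
import math
from collections import Counter

# Hypothetical translation table: p(document-language term | query term).
# In practice such probabilities might come from a statistical translation lexicon.
TRANS = {
    "bank": {"banque": 0.7, "rive": 0.3},
    "rate": {"taux": 1.0},
}


def expected_tf(q: str, doc: Counter, trans: dict) -> float:
    """Translation-weighted term frequency of query term q in one document."""
    return sum(p * doc[t] for t, p in trans.get(q, {}).items())


def expected_df(q: str, docs: list[Counter], trans: dict) -> float:
    """Translation-weighted document frequency of query term q."""
    return sum(
        p * sum(1 for d in docs if d[t] > 0) for t, p in trans.get(q, {}).items()
    )


def score(query: list[str], doc: Counter, docs: list[Counter], trans: dict) -> float:
    """tf-idf score built from the expected counts, with a smoothed idf."""
    n = len(docs)
    total = 0.0
    for q in query:
        tf = expected_tf(q, doc, trans)
        df = expected_df(q, docs, trans)
        if tf > 0 and df > 0:
            total += tf * math.log(1 + n / df)
    return total


# Toy French "documents" as bags of words (made up for illustration).
DOCS = [
    Counter("la banque a relevé ses taux".split()),
    Counter("la rive du fleuve".split()),
    Counter("le marché aux poissons".split()),
]

if __name__ == "__main__":
    for i, d in enumerate(DOCS):
        print(i, round(score(["bank", "rate"], d, DOCS, TRANS), 3))
```

With these toy numbers, the document containing “banque” and “taux” outranks the one containing only “rive”, while the ambiguous reading still contributes a smaller share of weight instead of being discarded by a hard translation choice.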