
INAOE at GeoCLEF 2008: A Ranking Approach based on Sample Documents


Presentation Transcript


  1. INAOE at GeoCLEF 2008: A Ranking Approach based on Sample Documents
     Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda
     Language Technologies Laboratory, National Institute of Astrophysics, Optics and Electronics, Tonantzintla, México
     mmontesg@inaoep.mx | http://ccc.inaoep.mx/~mmontesg

  2. General ideas
     Our system focuses on the ranking process. It is based on the following hypotheses:
     • Current IR machines are able to retrieve relevant documents for geographic queries.
     • Complete documents provide more and better elements for the ranking than isolated query terms.
     We aimed to show that, by using some query-related sample texts, it is possible to improve the final ranking of some retrieved documents.

  3. General architecture of our system
     [Architecture diagram. First stage (retrieval stage): the query is sent to the IR machine, which retrieves documents from the document collection; a feedback process selects sample texts from a small set of retrieved documents and drives the query expansion. Second stage (ranking stage): the re-ranking process takes the larger set of retrieved documents and the selected sample texts and produces the re-ranked documents.]
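
     The following is a minimal sketch of this two-stage flow. The component functions (retrieve, select_samples, expand, rerank) are assumed to be supplied by the caller; all of these names are placeholders rather than the actual INAOE implementation, whose concrete modules are described in the next slides.

        # Minimal sketch of the two-stage architecture shown in the diagram.
        # The component functions are assumptions; concrete versions are
        # sketched under slides 4, 5 and 6 below.
        def two_stage_search(query, collection, retrieve, select_samples, expand, rerank):
            # First stage (retrieval stage): answer the original query.
            first_list = retrieve(query, collection)
            # Feedback process: a few top documents become the query-related sample texts.
            samples = select_samples(first_list)
            # Query expansion: terms taken from the sample texts enrich the original query.
            larger_list = retrieve(expand(query, samples), collection)
            # Second stage (ranking stage): re-rank the larger list against the sample texts.
            return rerank(larger_list, samples)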

  4. Re-ranking process
     [Diagram of the re-ranking process: the sample texts go through a geo-expansion process backed by the GeoNames DB; a similarity calculation compares the retrieved documents against each expanded sample text, yielding a different ranking proposal per sample text; an information-fusion step merges these proposals into the re-ranked list of documents.]
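
     A sketch of this process is given below, under the simplifying assumption that the GeoNames DB can be replaced by a tiny place-to-ancestors map for illustration; rank_by_sample and round_robin_fusion are sketched under slide 6.

        # Sketch of the re-ranking process. ANCESTORS is a toy stand-in for the
        # GeoNames DB; rank_by_sample and round_robin_fusion appear under slide 6.
        ANCESTORS = {"Paris": ["France", "Europe"], "Tonantzintla": ["Puebla", "Mexico"]}

        def geo_expand(text):
            # Append the two nearest ancestors of every known place name found in the text.
            extra = [a for place, ancestors in ANCESTORS.items() if place in text for a in ancestors]
            return text + " " + " ".join(extra)

        def rerank(retrieved_docs, sample_texts):
            # One ranking proposal per geo-expanded sample text, then Round-Robin fusion.
            proposals = [rank_by_sample(retrieved_docs, geo_expand(s)) for s in sample_texts]
            return round_robin_fusion(proposals)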

  5. System configuration: traditional modules
     • IR machine: based on LEMUR; retrieves 1000 documents (original/expanded queries).
     • Feedback module: based on blind relevance feedback; selects the top 5 retrieved documents as sample texts.
     • Query expansion: adds to the original query the five most frequent terms from the sample texts (see the sketch below).
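
     A minimal sketch of the feedback and query-expansion modules described above; the whitespace tokenization and the tiny stopword list are simplifying assumptions, not part of the original system.

        # Sketch of the feedback and query-expansion modules: the top-5 documents
        # become sample texts, and their five most frequent terms extend the query.
        from collections import Counter

        STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "for"}  # toy stopword list

        def select_samples(ranked_docs, k=5):
            # Blind relevance feedback: keep the k top-ranked documents as sample texts.
            return ranked_docs[:k]

        def expand(query_terms, sample_texts, n_terms=5):
            # Add the n most frequent content terms of the sample texts to the query.
            counts = Counter(token
                             for doc in sample_texts
                             for token in doc.lower().split()
                             if token not in STOPWORDS and token not in query_terms)
            return list(query_terms) + [term for term, _ in counts.most_common(n_terms)]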

  6. System configuration: re-ranking module
     • Geo-expansion: geo-terms are identified with the LingPipe NER; the geo-terms of the sample texts are expanded by adding their two nearest ancestors (Paris → France, Europe).
     • Similarity calculation: considers thematic and geographic similarities; it is based on the cosine formula.
     • Information fusion: merges all the different ranking proposals into one single list using the Round-Robin technique (sketched below).
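
     The similarity and fusion steps can be sketched as follows. Raw term counts stand in for the actual term weighting, and the thematic and geographic parts are handled together in a single bag of words, which is a simplification of the method described on the slide.

        # Sketch of the similarity calculation (cosine formula over bag-of-words
        # vectors) and of the Round-Robin fusion of the ranking proposals.
        import math
        from collections import Counter

        def cosine(a, b):
            # Cosine similarity between two term-frequency Counters.
            dot = sum(a[t] * b[t] for t in a if t in b)
            norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
            return dot / norm if norm else 0.0

        def rank_by_sample(retrieved_docs, sample_text):
            # Order the retrieved documents by similarity to one (geo-expanded) sample text.
            sample_vec = Counter(sample_text.lower().split())
            return sorted(retrieved_docs,
                          key=lambda d: cosine(Counter(d.lower().split()), sample_vec),
                          reverse=True)

        def round_robin_fusion(rankings):
            # Merge the ranking proposals by taking one document from each list in turn.
            merged, seen = [], set()
            for position in range(max(len(r) for r in rankings)):
                for ranking in rankings:
                    if position < len(ranking) and ranking[position] not in seen:
                        merged.append(ranking[position])
                        seen.add(ranking[position])
            return merged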

  7. Evaluation points
     [Architecture diagram from slide 3 annotated with three evaluation points (1st, 2nd and 3rd EP): after the first-stage retrieval, after the retrieval with the expanded query, and after the second-stage re-ranking.]

  8. Experimental results: submitted runs
     [Results table; only the relative improvement figures survived the transcript: +4.87%, +3.33%, +0%, +3.24%.]

  9. Experimental results: additional runs
     [Results table; only the relative improvement figures survived the transcript: +26.4%, +15.8%, +28.3%, +3.24%.]
     • Sample texts were manually selected (from Inaoe-BASELINE1).
     • On average, two documents were selected for each topic.

  10. Final remarks
     • Results showed that the query-related sample texts make it possible to improve the original ranking of the retrieved documents.
     • Our experiments also showed that the proposed method is very sensitive to the presence of incorrect sample texts.
     • Since our geo-expansion process is still very simple, we believe it is hurting the performance of the method.
     Ongoing work:
     • A new sample-text selection method.
     • A new strategy for geographic expansion that incorporates more precise disambiguation.

  11. Thank you!
     Manuel Montes y Gómez
     Language Technologies Laboratory, National Institute of Astrophysics, Optics and Electronics, Tonantzintla, México
     mmontesg@inaoep.mx | http://ccc.inaoep.mx/~mmontesg
