CADIAL search engine at INEX
Jure Mijić (1), Marie-Francine Moens (2), Bojana Dalbelo Bašić (1)
(1) Faculty of Electrical Engineering and Computing, jure.mijic@fer.hr, bojana.dalbelo@fer.hr
(2) Department of Computer Science, Katholieke Universiteit Leuven, sien.moens@cs.kuleuven.be
 
                
ITI 2008, Cavtat, 2008-06-25
INEX 2008, Schloss Dagstuhl Conference Center, Wadern, Germany, 2008-12-16

Presentation overview
• What is the CADIAL project?
• System overview
• Ranking model
• Ad hoc results
• Conclusion
• Future work

What is the CADIAL project?
• Bilateral project between the Government of Flanders and the Ministry of Science, Education and Sports of the Republic of Croatia
• Aims of the CADIAL project:
  • Provide access to a collection of Croatian legislative documents
  • Enable the use of the Eurovoc thesaurus, an EU standard thesaurus for document indexing and retrieval

System overview
• Built with expandability in mind
  • Supports multiple information retrieval models
  • Supports morphological normalization modules
• An indexer tool is used for document indexing
  • Input documents are in XML format
  • Output is an index database (a base structure for every search engine model)
• The index database is then extended with additional data required by the model (various statistical information)

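The indexing step described above can be sketched as follows. This is a minimal illustration, not the CADIAL indexer: the function names (`tokenize`, `index_document`, `collection_statistics`), the path notation, and the in-memory dictionary standing in for the index database are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase word characters only.
    return re.findall(r"\w+", text.lower())

def index_document(doc_id, xml_text):
    """Return {(doc_id, element_path): Counter of terms} for one XML document."""
    root = ET.fromstring(xml_text)
    index = {}

    def visit(elem, path):
        # An element's text includes the text of all its descendants.
        # Joining with spaces keeps words from adjacent elements apart.
        terms = Counter(tokenize(" ".join(elem.itertext())))
        index[(doc_id, path)] = terms
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            visit(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    visit(root, f"/{root.tag}[1]")
    return index

def collection_statistics(index):
    """Extra statistics added on top of the base index:
    collection-wide term frequencies, counted once per document root."""
    totals = Counter()
    for (_doc_id, path), terms in index.items():
        if path.count("/") == 1:  # root element = whole document
            totals.update(terms)
    return totals

doc = "<article><sec><p>Croatian law</p><p>law text</p></sec></article>"
index = index_document("d1", doc)
stats = collection_statistics(index)
```

Keeping the base index separate from the derived statistics mirrors the slide's split between the index database and the model-specific data added afterwards.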
Ranking model
• Language model
  • Element priors based on element location and depth
  • Smoothing on document and collection level
• Additional features
  • Support for CAS queries
  • Support for +/- keyword operators
  • Simple overlapping element removal
  • Stemming

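A ranking model of this kind can be sketched as below. The slides do not give the exact formula, so everything specific here is an assumption: the linear interpolation of element, document and collection estimates, the weights `lam_elem` and `lam_doc`, the shape of `element_prior`, and the greedy overlap removal are illustrative choices, not the CADIAL implementation.

```python
import math
from collections import Counter

def element_prior(depth, position):
    # Hypothetical prior: favor shallow elements that occur early in the document.
    return 1.0 / ((1 + depth) * (1 + math.log(1 + position)))

def score_element(query_terms, elem_tf, doc_tf, coll_tf,
                  depth, position, lam_elem=0.5, lam_doc=0.3):
    """Log query likelihood, smoothing the element model with the
    document model and the collection model (assumed interpolation)."""
    lam_coll = 1.0 - lam_elem - lam_doc
    elem_len = sum(elem_tf.values()) or 1
    doc_len = sum(doc_tf.values()) or 1
    coll_len = sum(coll_tf.values()) or 1
    score = math.log(element_prior(depth, position))
    for t in query_terms:
        p = (lam_elem * elem_tf[t] / elem_len
             + lam_doc * doc_tf[t] / doc_len
             + lam_coll * coll_tf[t] / coll_len)
        if p == 0.0:  # term unseen in the whole collection
            return float("-inf")
        score += math.log(p)
    return score

def is_related(a, b):
    # True when one element path is the other, or its ancestor or descendant.
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")

def remove_overlaps(ranked):
    """Simple overlap removal: walk the best-first list and drop any element
    that overlaps an element already kept."""
    kept = []
    for path, score in ranked:
        if not any(is_related(path, k) for k, _ in kept):
            kept.append((path, score))
    return kept

coll = Counter({"law": 50, "tax": 5, "the": 1000})
doc = Counter({"law": 4, "tax": 1, "the": 20})
sec_a = Counter({"tax": 1, "the": 3})
sec_b = Counter({"law": 2, "the": 3})
s_a = score_element(["tax"], sec_a, doc, coll, depth=1, position=0)
s_b = score_element(["tax"], sec_b, doc, coll, depth=1, position=0)
results = remove_overlaps([("/a[1]/s[1]", -1.0), ("/a[1]", -2.0), ("/a[1]/s[2]", -3.0)])
```

In this sketch, raising `lam_doc` shifts weight from the element itself to its containing document, which is one way to read "higher smoothing at the document level".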
Ad hoc results
• Our runs:
  • Three CO runs
    • One returning only documents
    • Two returning elements
  • Three CAS runs with various smoothing factors

Conclusion
• Retrieving whole documents performed better than element retrieval at higher levels of recall
• CAS queries performed slightly better than CO queries
• Higher smoothing at the document level contributed to better performance

Future work
• Other smoothing techniques
• Pseudo relevance feedback
• Incorporating link evidence
• Information extraction methods

The End
Thank you

Language model