Information Retrieval Homework #1
Information Retrieval Homework #1 • Members: Wesley, Lbr, Shuang • CSIE, NCU
Outline • Introduction • Stemming Algorithm • Suffix Tree • Performance • Conclusion
Stemming Algorithm (optional) • Goal of stemming • Improve performance and use fewer resources by reducing the number of unique words • Ex. “computable”, “computation”, “computability” all reduce to the stem “comput” • Porter algorithm (the most commonly used)
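To make the goal of stemming concrete, here is a minimal sketch of suffix stripping. It is not the Porter algorithm: the suffix list and the `toy_stem` function are hypothetical and were chosen only to cover the three example words on this slide.

```python
# Toy suffix stripper. NOT the Porter algorithm, only an illustration
# of how stemming reduces the number of unique words.
SUFFIXES = ("ability", "ation", "able")  # hypothetical list for the slide's example


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


words = ["computable", "computation", "computability"]
stems = {toy_stem(w) for w in words}
print(len(words), "words ->", len(stems), "unique stem(s):", sorted(stems))
```

A real stemmer applies many more rules in several ordered steps; the point here is only that three index entries collapse into one.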
Suffix Tree Library • libsfxdisk-1.2 is a fast indexing library based on suffix trees • Supports storing, retrieving, deleting, and dumping/loading the database
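The four operations listed above can be illustrated with a naive in-memory suffix index. This sketch does not use or imitate the libsfxdisk API: the class `ToySuffixIndex` and all of its method names are assumptions, and it scans a plain dictionary of suffixes instead of building a real suffix tree.

```python
import json


class ToySuffixIndex:
    """Naive suffix index (hypothetical): maps each suffix to the keys that have it."""

    def __init__(self):
        self.suffixes = {}  # suffix -> set of stored keys

    def store(self, key):
        for i in range(len(key)):
            self.suffixes.setdefault(key[i:], set()).add(key)

    def retrieve(self, pattern):
        # A pattern occurs in a key iff it is a prefix of one of the key's suffixes.
        hits = set()
        for suffix, keys in self.suffixes.items():
            if suffix.startswith(pattern):
                hits |= keys
        return sorted(hits)

    def delete(self, key):
        for i in range(len(key)):
            suffix = key[i:]
            keys = self.suffixes.get(suffix)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self.suffixes[suffix]

    def dump(self, path):
        # Sets are not JSON-serializable, so write them as sorted lists.
        with open(path, "w", encoding="utf-8") as f:
            json.dump({s: sorted(k) for s, k in self.suffixes.items()}, f)

    @classmethod
    def load(cls, path):
        index = cls()
        with open(path, encoding="utf-8") as f:
            index.suffixes = {s: set(k) for s, k in json.load(f).items()}
        return index
```

Because every suffix is stored, `retrieve` finds substring matches, not only whole words. A real suffix tree answers the same query without scanning every suffix, which is what makes searching fast.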
Indexing • [Diagram] Dir Name, DirReader, File Name, FileReader, Filter (StopWords: Delete; Stem: Optional), SuffixTree, Index File
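The indexing stages named on this slide could be wired together roughly as follows. This is a sketch under stated assumptions: the function names only mirror the slide's labels, the stop-word list is a tiny hypothetical sample, and a plain dictionary stands in for the suffix tree.

```python
import os
import re

STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}  # hypothetical sample list


def dir_reader(dir_name):
    """Yield paths of regular files directly inside dir_name, in sorted order."""
    for name in sorted(os.listdir(dir_name)):
        path = os.path.join(dir_name, name)
        if os.path.isfile(path):
            yield path


def file_reader(file_name):
    """Yield lowercase alphabetic tokens from one file."""
    with open(file_name, encoding="utf-8", errors="ignore") as f:
        for line in f:
            yield from re.findall(r"[a-z]+", line.lower())


def word_filter(tokens, stem=None):
    """Delete stop words, then apply the optional stemming function."""
    for token in tokens:
        if token in STOP_WORDS:
            continue
        yield stem(token) if stem else token


def build_index(dir_name, stem=None):
    """Map each surviving term to the set of file names that contain it."""
    index = {}  # stand-in for the suffix tree
    for path in dir_reader(dir_name):
        for term in word_filter(file_reader(path), stem):
            index.setdefault(term, set()).add(os.path.basename(path))
    return index
```

Calling `build_index` on a directory of text files returns the term-to-files mapping; passing a function as `stem` enables the optional stemming step.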
Searching • [Diagram] Key Word, SearchEngine, Index, Print Out Results
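A minimal sketch of the search path on this slide: a key word goes to the search engine, which looks it up in the index and prints the results. The `search` and `print_results` functions and the sample index contents are hypothetical.

```python
def search(index, key_word):
    """Return the sorted file names recorded for key_word (case-insensitive)."""
    return sorted(index.get(key_word.strip().lower(), set()))


def print_results(key_word, results):
    """Print one line per matching file, or a message when nothing matches."""
    if not results:
        print(f"{key_word}: no matches")
        return
    for file_name in results:
        print(f"{key_word}: {file_name}")


# Hypothetical index: term -> set of file names containing it.
sample_index = {"suffix": {"a.txt", "b.txt"}, "tree": {"b.txt"}}
print_results("suffix", search(sample_index, "suffix"))
print_results("porter", search(sample_index, "porter"))
```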
Performance • Total indexing time • Slower than searching • One file takes about one minute • Average searching time • Very quick • http://140.115.156.49/~wesley/IR.html
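One way such averages could be measured is sketched below. The helper `average_seconds` and the sample index are assumptions for illustration; this is not the measurement procedure used for the figures on this slide.

```python
import time


def average_seconds(func, inputs):
    """Average wall-clock seconds per call of func over the given inputs."""
    inputs = list(inputs)
    if not inputs:
        return 0.0
    start = time.perf_counter()
    for item in inputs:
        func(item)
    return (time.perf_counter() - start) / len(inputs)


# Hypothetical usage: time dictionary lookups standing in for index searches.
sample_index = {"suffix": {"a.txt"}, "tree": {"b.txt"}}
queries = ["suffix", "tree", "porter"]
avg = average_seconds(lambda q: sample_index.get(q, set()), queries)
print(f"average search time: {avg:.9f} s over {len(queries)} queries")
```

The same helper works for indexing by passing a per-file indexing function and a list of file names.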
Future • Add a stemming scheme • Reduce indexing time • Additional search operators • AND, OR
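The planned AND and OR searches map naturally onto set operations once each term has a set of matching files. The sketch below assumes an index shaped as a dictionary from term to file names; `search_and`, `search_or`, and the sample data are hypothetical.

```python
def search_and(index, terms):
    """Files that contain every term (set intersection)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


def search_or(index, terms):
    """Files that contain at least one term (set union)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.union(*postings)) if postings else []


# Hypothetical index: term -> set of file names containing it.
sample_index = {
    "suffix": {"a.txt", "b.txt"},
    "tree": {"b.txt", "c.txt"},
}
print("suffix AND tree:", search_and(sample_index, ["suffix", "tree"]))
print("suffix OR tree:", search_or(sample_index, ["suffix", "tree"]))
```

A term missing from the index contributes an empty set, so it empties an AND result and leaves an OR result unchanged.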