1. Advanced Indexing Techniques with Apache Lucene - Payloads
Michael Busch
(buschmi@apache.org) 
2. Agenda
Part 1: Inverted Index 101
Posting Lists
Stored Fields vs. Payloads
Part 2: Use cases for Payloads
BoostingTermQuery
Simple facet counting 
9. So far…
String comparison slow
Inverted index used to accelerate search
Store positions in posting lists to allow phrase searches
Store payloads in posting lists to store arbitrary data with each position 
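The points above can be illustrated with a minimal in-memory sketch of an inverted index whose posting lists carry a position and a payload for every term occurrence. This is a model for illustration only: `Posting`, `TinyInvertedIndex`, and their methods are hypothetical names, not Lucene classes, and the whitespace tokenizer is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only; these are not Lucene classes.
final class Posting {
    final int docId;
    final int position;   // token position inside the document
    final byte[] payload; // arbitrary bytes stored with this position, may be empty

    Posting(int docId, int position, byte[] payload) {
        this.docId = docId;
        this.position = position;
        this.payload = payload;
    }
}

final class TinyInvertedIndex {
    // term -> posting list, in the order occurrences were indexed
    private final Map<String, List<Posting>> postings = new TreeMap<>();

    // Indexes whitespace-separated tokens; payloads[i] belongs to token i.
    void addDocument(int docId, String text, byte[][] payloads) {
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            byte[] payload = (payloads != null && pos < payloads.length)
                    ? payloads[pos] : new byte[0];
            postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>())
                    .add(new Posting(docId, pos, payload));
        }
    }

    // Term lookup is a map access instead of a string scan over every document.
    List<Posting> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.<Posting>emptyList());
    }

    // Two-term phrase search: the second term must sit at position + 1 in the same document.
    List<Integer> phraseMatch(String first, String second) {
        List<Integer> docs = new ArrayList<>();
        for (Posting a : lookup(first)) {
            for (Posting b : lookup(second)) {
                if (a.docId == b.docId && b.position == a.position + 1
                        && !docs.contains(a.docId)) {
                    docs.add(a.docId);
                }
            }
        }
        return docs;
    }
}
```

The nested loop in `phraseMatch` keeps the sketch short; a real index would merge the two sorted posting lists instead.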
14. Agenda
Part 1: Inverted Index 101
Posting Lists
Stored Fields vs. Payloads
Part 2: Use cases for Payloads
BoostingTermQuery
Simple facet counting 
15. org.apache.lucene.analysis.Token
16. Analyzer:
17. Similarity:
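The BoostingTermQuery use case pairs an analyzer that attaches a payload to each token with a similarity that turns the payload into a score factor. The sketch below models only the scoring side, using the standard library. `PayloadBoost`, the one-byte payload encoding, and the choice to average the factors over all occurrences in a document are assumptions made for illustration; this is not Lucene's API.

```java
import java.util.List;

// Illustrative model only; names are hypothetical and not Lucene's API.
final class PayloadBoost {
    // Decodes a one-byte payload into a boost factor; a missing or empty payload means "no boost".
    static float scorePayload(byte[] payload) {
        if (payload == null || payload.length == 0) {
            return 1.0f;
        }
        return payload[0]; // assumed encoding, e.g. 1 = normal text, 5 = emphasized text
    }

    // Multiplies the base term score by the average payload factor
    // over all occurrences of the term in one document.
    static float boostedScore(float baseScore, List<byte[]> payloadsInDoc) {
        if (payloadsInDoc.isEmpty()) {
            return baseScore;
        }
        float sum = 0f;
        for (byte[] p : payloadsInDoc) {
            sum += scorePayload(p);
        }
        return baseScore * (sum / payloadsInDoc.size());
    }
}
```

With a base score of 2.0 and two occurrences carrying payloads 5 and 1, the average factor is 3 and the boosted score is 6.0.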
19. Analyzer:
20. HitCollector: Use different PriorityQueues for different sites
Instead of returning top-n results of the whole data set, return top-n results per site
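The per-site collection described above might be sketched as follows: one bounded min-heap per site, so each site keeps its own top-n hits. `Hit` and `PerSiteTopN` are hypothetical names, and the site is passed in as a plain string here, whereas in the talk's setting it would be read from the payload. This is a standard-library model, not Lucene's HitCollector API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative model only; not Lucene's HitCollector API.
final class Hit {
    final int docId;
    final float score;

    Hit(int docId, float score) {
        this.docId = docId;
        this.score = score;
    }
}

final class PerSiteTopN {
    private static final Comparator<Hit> BY_SCORE_ASC =
            Comparator.comparingDouble((Hit h) -> h.score);

    private final int n;
    // One min-heap per site: the weakest of the current top n sits at the head.
    private final Map<String, PriorityQueue<Hit>> queues = new HashMap<>();

    PerSiteTopN(int n) {
        this.n = n;
    }

    // Called once per matching document.
    void collect(int docId, float score, String site) {
        PriorityQueue<Hit> pq =
                queues.computeIfAbsent(site, s -> new PriorityQueue<Hit>(BY_SCORE_ASC));
        pq.add(new Hit(docId, score));
        if (pq.size() > n) {
            pq.poll(); // drop the lowest-scoring hit for this site
        }
    }

    // Returns the doc ids kept for one site, best score first.
    List<Integer> topDocs(String site) {
        PriorityQueue<Hit> pq = queues.get(site);
        List<Hit> hits = (pq == null) ? new ArrayList<Hit>() : new ArrayList<Hit>(pq);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> ids = new ArrayList<>();
        for (Hit h : hits) {
            ids.add(h.docId);
        }
        return ids;
    }
}
```

Each queue holds at most n hits, so memory grows with the number of distinct sites rather than with the number of matches.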
 
21. Summary
In this example: facet (site) used for scoring, but extendable for facet counting
Good performance due to locality of facet values
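Extending the example to facet counting could look like the sketch below, which tallies hits per facet value as they are collected. `FacetCounter` is a hypothetical name, and the assumption that the facet value is stored in the payload as UTF-8 text is made only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only; the payload encoding (UTF-8 text) is an assumption.
final class FacetCounter {
    private final Map<String, Integer> counts = new TreeMap<>();

    // Called once per hit with the payload that carries the facet value.
    void collect(byte[] payload) {
        String facetValue = new String(payload, StandardCharsets.UTF_8);
        counts.merge(facetValue, 1, Integer::sum);
    }

    // Facet value -> number of hits, sorted by facet value.
    Map<String, Integer> counts() {
        return counts;
    }
}
```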
 
22. Payloads offer great flexibility
Payloads are stored very space-efficiently
Sophisticated data structures enable efficient skipping over payloads
Payloads should be used whenever special data is required for finding hits and scoring
 
23. Finalize API (currently Beta)
Add more out-of-the-box query types
Per-document Payloads
 
24. Questions?