Using the Lucene Search Engine


Presentation Transcript


  1. Using the Lucene Search Engine

  2. Team

  3. Concepts

  4. Lucene
      • Full Text Search
      • Cross Platform
      • Lucene Document
      • Inverted Index

  5. Lucene

  6. iViewXT

  7. Search Improvements

  8. Test Document Collections • UAT

  9. Super Mario

  10. Implementation Derek

  11. Performance

  12. Lucene Implementation

  13. Lucene Implementation: Indexing

  14. Lucene Implementation: Indexing

  15. Lucene Implementation: Indexing

  16. Lucene Indexing

  17. Lucene Indexing Step 1 of 5

  18. Lucene Indexing Step 2 of 5

  19. Lucene Indexing Step 3 of 5

  20. Lucene Indexing Step 4 of 5

  21. Lucene Indexing Step 5 of 5

  22. Lucene Indexing

  23. Text Extraction
      • Lucene is not a complete application.
      • Text extraction for PDF files
      • Text extraction for Microsoft files

  24. Lucene Implementation

  25. Lucene Implementation

  26. Searching:

  27. Searching: Step 1 of 6

  28. Searching: Step 2 of 6

  29. Searching: Step 3 of 6

  30. Searching: Step 4&5 of 6

  31. Searching: Step 6 of 6

  32. Searching:

  33. Luke - Lucene Index Toolbox
      • Client application that links directly into your index.
      • Java Web Start app
      • http://www.getopt.org/luke/
      • Handy for testing searches and performance.

  34. Some problems encountered
      • Max clause count exception:
          • Take care when automatically adding wildcards!
      • Performance:
          • Do the work while indexing, not while searching.
          • Pagination: get one page at a time from the Hits.
      • Our security model:
          • Stored the collection of allowed containers in the UserSession.
      • Visibility of the indexing job:
          • Added logging: "Indexing document 426 of 204,532"
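
The pagination advice above can be sketched without any Lucene dependency. This is a minimal illustration, not the presenters' code: `HitsPager`, `page`, and the `loadHit` callback are hypothetical names, and `loadHit` stands in for whatever call fetches the stored document at a given rank. The point is only that the expensive per-hit work runs for one page, not for every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Sketch: load only one page of results instead of walking every hit. */
final class HitsPager {

    /**
     * Returns the items for one page.
     *
     * @param totalHits  number of hits the search reported
     * @param pageNumber zero-based page index
     * @param pageSize   hits per page, must be positive
     * @param loadHit    hypothetical stand-in for fetching the hit at a given rank
     */
    static <T> List<T> page(int totalHits, int pageNumber, int pageSize, IntFunction<T> loadHit) {
        if (pageSize <= 0 || pageNumber < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and pageNumber >= 0");
        }
        long start = (long) pageNumber * pageSize;        // long avoids int overflow
        long end = Math.min(start + pageSize, totalHits); // clamp to the last hit
        List<T> items = new ArrayList<>();
        for (long rank = start; rank < end; rank++) {
            items.add(loadHit.apply((int) rank));          // only these hits are loaded
        }
        return items;
    }

    public static void main(String[] args) {
        // Pretend the search matched 25 documents; load the third page of 10.
        List<String> thirdPage = page(25, 2, 10, rank -> "doc-" + rank);
        System.out.println(thirdPage); // prints [doc-20, doc-21, doc-22, doc-23, doc-24]
    }
}
```

A page index past the end of the results simply yields an empty list, which keeps the calling code free of special cases.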

  35. Resources (general)
      • http://lucene.apache.org/
      • http://www.ibm.com/developerworks/web/library/wa-lucene2/
      • http://www.ibm.com/developerworks/library/wa-lucene/
      • An open source document management system in PHP with a Java Lucene search engine.
      • Handy Ajax autocomplete component.

  36. Resources (text extraction)
      • http://pdfbox.org: text extractor for PDF files.
      • JXL, http://jexcelapi.sourceforge.net/: text extractor for Excel files.
      • Text extractor for Word documents.
      • API to access Microsoft format files (xls/doc/ppt). I would recommend this one over jxl or text-mining above.

  37. Summary
      • Lucene querying is fast (take care what you do with the results).
      • Indexing is slow (make the indexing job visible).
      • Use Luke.
      • Add lots to the index (do the work while indexing).
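
One way to make a slow indexing job visible is a periodic progress line in the style quoted on the problems slide. The sketch below is an assumption about how such logging might look, not the presenters' implementation: `IndexingProgress`, `message`, and the `logEvery` interval are hypothetical, and the indexing call itself is left as a comment.

```java
import java.util.Locale;

/** Sketch: periodic progress messages for a long-running indexing job. */
final class IndexingProgress {

    /** Formats a progress line with thousands separators. */
    static String message(int current, int total) {
        return String.format(Locale.US, "Indexing document %,d of %,d", current, total);
    }

    public static void main(String[] args) {
        int total = 204_532;
        int logEvery = 50_000; // hypothetical reporting interval
        for (int i = 1; i <= total; i++) {
            // The real job would index document i here.
            if (i % logEvery == 0 || i == total) {
                System.out.println(message(i, total));
            }
        }
    }
}
```

Logging at an interval rather than for every document keeps the log readable and avoids making the job slower than it already is.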

  38. END
