Project Description 2 Inverted List Database

liza

Presentation Transcript


  1. Project Description 2: Inverted List Database

  2. Create an Inverted File
     • Tokenize a text document and attach to each token the list of locations where it appears.
     • Sort the results and store them in an Oracle database.

  3. Tokenizer
     • Define the admissible symbols for a token; we will not use delimiters to capture tokens.
     • Keep a record of the position of each token.
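A minimal Python sketch of such a tokenizer follows. It assumes that an admissible token is either a run of word characters or a single punctuation symbol (so "Dumb!" yields "Dumb" and "!"), and that positions count tokens from 1, as the example slides do. The names `TOKEN_PATTERN` and `tokenize` are illustrative, not taken from the project.

```python
import re

# Assumption: a token is a run of word characters (letters, digits, "_")
# or a single non-space symbol such as "!" or "."; whitespace is skipped.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[tuple[str, int]]:
    """Return (token, position) pairs with 1-based token positions."""
    return [
        (match.group(), position)
        for position, match in enumerate(TOKEN_PATTERN.finditer(text), start=1)
    ]


if __name__ == "__main__":
    document1 = "He is a dumb teacher Dumb! Dumb! and Dumb!"
    for token, position in tokenize(document1):
        print(token, position)
```

Matching the admissible symbols directly, instead of splitting on delimiters, is what lets punctuation such as "!" become a token in its own right.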

  4. Tokenizer Example
     Document 1: He is a dumb teacher Dumb! Dumb! and Dumb!
     Document 2: He is a great council. His advices are really great. He truly helps.

  5. Tokenizer: Inverted File for Document 1 (continued)
     dumb     4
     Dumb     6
     Dumb     8
     Dumb     11
     He       1
     is       2
     teacher  5

  6. Tokenizer Example: Inverted File for Document 1
     !    12
     !    7
     !    9
     a    3
     and  10
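Slides 6 and 5 together list every (token, position) pair of Document 1 in alphabetical order. One way to produce such a listing is sketched below. It assumes the alphabetical order ignores case (so "dumb" and "Dumb" sit together), that ties are broken by position compared as a number, and an assumed tokenizer rule (a run of word characters or a single symbol). The function names are hypothetical.

```python
import re

# Assumed token rule: a run of word characters or a single non-space symbol.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[tuple[str, int]]:
    """Return (token, position) pairs with 1-based token positions."""
    return [(m.group(), pos) for pos, m in enumerate(TOKEN_PATTERN.finditer(text), start=1)]


def sort_tokens(pairs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Sort alphabetically ignoring case, then by position as an integer."""
    return sorted(pairs, key=lambda pair: (pair[0].casefold(), pair[1]))


if __name__ == "__main__":
    document1 = "He is a dumb teacher Dumb! Dumb! and Dumb!"
    for token, position in sort_tokens(tokenize(document1)):
        print(token, position)
```

Comparing positions as integers keeps the "!" entries in the order 7, 9, 12; comparing them as strings would put 12 first.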

  7. Tokenizer: Inverted File for Document 1
     !        7, 9, 12 (frequency = 3/12)
     a        3
     and      10
     Dumb     4, 6, 8, 11
     He       1
     is       2
     teacher  5
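This slide merges the sorted pairs so that each distinct token appears once with all of its locations, and it reports the frequency of "!" as occurrences over the total number of tokens (3/12). A sketch of that merge step follows, assuming tokens are compared case-insensitively (the slide folds "dumb" into "Dumb"; this sketch stores the lowercase form) and an assumed tokenizer rule. `build_inverted_file` is a hypothetical name.

```python
import re
from collections import defaultdict

# Assumed token rule: a run of word characters or a single non-space symbol.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[tuple[str, int]]:
    """Return (token, position) pairs with 1-based token positions."""
    return [(m.group(), pos) for pos, m in enumerate(TOKEN_PATTERN.finditer(text), start=1)]


def build_inverted_file(text: str) -> tuple[dict[str, list[int]], int]:
    """Map each case-folded token to its positions; also return the token count."""
    pairs = tokenize(text)
    postings = defaultdict(list)
    for token, position in pairs:
        # Positions arrive in reading order, so each list stays ascending.
        postings[token.casefold()].append(position)
    # Sorting the keys gives the alphabetical order of the inverted file.
    return dict(sorted(postings.items())), len(pairs)


if __name__ == "__main__":
    document1 = "He is a dumb teacher Dumb! Dumb! and Dumb!"
    inverted, total = build_inverted_file(document1)
    for token, positions in inverted.items():
        locations = ", ".join(str(p) for p in positions)
        print(f"{token} {locations} (frequency = {len(positions)}/{total})")
```

For Document 1 the first line printed is `! 7, 9, 12 (frequency = 3/12)`.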

  8. Tokenizer: Inverted File for Document 2
     . (period)  6, 12
     a           3
     advices     8
     are         9
     council     5
     great       4, 11
     He          1, 13
     His         7
     is          2
     really      10

  9. Create a Token Database
     Organize an inverted file for the following documents:
     • For simple data
     • For complex data
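With several documents, each token needs its locations per document. The sketch below builds a mapping from token to document name to positions. The documents this slide refers to are not in the transcript, so it uses the two example documents from slide 4, together with an assumed tokenizer rule (a run of word characters or a single symbol) and case folding. `build_index` is a hypothetical name.

```python
import re
from collections import defaultdict

# Assumed token rule: a run of word characters or a single non-space symbol.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[tuple[str, int]]:
    """Return (token, position) pairs with 1-based token positions."""
    return [(m.group(), pos) for pos, m in enumerate(TOKEN_PATTERN.finditer(text), start=1)]


def build_index(documents: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each case-folded token to {document name: ascending positions}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in documents.items():
        for token, position in tokenize(text):
            index[token.casefold()][name].append(position)
    # Sort tokens alphabetically and, within a token, documents by name.
    return {
        token: dict(sorted(per_doc.items()))
        for token, per_doc in sorted(index.items())
    }


if __name__ == "__main__":
    documents = {
        "Document1": "He is a dumb teacher Dumb! Dumb! and Dumb!",
        "Document2": "He is a great council. His advices are really great. He truly helps.",
    }
    for token, per_doc in build_index(documents).items():
        print(token, per_doc)
```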

  10. Token Database
     • Store the tokens in the database.
     • The first column holds the sorted tokens.
     • The second column holds the document names.
     • The rest of the tuple holds the locations of the token.
     • This is the so-called inverted list.
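The row layout described on this slide can be sketched with Python's built-in `sqlite3` module standing in for Oracle, so that the example runs without a database server. The table name `inverted_list`, its column names, and the choice to pack the locations into one comma-separated text column (rather than a variable number of columns) are all assumptions; an Oracle version would use `VARCHAR2` columns and an Oracle driver instead of `sqlite3`.

```python
import re
import sqlite3
from collections import defaultdict

# Assumed token rule: a run of word characters or a single non-space symbol.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[tuple[str, int]]:
    """Return (token, position) pairs with 1-based token positions."""
    return [(m.group(), pos) for pos, m in enumerate(TOKEN_PATTERN.finditer(text), start=1)]


def store_inverted_list(conn: sqlite3.Connection, documents: dict[str, str]) -> None:
    """Write one row per (token, document): token, document name, locations."""
    postings = defaultdict(list)
    for name, text in documents.items():
        for token, position in tokenize(text):
            postings[(token.casefold(), name)].append(position)
    # Hypothetical schema: the locations are packed into one text column.
    conn.execute(
        "CREATE TABLE inverted_list ("
        " token TEXT NOT NULL,"
        " doc_name TEXT NOT NULL,"
        " locations TEXT NOT NULL,"
        " PRIMARY KEY (token, doc_name))"
    )
    rows = [
        (token, name, ",".join(str(p) for p in positions))
        for (token, name), positions in sorted(postings.items())
    ]
    conn.executemany("INSERT INTO inverted_list VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    documents = {
        "Document1": "He is a dumb teacher Dumb! Dumb! and Dumb!",
        "Document2": "He is a great council. His advices are really great. He truly helps.",
    }
    conn = sqlite3.connect(":memory:")
    store_inverted_list(conn, documents)
    query = "SELECT token, doc_name, locations FROM inverted_list ORDER BY token, doc_name"
    for row in conn.execute(query):
        print(*row)
    conn.close()
```

Keeping (token, document) as the primary key means each row is exactly one inverted-list entry, and sorting by the first column reproduces the alphabetical token order.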
