Information Retrieval

Presentation Transcript


  1. Information Retrieval Ch 23.2

  2. Information retrieval
     • Goal: finding documents relevant to a user's information need
     • Example: search engines on the World Wide Web
     • Characteristics of an IR system:
        • Document collection
        • Query language
        • Result set
        • Presentation of the result set

  3. Evaluating IR systems
     • Precision = (relevant docs in result set) / (docs in result set)
     • Recall = (relevant docs in result set) / (relevant docs in the whole collection)
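The two ratios above can be computed directly from sets of document ids. This is a minimal Python sketch. The function name `precision_recall` and the convention of returning 0.0 when a set is empty are assumptions; the slides do not specify them.

```python
def precision_recall(result_set, relevant):
    """Return (precision, recall) for one query.

    result_set: ids of the documents the IR system returned.
    relevant: ids of all documents in the collection relevant to the query.
    """
    retrieved = set(result_set)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant docs in the result set
    # Assumed convention: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 docs returned, 3 of them relevant; 6 relevant docs exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d3", "d7", "d8", "d9"])
print(p, r)  # 0.75 0.5
```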

  4. Presentation of result sets
     • Relevance feedback
        • The user tells the system which documents in the result set are relevant
     • Document classification
        • Documents are assigned to a preexisting taxonomy of topics (Ch 18)
     • Document clustering
        • A tree of categories is created from scratch (Ch 20.3)
        • Agglomerative clustering: repeatedly merge the two nearest documents (or clusters)
        • K-means clustering: partition the documents into k categories
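This is a minimal sketch of the agglomerative idea. It assumes documents are already numeric vectors and uses Euclidean distance with single linkage, where the distance between two clusters is the distance between their closest members. The slide does not name a linkage, so that choice and the function name are assumptions. The merge history is returned as nested tuples, which form the tree of categories.

```python
import math
from itertools import combinations


def agglomerative(docs):
    """Single-linkage agglomerative clustering of document vectors.

    Returns the merge tree as nested tuples of document indices.
    Assumes docs is non-empty.
    """
    # Start with one cluster per document: (subtree, member indices).
    clusters = [(i, [i]) for i in range(len(docs))]

    def linkage(pair):
        # Single linkage: distance between the closest members of two clusters.
        a, b = pair
        return min(math.dist(docs[i], docs[j])
                   for i in clusters[a][1] for j in clusters[b][1])

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)  # a < b
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        del clusters[b]  # delete the higher index first so index a stays valid
        del clusters[a]
        clusters.append(merged)
    return clusters[0][0]
```

For example, `agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)])` returns `(4, ((0, 1), (2, 3)))`. The two close pairs merge first, then the pairs merge with each other, and the outlier joins last.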

  5. K-means clustering
     1. Pick k documents at random to represent the k categories
     2. Assign every document to the category whose representative is closest
     3. Compute the mean of each cluster and use the k means as the new representatives of the k categories
     4. Repeat steps 2 and 3 until convergence (no document changes category)
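The four steps map onto a short loop. This Python sketch assumes documents are numeric vectors of equal length and uses Euclidean distance via `math.dist` (Python 3.8+). The `seed` and `max_iter` parameters are hypothetical choices, not part of the slide. So is the rule of keeping the old representative when its cluster becomes empty.

```python
import math
import random


def kmeans(docs, k, seed=0, max_iter=100):
    """Return (centroids, labels) for k-means over a list of document vectors."""
    rng = random.Random(seed)
    # Step 1: pick k documents at random as the initial category representatives.
    centroids = [list(d) for d in rng.sample(docs, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every document to the closest representative.
        new_labels = [
            min(range(k), key=lambda c: math.dist(d, centroids[c])) for d in docs
        ]
        if new_labels == labels:  # converged: no document changed category
            break
        labels = new_labels
        # Step 3: replace each representative by the mean of its cluster.
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # assumed rule: keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Because step 1 is random, different seeds can produce different clusterings. Fixing the seed makes runs reproducible.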

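  6. Implementing IR systems
     • Lexicon: the words (terms) that occur in the document collection
     • Stop words: very common words (e.g., "the", "of") left out of the index
     • Inverted file: for each word, the list of documents that contain it
     • Vector space model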
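This is a toy inverted file, offered as a sketch. The lexicon is the set of index keys, stop words are dropped before indexing, and each key maps to the ids of the documents that contain it. The stop-word list, the punctuation stripping and the AND-only `search` helper are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical stop-word list; real systems use longer, language-specific lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def tokenize(text):
    """Lower-case, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_inverted_file(docs):
    """Map each lexicon word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Ids of documents containing every non-stop query word (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)
```

With `docs = {1: "The cat sat on the mat.", 2: "A dog and a cat.", 3: "The dog barked."}`, `search(build_inverted_file(docs), "dog cat")` returns `[2]`.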

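  7. Vector space model
     • Transform each document into a vector of term counts
        • Di = ABC, Dj = BBC (each letter is a term)
        • Di = (1, 1, 1), Dj = (0, 2, 1)
     • Measure the (Euclidean) distance between two documents
        • $\mathrm{Dist}(D_i, D_j) = \lVert D_i - D_j \rVert = \sqrt{(1-0)^2 + (1-2)^2 + (1-1)^2} = \sqrt{2} \approx 1.41$
     • Transform the query into a vector the same way and retrieve the documents with the smallest distance to it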
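The slide's example can be checked in code. This sketch assumes each character of a document string is one term, as in Di = ABC, and fixes the vocabulary order as A, B, C. The names `to_vector`, `distance` and `retrieve` are hypothetical.

```python
import math
from collections import Counter


def to_vector(doc, vocabulary):
    """Term-count vector of a document (one character = one term) over a vocabulary."""
    counts = Counter(doc)
    return [counts[term] for term in vocabulary]


def distance(u, v):
    """Euclidean distance between two term vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def retrieve(query, docs, vocabulary):
    """Rank document names by increasing distance to the query vector."""
    q = to_vector(query, vocabulary)
    return sorted(docs, key=lambda name: distance(q, to_vector(docs[name], vocabulary)))


vocabulary = ["A", "B", "C"]
di, dj = to_vector("ABC", vocabulary), to_vector("BBC", vocabulary)
print(di, dj, distance(di, dj))  # [1, 1, 1] [0, 2, 1] 1.4142135623730951
```

For instance, `retrieve("AB", {"Di": "ABC", "Dj": "BBC"}, vocabulary)` returns `["Di", "Dj"]`, because Di lies at distance 1 from the query and Dj at √3.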

More Related