Relevance Feedback • Limitations • Must yield results within at most 3-4 iterations • Users will likely terminate the process sooner • Users may get irritated at seeing the same documents repeated after every iteration • It has been shown to increase the effectiveness of retrieval
Designing a Relevance Feedback System • Use positive or negative relevance judgments • Where to apply relevance judgments (query, profile, document, retrieval algorithm) • Term weight modification, e.g., • Increase the weight of terms that appear in relevant docs • Add new terms found in relevant docs that are frequently mentioned in connection with the query terms
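The term weight modification on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular system: the function name `reweight_query`, the parameters `boost`, `new_term_weight`, and `min_docs`, and the rule of treating "appears in at least `min_docs` relevant docs" as a stand-in for "frequently mentioned in connection with the query terms" are all hypothetical.

```python
from collections import Counter


def reweight_query(query, relevant_docs, boost=0.5, new_term_weight=0.25, min_docs=2):
    """Return a new term -> weight dict after positive relevance feedback.

    query: dict mapping term -> weight
    relevant_docs: list of token lists the user judged relevant
    All default parameter values are illustrative assumptions.
    """
    # Count in how many relevant documents each term appears.
    doc_freq = Counter()
    for doc in relevant_docs:
        doc_freq.update(set(doc))

    updated = dict(query)
    for term, count in doc_freq.items():
        if term in updated:
            # Increase the weight of query terms that appear in relevant docs.
            updated[term] += boost * count / len(relevant_docs)
        elif count >= min_docs:
            # Add new terms that recur across the relevant docs.
            updated[term] = new_term_weight
    return updated


query = {"genetic": 1.0, "algorithm": 1.0}
relevant = [
    ["genetic", "algorithm", "crossover", "mutation"],
    ["genetic", "crossover", "population"],
]
new_query = reweight_query(query, relevant)
```

Here "crossover" enters the query because it occurs in both relevant documents, while "mutation" and "population" occur in only one each and are left out.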
Genetic Algorithms • Several possible solutions are generated in parallel • The best few of these solutions are chosen and replicated, while the poor ones are eliminated • The replicated solutions create a breeding population, from which new solutions arise • Breeding is accomplished by exchanging some of the characteristics of the chosen solutions in a crossover operation
Genetic Algorithms (cont.) • The pitfall of hill climbing (getting stuck on a low hill, i.e., a local optimum) is avoided by • Pursuing multiple solutions in parallel and discarding the low hills • Introducing new characteristic values at a low rate through a mutation process (random exchange) • Relevance Feedback • Relieves the user of the burden of assigning term weights • Begins with no weights; generates query variants by assigning term weights randomly
Genetic Algorithms (cont.) • Query variants are vectors of query term weights • Each query variant is used to search the documents in the database • Each variant is evaluated with the equation on pg. 226 • The variants with the highest values produce the most replications • The resulting breeding population is grown to the same size as the original population
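The loop described on these slides (random initial weights, evaluation, replication of the best variants, crossover, mutation) might look roughly like the Python sketch below. The evaluation equation cited on the slide is not reproduced here, so `fitness` is a hypothetical stand-in (mean score of relevant documents minus mean score of the others). The function names, the population size, the rank-based replication rule, and carrying the best variant over unchanged are likewise assumptions made for the example.

```python
import random


def fitness(weights, docs, relevant_ids):
    """Hypothetical fitness: mean score of relevant docs minus mean score of the rest.

    Stands in for the evaluation equation the slides cite, which is not shown here.
    Assumes at least one relevant and one non-relevant document.
    """
    def score(doc):
        return sum(w * doc.get(t, 0) for t, w in weights.items())

    rel = [score(d) for i, d in enumerate(docs) if i in relevant_ids]
    non = [score(d) for i, d in enumerate(docs) if i not in relevant_ids]
    return sum(rel) / len(rel) - sum(non) / len(non)


def evolve(terms, docs, relevant_ids, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Begin with no weights: every query variant gets random term weights.
    population = [{t: rng.random() for t in terms} for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda w: fitness(w, docs, relevant_ids),
                        reverse=True)
        # Keep the best few variants; the poor ones are eliminated.
        parents = ranked[: max(2, pop_size // 4)]
        # Better-ranked parents are replicated more often in the breeding pool.
        pool = [p for rank, p in enumerate(parents)
                for _ in range(len(parents) - rank)]
        # Assumption: carry the best variant over unchanged.
        children = [dict(ranked[0])]
        # Grow the new population back to the original size.
        while len(children) < pop_size:
            a, b = rng.choice(pool), rng.choice(pool)
            # Crossover: each weight is taken from one parent or the other.
            child = {t: (a[t] if rng.random() < 0.5 else b[t]) for t in terms}
            # Mutation: occasionally introduce a new random weight.
            for t in terms:
                if rng.random() < mutation_rate:
                    child[t] = rng.random()
            children.append(child)
        population = children
    return max(population, key=lambda w: fitness(w, docs, relevant_ids))


terms = ["genetic", "algorithm", "banana"]
docs = [
    {"genetic": 3, "algorithm": 2},   # judged relevant
    {"genetic": 1, "algorithm": 4},   # judged relevant
    {"banana": 5},                    # not relevant
    {"banana": 2, "algorithm": 1},    # not relevant
]
best = evolve(terms, docs, relevant_ids={0, 1})
```

With this toy fitness, selection should favor variants that weight "genetic" and "algorithm" highly and "banana" weakly, without the user ever assigning a weight by hand.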
Natural Language Processing • Focuses on structure more than meaning; consequently, the problems are • Syntactic ambiguity, e.g., "They are visiting relatives" • Deep structure of a sentence, e.g., grace • A sentence may or may not be semantically correct, e.g., "Colorless green ideas sleep furiously" • Syntactic rules do not apply to, e.g., Boolean queries
Natural Language Processing (cont.) • Semantic Analysis • Even more elusive, e.g., "red herring", "carrying coals to Newcastle" • Techniques for Semantic Analysis • Latent semantic indexing uses multidimensional scaling methods to identify concepts • Dialogue analysis involves interaction that each time further clarifies what is to be retrieved
Citation Processing • Use of cited documents to enhance the description of a primary document • Some use co-citation as a measure of document similarity, i.e., the number of papers that cite both documents • Bibliographic coupling: when two documents cite the same document • Design problems: locating citations, interpreting them, and eliminating duplicate or useless ones
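The two citation measures on this slide can be illustrated with a small Python sketch. It assumes the citations have already been located and cleaned, which the slide lists as a design problem in its own right. The data layout (a dict from each citing paper to the set of documents it cites) and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count, for every pair of cited documents, how many papers cite both.

    citations: dict mapping citing paper -> set of cited documents
    """
    counts = Counter()
    for cited in citations.values():
        # Every pair of documents in one reference list is co-cited once.
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return counts


def coupling_strength(citations, p, q):
    """Bibliographic coupling: number of documents cited by both p and q."""
    return len(citations[p] & citations[q])


citations = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"B", "C"},
}
cc = cocitation_counts(citations)
shared = coupling_strength(citations, "P2", "P3")
```

In this toy data, documents A and B are co-cited by two papers (P1 and P2), while P2 and P3 are coupled through the single document B that both cite.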
Hypertext Links • A means of connecting two distinct pieces of text • Consists of an identifier and a pointer • May aid retrieval by suggesting the hyperlinks found in the top-ranked retrieved documents • Do not follow links from linked documents • Information Filtering: eliminating large segments of the database from consideration • Passage Retrieval: identifying relevant sections within a large document, e.g., an encyclopedia
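As a rough illustration of passage retrieval, the Python sketch below slides a fixed-size window of words over one document and returns the window containing the most query-term occurrences. The slides do not specify a scoring method, so the window-and-count approach, the function name `best_passage`, and the `window` and `step` values are all assumptions.

```python
def best_passage(text, query_terms, window=8, step=4):
    """Return (score, passage) for the word window with the most query-term hits."""
    words = text.split()
    query = {t.lower() for t in query_terms}
    # Window start positions, always including one that reaches the end of the text.
    last_start = max(0, len(words) - window)
    starts = list(range(0, last_start, step)) + [last_start]

    best = (0, "")
    for start in starts:
        chunk = words[start:start + window]
        # Count words in the window that match a query term, ignoring punctuation.
        score = sum(1 for w in chunk if w.lower().strip(".,;:") in query)
        if score > best[0]:
            best = (score, " ".join(chunk))
    return best


text = (
    "Genetic algorithms breed query variants. "
    "Citation processing uses cited documents. "
    "Relevance feedback adjusts term weights after the user judges "
    "the retrieved documents."
)
score, passage = best_passage(text, ["relevance", "feedback", "weights"])
```

A real system would more likely split on sentence or section boundaries and use weighted term scores; fixed windows are used here only to keep the example short.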
Image and Sound Processing • Techniques for evaluating and manipulating images directly • Voice recognition • Animation and sound: compare with those in libraries • Music retrieval can use style and then pattern matching