This guide explores effective strategies for refining search queries and enhancing information retrieval. It covers techniques such as categorization and clustering of documents, the utilization of Google Suggest, SurfWax FocusWords, and reverse dictionaries to find relevant keywords. Users will learn how to make longer, more effective queries, understand the importance of taxonomies and folksonomies, and how to analyze web pages for keyword optimization. Gain insights into various tools to enhance search efficiency and discover related information effortlessly.
Refining – Finding Words/expanding Taly Sharon taly@sharon-it.com sharont@alum.mit.edu www.sharon-it.com
Contents • Expanding/Learning terms • Categorization/Clustering engines • Google Suggest • SurfWax FocusWords • When you don’t know where to start
Make Longer Queries • [Chart: Average Search Terms per Query, Overall vs. Experienced] • Yahoo • Harvest Digital
Adding Words • Holocaust 23,200,000 • holocaust memorial 836,000 • holocaust memorial budapest 42,300 • holocaust memorial budapest danube 4,910 • holocaust memorial budapest danube promenade 692
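The narrowing effect of each added word can be quantified with a short sketch. The result counts below are copied from the slide and are a historical snapshot, not live data; the function name `narrowing_factors` is made up for this illustration.

```python
# Result counts copied from the "Adding Words" slide (historical snapshot).
counts = [
    ("holocaust", 23_200_000),
    ("holocaust memorial", 836_000),
    ("holocaust memorial budapest", 42_300),
    ("holocaust memorial budapest danube", 4_910),
    ("holocaust memorial budapest danube promenade", 692),
]


def narrowing_factors(rows):
    """Return (query, count, factor); factor is previous_count / count."""
    out = []
    prev = None
    for query, count in rows:
        factor = None if prev is None else prev / count
        out.append((query, count, factor))
        prev = count
    return out


for query, count, factor in narrowing_factors(counts):
    note = "" if factor is None else f"  (about {factor:.0f}x fewer results)"
    print(f"{count:>12,}  {query}{note}")
```

Each extra term cuts the result set by roughly one order of magnitude in this example, which is why longer queries tend to be more precise.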
Classification/Categorization • Classification: the process of deciding the appropriate category for a given document. • Examples: • deciding which newsgroup an article belongs to. • deciding which folder an email message should be directed to. • deciding what the general topic of an essay is.
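As a rough illustration of classification, the sketch below assigns a document to the predefined category whose cue words it matches most often. The categories, cue words, and the `classify` helper are hypothetical and chosen only for this example; real classifiers are usually trained on labeled documents.

```python
import re

# Hypothetical categories and cue words, invented for illustration only.
CATEGORY_KEYWORDS = {
    "agriculture": {"crop", "soil", "harvest", "farm"},
    "energy": {"biofuel", "biodiesel", "geothermal", "fuel"},
}


def classify(text, category_keywords=CATEGORY_KEYWORDS):
    """Return the category whose cue words overlap the text most, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best, best_score = None, 0
    for category, cues in category_keywords.items():
        score = len(words & cues)
        if score > best_score:
            best, best_score = category, score
    return best


print(classify("Biodiesel is a fuel made from crop oils"))
```

The key point is that the categories exist before the document arrives; the classifier only decides which one fits.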
Clustering • The process of automatically grouping similar documents, without predefined categories.
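In contrast to classification, clustering starts with no categories. A minimal sketch, assuming a greedy single-pass strategy and Jaccard word overlap as the similarity measure (both are choices made for this example, not the method of any engine named in this deck):

```python
import re


def tokens(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Shared words divided by all words; 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cluster(docs, threshold=0.25):
    """Greedy single pass: join the first cluster whose seed document is
    similar enough, otherwise start a new cluster."""
    clusters = []  # lists of document indices; the first index is the seed
    for i, doc in enumerate(docs):
        for members in clusters:
            if jaccard(tokens(doc), tokens(docs[members[0]])) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "holocaust memorial budapest",
    "budapest holocaust memorial danube",
    "biofuel production from crops",
    "biofuel crops and production",
]
print(cluster(docs))
```

The `threshold` value is arbitrary here; raising it produces more, smaller clusters.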
Search Categorization/Clustering • The result documents are grouped by category. • The searcher can select the relevant category to display the related documents. • Examples: • Vivisimo/Clusty • Excite • Teoma • Exalead
Clusty
Excite
Ask
Exalead
Google Suggest • As you type, you get query suggestions and the number of results for each suggested query.
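The behavior described above can be imitated locally with prefix matching. This is only a sketch of the idea, not Google's implementation: the query log, its counts, and the `suggest` function are hypothetical.

```python
# Hypothetical query log with made-up result counts; not Google data.
QUERY_COUNTS = {
    "biofuel": 1200,
    "biofuel production": 800,
    "biofuel crops": 950,
    "biodiesel": 700,
}


def suggest(prefix, query_counts=QUERY_COUNTS, limit=5):
    """Return up to `limit` (query, count) pairs that start with the prefix,
    highest count first, ties broken alphabetically."""
    prefix = prefix.lower()
    matches = [(q, n) for q, n in query_counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return matches[:limit]


for query, count in suggest("biof"):
    print(f"{query}  ({count:,} results)")
```

Reading the suggestions and their counts is a quick way to learn which phrasings other searchers use for a topic.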
Google Suggest (2)
Google Suggest
Yahoo Search Assist
SurfWax FocusWords • SurfWax has a “Focus” option. • This option invokes the FocusWords mechanism. • You get suggestions to make your query: • Broader • Similar • Narrower • http://www.surfwax.com
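The broader/similar/narrower idea can be sketched with a small thesaurus lookup. How SurfWax builds its suggestions is not described in this deck, so the data structure, the entries, and the `focus_words` helper below are all assumptions made for illustration.

```python
# Hypothetical mini-thesaurus; entries are invented for this example.
THESAURUS = {
    "biodiesel": {
        "broader": ["biofuel", "renewable energy"],
        "similar": ["bioethanol"],
        "narrower": ["soy biodiesel"],
    },
}


def focus_words(term, thesaurus=THESAURUS):
    """Return broader, similar, and narrower terms; empty lists if unknown."""
    entry = thesaurus.get(term.lower(), {})
    return {kind: list(entry.get(kind, []))
            for kind in ("broader", "similar", "narrower")}


for kind, terms in focus_words("biodiesel").items():
    print(f"{kind}: {', '.join(terms)}")
```

Broader terms widen a query that returns too little; narrower terms tighten one that returns too much.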
SurfWax FocusWords
When you don’t know where to start • Reverse Dictionary • Glossaries and Dictionaries • Taxonomy/Folksonomy • Pearl Culturing • Analyzing pages • Finding similar pages: • Google’s related: operator • Alexa (www.alexa.com)
Reverse Dictionary • OneLook reverse dictionary: http://www.onelook.com/reverse-dictionary.shtml • Example: “bird of prey” => raptor • Example: “economic measure of a nation’s wealth” => Gross Domestic Product
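A reverse dictionary goes from a description to a term. The toy sketch below ranks terms by how many content words their definition shares with the description. It does not reflect how OneLook works; the glossary entries, the stopword list, and the `reverse_lookup` helper are invented for this example.

```python
import re

# Tiny hypothetical glossary; definitions are written for this example.
GLOSSARY = {
    "raptor": "a bird of prey such as an eagle or hawk",
    "gross domestic product":
        "economic measure of the wealth a nation produces in a year",
    "glossary": "an alphabetical list of terms with their definitions",
}
STOPWORDS = {"a", "an", "the", "of", "in", "or", "as", "with", "such", "their"}


def content_words(text):
    """Lowercased alphabetic words with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def reverse_lookup(description, glossary=GLOSSARY):
    """Return terms whose definitions share words with the description,
    best match first."""
    wanted = content_words(description)
    scored = []
    for term, definition in glossary.items():
        overlap = len(wanted & content_words(definition))
        if overlap:
            scored.append((overlap, term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored]


print(reverse_lookup("bird of prey"))
print(reverse_lookup("economic measure of a nation's wealth"))
```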
Glossaries and Dictionaries • Google search: <topic> glossary OR thesaurus OR dictionary OR taxonomy • Example 1: agriculture glossary • http://www.cnie.org/nle/AgGlossary/AgGlossary.htm • http://www.cahe.nmsu.edu/news/aggloss.html • http://agriculture.house.gov/info/glossary.html • Example 2: agriculture thesaurus • http://agclass.nal.usda.gov/agt/agt.shtml • http://www.fao.org/aims/ag_intro.htm (multilingual) • http://www.glossarist.com • http://www.glossarist.com/glossaries/business/primary-industry/agriculture.asp
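The query pattern above is easy to generate for any topic. A minimal sketch, assuming the usual `https://www.google.com/search?q=...` URL form; the helper names `glossary_query` and `search_url` are made up for this example.

```python
from urllib.parse import urlencode

REFERENCE_WORDS = ("glossary", "thesaurus", "dictionary", "taxonomy")


def glossary_query(topic, words=REFERENCE_WORDS):
    """Build '<topic> glossary OR thesaurus OR dictionary OR taxonomy'."""
    return f"{topic} {' OR '.join(words)}"


def search_url(query):
    """URL-encode the query (assumes Google's standard search URL form)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = glossary_query("agriculture")
print(query)
print(search_url(query))
```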
Taxonomy/Folksonomy • Taxonomies • Found via directory search (for example, DMOZ): http://search.dmoz.org/cgi-bin/search?search=taxonomy • www.taxonomywarehouse.com (paid) • Folksonomy • Use tags in: • Technorati (www.technorati.com) • Delicious (www.del.icio.us)
del.icio.us
Pearl Culturing • What can you do when you have neither the category nor the right keywords? • Find one good, relevant website. • Look it up in directories. • You will find: • the category/main keywords • authoritative websites • Useful search engine: Exalead
Analyze Pages • Distilling: what is problematic in a bad page? • What is wrong? Does an interfering keyword or term appear? • Remove interfering terms (using “-”). • Identifying clues and patterns in a good page. • Read the document: what are the clues? • Look for new keywords, word combinations, and other features that differentiate it from non-authoritative documents. • Use a frequency counter: • http://www.wordcounter.com/ • http://www.georgetown.edu/faculty/ballc/webtools/web_freqs.html
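Both steps above, counting frequent terms in a good page and excluding interfering terms with “-”, can be sketched in a few lines. The stopword list, the sample text, and the helper names `top_terms` and `exclude_terms` are assumptions for this example; the online counters listed above may count differently.

```python
import re
from collections import Counter

# Small illustrative stopword list; real tools use longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on",
             "that", "it", "as", "are", "from"}


def top_terms(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent words, skipping stopwords and short words."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in stopwords and len(w) > 2]
    return Counter(words).most_common(n)


def exclude_terms(query, interfering):
    """Append each interfering term with a leading '-' to exclude it."""
    return " ".join([query] + [f"-{term}" for term in interfering])


page_text = "Biofuel is fuel from biomass. Biofuel crops and biofuel policy."
print(top_terms(page_text, n=3))
print(exclude_terms("jaguar", ["car"]))
```

Frequent terms from a good page are candidates to add to the query; terms that dominate a bad page are candidates to exclude.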
Frequency Counter
Wordcounter
Exercises • What are bad user interfaces called? (Hint: try Google Suggest.) • Reverse dictionary: • Find relevant keywords for chemistry. • What is the term for when menstruation stops? • What was the separation between the West and the Soviet Union called? • What are the terms related to Competitive Intelligence? • Check the suggestions from Google Suggest for a query starting with biofuel. • Using SurfWax, learn the options to focus or broaden the query: biodiesel. • Identify the most relevant terms in the website www.uspto.gov. • Identify the most relevant terms in the Biofuel Wikipedia entry: http://en.wikipedia.org/wiki/Biofuel. • Search the OneLook reverse dictionary and other glossaries for the terms fuel, natural energy, geothermal, and others. Look at the results.