Discover the Deep Web, a vast collection of internet information sources not indexable by popular search engines. Learn about federated search and its benefits, and how Deep Web Technologies offers a powerful solution.
SwetsWise Searcher Powered by Explorit Research Accelerator • By Abe Lederman • President and CTO • Copenhagen, Denmark • 11 June 2012
About Deep Web Technologies... • Founded by Abe Lederman in 2002 • A co-founder of Verity, acquired by Autonomy • BS & MS Degrees in Computer Science from MIT • 25 years of experience in Information Retrieval • 20-person company based in Santa Fe, New Mexico • Over $5M in DOE SBIR Grants (2002-2011) • Pioneer/trailblazer in federated search
Customers Include... • Academic: • Stanford University • George Mason University • Texas Medical Center • University College of Cork • Tennessee Community College Consortia • Public Portals: • WorldWideScience.org • Science.gov • Biznar • Mednar • ScienceResearch.com • Government: • Defense Technical Info Center (DTIC) • Office of Sci. & Tech. Info (DOE-OSTI) • UNECA • European Space Agency • Corporate: • Boeing • BASF • Intel • HP • P&G
What is the Deep Web? The Deep Web is a collection of internet information sources that are generally not accessible to web spiders or crawlers and therefore cannot be indexed by popular search engines such as Google, Yahoo! or Bing, which cover only the Surface Web. It is estimated that there is more than 500 times more content in the Deep Web than in the Surface Web.
What is “Federated Search”? “Federated Search is an application or service that allows users to submit a real-time search in parallel to multiple, distributed information sources and retrieve aggregated, ranked and de-duplicated results.”
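The definition above names four steps: search in parallel, aggregate, de-duplicate, rank. A minimal Python sketch of those steps follows. It is an illustration only, not Explorit's implementation: the two in-memory sources, the `score` field, and the function names are all hypothetical stand-ins for remote search services.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory sources; a real system would query remote services.
SOURCES = {
    "journals": [
        {"title": "Deep Web Search", "score": 0.9},
        {"title": "Crawler Limits", "score": 0.4},
    ],
    "ebooks": [
        {"title": "deep web search ", "score": 0.7},
        {"title": "Federated Retrieval", "score": 0.8},
    ],
}


def search_source(name, query):
    """Return the records of one source whose title shares a word with the query."""
    terms = set(query.lower().split())
    return [
        dict(record, source=name)
        for record in SOURCES[name]
        if terms & set(record["title"].lower().split())
    ]


def federated_search(query):
    # 1. Submit the query to every source in parallel.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda name: search_source(name, query), SOURCES))

    # 2. Aggregate and de-duplicate on a normalized title, keeping the best score.
    best = {}
    for record in (r for batch in batches for r in batch):
        key = " ".join(record["title"].lower().split())
        if key not in best or record["score"] > best[key]["score"]:
            best[key] = record

    # 3. Rank the merged list, highest score first.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)
```

Calling `federated_search("deep web")` on this toy data returns a single record, because the two copies of "Deep Web Search" collapse into the higher-scoring one from `journals`.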
One Search, Many Sources • OPACs • Blogs • Subscription Sources • eBooks • Wikis • Internal Databases • Public Web Sources • Journals
Why Federated Search? 4 Big Reasons… • Provides greater efficiency than searching sources one by one • Returns the most current information because sources are searched in real-time • Eliminates learning disparate publisher interfaces • Simplifies discovery of the most relevant results
Best Science-Focused Engines • Science.gov • WorldWideScience.org • ScienceResearch.com • ScienceAccelerator • Scitopia.org • 5 of 9 created by DWT
Presentation available at: www.deepwebtech.com/ala2011.ppt
Federated Search Has Gotten a Bad Reputation • It is too slow • Connectors break • Brings back too few results from each source • Brings back too many results • Unable to rank results well (meta-data differences, lack of info)
Drawbacks of Discovery Services • Lack of transparency of what’s in Service • Incomplete coverage of publisher content • Lag between when content appears on publisher site and when available on Discovery Service • Normalized metadata loses content source-specific metadata • Content in Service limited by relationships, content of general interest
Landscape is Not So Clear • Summon (ProQuest): Discovery Service • EDS (EBSCO): Discovery Service + Federated Search • WorldCat Local (OCLC): Discovery Service + Federated Search • Primo (Ex Libris): Discovery Service + Federated Search • Encore Synergy (Innovative Interfaces): Limited Discovery Service + Federated Search • Explorit (Deep Web Technologies): Federated Search
When Should You Choose Federated Search? • Access to up-to-date information is important • You want control of your sources • You want to search internal/non-mainstream sources • Your research is specialized (e.g., medical and legal) • You have a wide range of subscribed content (e.g., EBSCO and ProQuest)
Major Advantages of SwetsWise Searcher • Rich, easy-to-use interface • Incremental display of results • Sophisticated connector technology • Retrieve 50-100 results or more per source • Relevance ranking • Smart clustering • Alerts and Search Builder • Metrics
Easy-to-use Interface Simple Search Box • One-Search, “Google-like” box • Can be embedded in your home page, blog or intranet.
Advanced Search Page • Unlimited categories (sources can be in multiple categories) • Select sources to search • One or two columns • Fielded Searching • Boolean Searching (AND, OR, NOT)
Connectors: Think “Connections” Connectors make it possible to talk to other data sources • Each source is unique so connectors “normalize” a query • Submit proper authentication to sources • Extract the right results • Parse results to display the data
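The connector responsibilities listed above (normalize the query, authenticate, extract and parse results) can be sketched as a small Python class. Everything here is assumed for illustration: the `ti:` fielded syntax, the bearer-token header, and the `title|url` response format belong to a made-up source, not to any real publisher or to Explorit's connector technology.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    url: str


class ExampleConnector:
    """Hypothetical connector for a source that expects 'ti:' fielded queries."""

    def __init__(self, api_key):
        self.api_key = api_key

    def translate_query(self, query):
        # Normalize the user's generic query into this source's own syntax.
        return " AND ".join(f"ti:{term}" for term in query.split())

    def auth_headers(self):
        # Credentials the source would require with each request (assumed scheme).
        return {"Authorization": f"Bearer {self.api_key}"}

    def parse(self, raw_response):
        # Extract results from the source's native format: one "title|url" per line.
        results = []
        for line in raw_response.strip().splitlines():
            title, url = line.split("|", 1)
            results.append(Result(title.strip(), url.strip()))
        return results
```

One connector class per source keeps source-specific quirks isolated, so a change on a publisher's site affects only that connector.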
Connector Monitoring • Proactively monitor connectors • Monitor: source health, speed, responsiveness and errors • Evaluated by dedicated software maintenance engineers • Generally errors are discovered by our team before users ever notice a problem
Relevance Ranking • Occurrence of search terms within titles & snippets • Assigning weight to sources • More current results are assigned greater weight • Read: “Ranking: The Secret Sauce for Searching the Deep Web”
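The three ranking signals named on this slide can be combined in many ways. The Python sketch below shows one possible combination; the formula, the double weight for title matches, and the field names are assumptions made for illustration, not the ranking algorithm the product uses.

```python
def relevance(result, query, source_weights, current_year):
    """Score one result from term occurrence, source weight, and recency."""
    terms = query.lower().split()
    title_words = result["title"].lower().split()
    snippet_words = result["snippet"].lower().split()

    # Signal 1: term occurrences, with title matches counted double (assumed).
    term_score = sum(
        2 * title_words.count(t) + snippet_words.count(t) for t in terms
    )

    # Signal 2: per-source weight; unknown sources default to 1.0.
    source_weight = source_weights.get(result["source"], 1.0)

    # Signal 3: recency, which decays with the age of the result in years.
    age = max(current_year - result["year"], 0)
    recency = 1.0 / (1 + age)

    return term_score * source_weight * (1 + recency)
```

With this formula, two otherwise identical results differ only by recency, so the newer one ranks first.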
Clustering • Real-time semantic analysis of results creates clusters on-the-fly. • Discover relationships behind the results, not just “keywords.” Read: “Clusters That Think”
Alerts • Delivery online or via email • Daily, Weekly, Monthly • Pick and choose your sources • Export to RSS reader • Maintain database of past results
Search Builder • Create search pages easily • Choose collections and search fields • Integrates with Course Management Software • Embed search box using built-in widget
SwetsWise Searcher Metrics • Graphics-based or tabular • Single day (hourly breakdown) or entire month • Downloadable to spreadsheet • Reports include: • Number of queries run • Number of results retrieved per source • Average time to retrieve results from a source • Average rank of results retrieved per source • Timeouts/errors by source • Searches run (query strings) • Clickthrough stats
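Several of the per-source report figures above (queries run, results retrieved, average retrieval time, timeouts) can be derived from a simple search log. The following Python sketch assumes a hypothetical log format with `source`, `results`, `seconds`, and `timed_out` fields; it does not reflect how SwetsWise Searcher stores or computes its metrics.

```python
from collections import defaultdict


def summarize(log):
    """Aggregate per-source metrics from a list of search log entries."""
    stats = defaultdict(
        lambda: {"queries": 0, "results": 0, "seconds": 0.0, "timeouts": 0}
    )
    for entry in log:
        s = stats[entry["source"]]
        s["queries"] += 1
        if entry["timed_out"]:
            s["timeouts"] += 1
        else:
            s["results"] += entry["results"]
            s["seconds"] += entry["seconds"]

    report = {}
    for source, s in stats.items():
        completed = s["queries"] - s["timeouts"]
        report[source] = {
            "queries": s["queries"],
            "results": s["results"],
            # Average only over searches that completed; None if none did.
            "avg_seconds": s["seconds"] / completed if completed else None,
            "timeouts": s["timeouts"],
        }
    return report
```

Timed-out searches are excluded from the average so that a slow or failing source does not skew its retrieval time.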
Hosted vs. Installed Solutions
WorldWideScience.org is an Excellent Candidate for Multilingual Search • A global gateway to international science databases and portals • All content is from national governments or vetted by national governments • Developed in partnership with the DOE Office of Scientific and Technical Information (OSTI), WWS Alliance and Microsoft Research • One-stop searching • Includes databases from China, Japan, Korea, Germany, and other non-English-speaking countries
How Multilingual Federated Search Works • Query in user’s language is submitted to EXPLORIT • Query to be translated for each source goes to Microsoft Translator • Query in source’s language is sent to foreign language search engines (German, Chinese, Russian) • Results in source’s language come back to EXPLORIT • Ranking • Ranked results translated by Microsoft to user’s language • Ranked results in user’s language returned to user
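The multilingual flow on this slide (translate the query per source, search, rank, translate the results back) can be sketched in Python as follows. The toy dictionary stands in for a machine translation service and makes no calls to Microsoft Translator; the sources, documents, and match-count ranking are likewise hypothetical.

```python
# Toy word-for-word dictionary standing in for a machine translation service.
TOY_DICT = {
    ("en", "de"): {"energy": "energie"},
    ("de", "en"): {"energie": "energy", "forschung": "research"},
}

# Hypothetical sources, each with its own language.
SOURCES = [
    {"lang": "de", "docs": ["Energie Forschung", "Wasser Forschung"]},
    {"lang": "en", "docs": ["Energy policy"]},
]


def translate(text, src, dst):
    """Translate word by word; unknown words pass through unchanged."""
    if src == dst:
        return text
    table = TOY_DICT.get((src, dst), {})
    return " ".join(table.get(word, word) for word in text.lower().split())


def multilingual_search(query, user_lang):
    hits = []
    for source in SOURCES:
        # Translate the query into the source's language before searching.
        local_terms = translate(query, user_lang, source["lang"]).lower().split()
        for doc in source["docs"]:
            matches = sum(doc.lower().split().count(t) for t in local_terms)
            if matches:
                hits.append((matches, doc, source["lang"]))

    # Rank first, then translate the ranked results into the user's language.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [translate(doc, lang, user_lang) for _, doc, lang in hits]
```

Here `multilingual_search("energy", "en")` matches the German document through the translated query and returns it in English alongside the native English hit.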
Coming in the Fall • Visualization • Full-Faceted Navigation • Mendeley Integration • Document Type and Document Format Clusters • Full Text Filter
Visualization Using our clustering technology, results visualization allows users to see relationships between topics easily.
Full Text Filter Access Full Text!
Thank you! Abe Lederman abe@deepwebtech.com