Discover a revolutionary content processing framework designed to enhance enterprise search applications. This presentation covers the key issues that plague traditional enterprise search methods and introduces innovative solutions by Search Technologies. We'll explore real-world use cases, the DPMS (Document Processing Methodology for Search), and how the new framework leverages advanced components and data quality monitoring for superior search outcomes. Learn about our independent technology approach and how we cater to diverse client needs in the search engine landscape.
A New Content Processing Framework for Search Applications Iain Fletcher ifletcher@searchtechnologies.com
Agenda • Briefly About Search Technologies • Key Issues for Enterprise Search • A New Content Processing Framework for Search Applications • How do we use it? • What does it look like? • Use case example
Search Technologies overview • The leading IT services company focused on search engines • Consulting • Implementation • Managed services • Technology independent, working with most of the leading search engines • 90 staff, 250+ customers
Search Technologies overview • Ascot, UK • Boston, MA • Cincinnati, OH • Herndon, VA • San Diego, CA • San Jose, CR
Executive team • [Chart: number of years in the search engine industry]
Enterprise Search - An Indifferent Reputation • Major surveys show that no progress has been made during the last 10 years • Searchers are successful in finding what they seek 50% of the time or less • 2001, IDC, “Quantifying Enterprise Search” • More than half cannot find the information they need using their Enterprise search system • 2011, MindMetre/SmartLogic, “Mind the Enterprise Search Gap”
Metadata Supports Relevance Ranking Supported by great metadata! • Title • Meta description • URL • Inbound links • Alt tag text • Etc. • Provided for free by millions of SEO practitioners
Key Issues • Almost all modern search functions are driven by data structure
Key Issues • The majority of serious problems in serious search systems are caused by data quality issues Also... • “Big Data” and BI from unstructured data will face the same challenges • Can you trust an analysis if you are unsure of data provenance?
Data quality examples • The subscription portal caught out by template information • The Intranet search skewed by a new piece of hardware • The Intranet search where great quality was the problem!
Key Issues • Data structure and quality issues are addressed in the indexing pipelines of search engines • Cleaning, enriching, normalizing, granularizing... • It is about process as much as technology • And data constantly evolves • Sometimes the built-in indexing pipeline is not good enough (issues with scale, flexibility or transparency) • Some search engines don’t really have one • We’ve written our own
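The cleaning, normalizing and enriching steps named on this slide can be pictured as small functions applied in sequence. The Java sketch below is illustrative only: the class name `IndexingPipelineSketch`, the field names `author`, `body` and `wordCount`, and the specific rules in each stage are assumptions made for the example, not part of any search engine's built-in pipeline.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a document is modelled as a map of field name -> text value.
final class IndexingPipelineSketch {

    // Cleaning: replace markup remnants with spaces, then collapse whitespace.
    static Map<String, String> clean(Map<String, String> doc) {
        Map<String, String> out = new LinkedHashMap<>();
        doc.forEach((field, value) -> out.put(field,
                value.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim()));
        return out;
    }

    // Normalizing: bring the (assumed) "author" field into one consistent case.
    static Map<String, String> normalize(Map<String, String> doc) {
        Map<String, String> out = new LinkedHashMap<>(doc);
        out.computeIfPresent("author", (field, value) -> value.toLowerCase(Locale.ROOT));
        return out;
    }

    // Enriching: derive a new (assumed) "wordCount" field from the cleaned body.
    static Map<String, String> enrich(Map<String, String> doc) {
        Map<String, String> out = new LinkedHashMap<>(doc);
        String body = out.getOrDefault("body", "");
        int words = body.isEmpty() ? 0 : body.split(" ").length;
        out.put("wordCount", String.valueOf(words));
        return out;
    }

    // The stages in the order they should run.
    static List<UnaryOperator<Map<String, String>>> defaultStages() {
        List<UnaryOperator<Map<String, String>>> stages = List.of(
                IndexingPipelineSketch::clean,
                IndexingPipelineSketch::normalize,
                IndexingPipelineSketch::enrich);
        return stages;
    }

    // Runs every stage in order; each stage returns a new map and leaves its input untouched.
    static Map<String, String> run(Map<String, String> doc,
                                   List<UnaryOperator<Map<String, String>>> stages) {
        Map<String, String> current = doc;
        for (UnaryOperator<Map<String, String>> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }
}
```

Because every stage has the same shape, a stage can be added, removed or reordered as the data evolves without touching the others.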
Document Processing Methodology for Search (DPMS) • The Philosophy • Understand the Document Model • Understand the User Model • Includes business-level requirements • Create the Search Engine Model • Search = the pivot point between User and Data • Document everything
DPMS – The Methodology • 1. Assessment: expert assessment and recommendations (Search Technologies Architect and Business Analyst); Assessment Report • 2. Detailed Analysis: DPMS Analysis (Knowledge Engineer, Business Analyst, etc.); Review (Architect, Domain Experts, Peers); DMDs • 3. Execution: Implementation (Developer); Validation; Validate DMDs; Aspire; Search Engine
Introducing “Aspire” • Think of it as a stand-alone indexing pipeline with a framework + component architecture • Framework built for scalability, performance and flexibility – designed to use cloud elasticity • Components built to be autonomous and transparent
Technology Suite • 100% Java • OSGi™ See www.osgi.org • The Dynamic Module System for Java™ • Apache Felix • Open source implementation of OSGi • Jetty • Embedded HTTP server • Maven & Maven Repositories • For component deployment
Component Configuration • Any number of document processing pipelines can be used in an application • Disparate data sources will need different treatment • Components can be shared where appropriate • Configurations are easy to change
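One way to picture several pipelines sharing components is a registry keyed by data source. The sketch below is a hypothetical illustration, not Aspire's configuration format: `PipelineRegistrySketch`, the source names `intranet` and `email`, and the three components are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: one pipeline per data source, with components shared where appropriate.
final class PipelineRegistrySketch {
    private final Map<String, List<UnaryOperator<String>>> pipelines = new LinkedHashMap<>();

    // Registers (or replaces) the ordered list of components for a source.
    void register(String source, List<UnaryOperator<String>> components) {
        pipelines.put(source, new ArrayList<>(components));
    }

    // Sends a piece of text through the pipeline configured for its source.
    String process(String source, String text) {
        List<UnaryOperator<String>> components = pipelines.get(source);
        if (components == null) {
            throw new IllegalArgumentException("No pipeline configured for source: " + source);
        }
        String current = text;
        for (UnaryOperator<String> component : components) {
            current = component.apply(current);
        }
        return current;
    }

    // Example configuration: two sources, each with its own first step, one shared component.
    static PipelineRegistrySketch example() {
        UnaryOperator<String> collapseWhitespace = s -> s.replaceAll("\\s+", " ").trim(); // shared
        UnaryOperator<String> stripTags = s -> s.replaceAll("<[^>]*>", " ");              // intranet only
        UnaryOperator<String> dropSignature = s -> {                                      // email only
            int cut = s.indexOf("-- ");
            return cut >= 0 ? s.substring(0, cut) : s;
        };

        PipelineRegistrySketch registry = new PipelineRegistrySketch();
        registry.register("intranet", List.of(stripTags, collapseWhitespace));
        registry.register("email", List.of(dropSignature, collapseWhitespace));
        return registry;
    }
}
```

Changing how one source is treated means calling `register` again for that source; the other pipeline and the shared component are unaffected.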
Component autonomy • Components communicate via XML • Each component has a known and transparent input and output, and can be tested in isolation • This simplifies problem diagnosis, promotes transparency and controls cost-of-ownership
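To make "known input and output, tested in isolation" concrete, here is a minimal sketch using the standard Java DOM API. The `XmlComponent` interface, the `TitleLengthComponent` class and the `<title>` / `<titleLength>` element names are assumptions for illustration; the slides do not show Aspire's actual component interface or XML schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical contract: a component takes an XML document and returns an XML document.
interface XmlComponent {
    Document process(Document doc);
}

// Example component: appends a <titleLength> element derived from the first <title>.
final class TitleLengthComponent implements XmlComponent {
    @Override
    public Document process(Document doc) {
        NodeList titles = doc.getElementsByTagName("title");
        if (titles.getLength() > 0) {
            String title = titles.item(0).getTextContent().trim();
            Element length = doc.createElement("titleLength");
            length.setTextContent(String.valueOf(title.length()));
            doc.getDocumentElement().appendChild(length);
        }
        return doc;
    }
}

// Small helpers so a component can be exercised with plain strings, outside any pipeline.
final class XmlSupport {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML input", e);
        }
    }

    // Returns the text of the first element with the given tag, or null if there is none.
    static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }
}
```

Testing in isolation then amounts to parsing a small XML string, calling `process`, and inspecting the result, with no crawler or search engine involved.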
Data Quality Monitoring • Components have built-in quarantine systems to monitor data quality • Content is constantly evolving • This provides transparency and enables content issues to be diagnosed and resolved faster
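The slides do not describe how the quarantine systems work internally. As a rough sketch of the general idea, the hypothetical wrapper below sets aside any document that fails a quality check or causes a processing error, together with a reason, so it can be inspected later instead of silently reaching the index. All names here (`QuarantiningComponent`, `Entry`, the reason strings) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a component that quarantines problem documents.
final class QuarantiningComponent {

    // A quarantined document plus the reason it was set aside.
    static final class Entry {
        final String document;
        final String reason;

        Entry(String document, String reason) {
            this.document = document;
            this.reason = reason;
        }
    }

    private final UnaryOperator<String> delegate; // the actual processing step
    private final Predicate<String> acceptable;   // the quality check applied first
    private final List<Entry> quarantine = new ArrayList<>();

    QuarantiningComponent(UnaryOperator<String> delegate, Predicate<String> acceptable) {
        this.delegate = delegate;
        this.acceptable = acceptable;
    }

    // Returns the processed document, or empty if the document was quarantined.
    Optional<String> process(String document) {
        if (!acceptable.test(document)) {
            quarantine.add(new Entry(document, "failed quality check"));
            return Optional.empty();
        }
        try {
            return Optional.of(delegate.apply(document));
        } catch (RuntimeException e) {
            quarantine.add(new Entry(document, "processing error: " + e.getMessage()));
            return Optional.empty();
        }
    }

    // Snapshot of everything quarantined so far, for diagnosis.
    List<Entry> quarantined() {
        return new ArrayList<>(quarantine);
    }
}
```

Keeping the reason next to each quarantined document is what makes diagnosis faster: the size and contents of the quarantine show when incoming content has changed in a way the component did not expect.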
The Component Library • Search Technologies maintains a library of components • Currently there are more than 70 • Components can be as simple as 3 lines of Groovy script, or as complex as third-party technologies • Many applications can be addressed using existing components + configuration
Component Upgrading • Components can be upgraded in-situ from a cloud-based service, without stopping/restarting the system • Helpful in the maintenance of complex or mission-critical systems
Component control • Every component has its own control / status page
Complexity example • CPA Global Discover • The world’s leading patent research portal • 80 million patents from 95 patent offices • More than a dozen navigators built • Numerous graphical search results display options • Whole document comparison features
In Summary • Many applications today don’t need this level of diligence • But as data and data dynamism grow, more will • A stand-alone unstructured content processing system can serve multiple applications, and makes sense for some companies • Method. Diligence. Transparency – it’s not rocket science... • Applying this approach to enterprise search is a key part of moving user satisfaction forward during the next few years
Thank You! Iain Fletcher ifletcher@searchtechnologies.com • http://uk.linkedin.com/in/iainfletcher