Explore the challenges of information management in today’s rapidly evolving digital landscape. As the quantity of information grows exponentially, extracting relevant data from various sources becomes increasingly complex. Our Content Capture Technology Suite (CCTS) provides advanced tools and methodologies to integrate, customize, and aggregate dynamic content from different formats into a cohesive XML-based structure. Leveraging features like fuzzy logic for reliable data extraction and an intuitive user interface, CCTS empowers organizations to optimize data for mining and analysis, overcoming traditional capture limitations.
EAS313 Content Capture Technology Suite: EAI for the Web
Scott McReynolds, Sr Manager, scottmc@sybase.com / 925 236 4558
Prashanth Ponnachath, Software Engineer, pponnach@sybase.com / 925 236 6286
Date: 08/07/2003
Information Management Challenges • Quantity of information within and outside of enterprises has grown exponentially • Challenge to extract relevant information from a multitude of sources • Integrating extracted content that may be in different formats (EAI issues)
Information Management Challenges • Task-specific customization or personalization • Combining data from several different sources into a new data source • Data aggregation for mining and analysis • Data bottled up behind artificial network or security barriers
Existing Capture Methodologies By Other Vendors Static data stored in databases • Not equivalent to storing dynamic data • Needs to be refreshed at regular intervals • Legal problems • More infrastructure investment
Existing Capture Methodologies By Other Vendors Screen Scraping • Snooping the contents of the display memory of a smart terminal through its auxiliary port • Parsing the HTML with programs designed to mine out patterns of content • Ugly and ad hoc; very likely to break on even minor changes to the format of the data being snooped
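The fragility of pattern-based HTML scraping can be illustrated with a minimal sketch. The page markup, the class name `BrittleScraper`, and the wind-speed field are all hypothetical and are not taken from any vendor's product; the point is only that a pattern tied to exact markup stops matching after a cosmetic change.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BrittleScraper {
    // Hypothetical pattern: it depends on the exact tags wrapped around the value.
    private static final Pattern WIND =
        Pattern.compile("<td class=\"wind\"><b>(\\d+) kt</b></td>");

    // Returns the wind speed if the markup matches the pattern exactly.
    public static Optional<Integer> windSpeed(String html) {
        Matcher m = WIND.matcher(html);
        return m.find() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }

    public static void main(String[] args) {
        String original = "<tr><td class=\"wind\"><b>12 kt</b></td></tr>";
        // Same content, but <b> was replaced by <strong>.
        String restyled = "<tr><td class=\"wind\"><strong>12 kt</strong></td></tr>";
        System.out.println(windSpeed(original)); // Optional[12]
        System.out.println(windSpeed(restyled)); // Optional.empty
    }
}
```

The content ("12 kt") is identical in both pages; only the surrounding tags changed, and the feed broke.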
Content Capture Technology Suite (CCTS) What does it do? • Set of APIs that capture dynamic content from a variety of sources into individual elements • Deploy and replay captured elements in any portal framework • Aggregate data from multiple sources into XML
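As a rough sketch of what aggregating captured elements into XML could look like, the example below uses only the standard Java XML APIs. It is not the CCTS API: the class `XmlAggregator`, the element names `aggregate` and `element`, and the `source` attribute are assumptions made for illustration.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlAggregator {
    // Builds one XML document with one child element per captured source.
    // Element and attribute names are hypothetical.
    public static Document aggregate(Map<String, String> contentBySource) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("aggregate");
            doc.appendChild(root);
            for (Map.Entry<String, String> e : contentBySource.entrySet()) {
                Element el = doc.createElement("element");
                el.setAttribute("source", e.getKey());
                el.setTextContent(e.getValue());
                root.appendChild(el);
            }
            return doc;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Serializes the document to a string with the default transformer settings.
    public static String toXml(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Passing a `LinkedHashMap` keeps the elements in capture order, which matters if downstream consumers rely on position.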
Technology Driving CCTS – Feature Extraction Traditional Extraction Methodology • Outside in, based on HTML tags • Content feed breaks if page changes slightly
Technology Driving CCTS – Feature Extraction CCTS Extraction Methodology • Inside out, based on features of content desired
Technology Driving CCTS – Feature Extraction Feature Extraction (FE) ensures reliability of content aggregation • Parses out information on a page and breaks it down into specific components • A fuzzy logic “digital signature” or symbolic reference, rather than a static link, ensures persistent extraction of desired content • Pattern recognition through “object specific” parsers enables an extensible set of aggregated objects
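The "inside out" idea can be sketched as scoring page fragments by the content they contain rather than by the tags around them. This is a simplified stand-in, not the CCTS algorithm: the class `FeatureExtractor`, the keyword-set "signature", and the fractional score are assumptions chosen to keep the example small.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

public class FeatureExtractor {
    // Strips tags so that scoring depends on content, not on markup.
    public static String textOf(String htmlFragment) {
        return htmlFragment.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Fraction of signature terms (lower-case) found in the fragment's text.
    public static double score(String fragment, Set<String> signature) {
        if (signature.isEmpty()) return 0.0;
        String text = textOf(fragment).toLowerCase(Locale.ROOT);
        long hits = signature.stream().filter(text::contains).count();
        return (double) hits / signature.size();
    }

    // Returns the text of the best-scoring fragment if it reaches the threshold.
    public static Optional<String> extract(List<String> fragments,
                                           Set<String> signature,
                                           double threshold) {
        String best = null;
        double bestScore = 0.0;
        for (String f : fragments) {
            double s = score(f, signature);
            if (s > bestScore) {
                best = f;
                bestScore = s;
            }
        }
        return (best != null && bestScore >= threshold)
            ? Optional.of(textOf(best))
            : Optional.empty();
    }
}
```

Because the score ignores markup, a fragment that moves from a `<table>` to a `<div>` layout still matches as long as its content keeps the same features; a partial match can still pass if the threshold is below 1.0, which is the "fuzzy" part of this sketch.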
Technology Driving CCTS – CCL Content Collection Language (CCL) • ‘Content bundle’ of everything needed to collect and playback desired content • Designed to be programmed through a user interface instead of by hand • Simple as a URL, but as powerful as a web scripting language
Technology Driving CCTS – Navigation • Tightly coupled with Content Collection Language • Written in Java • Servlet based and can be easily tied to a GUI
Technology Driving CCTS – CCL (continued) • New commands are easily added; CCL is not a keyword-based language • Can reside on the client or the server • Parsing and error management are shared by all commands • Fast execution • Used to eliminate session/calls to DB
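One way to get "easily added commands" with shared parsing and error management is a registry that maps command names to handlers. The sketch below is only an illustration of that design: the `CommandRegistry` class, the `name argument` statement form, and the error strings are all hypothetical, and the actual CCL syntax is not shown in these slides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CommandRegistry {
    private final Map<String, Function<String, String>> commands = new HashMap<>();

    // Adding a command is a single registration; the parser is not changed.
    public void register(String name, Function<String, String> handler) {
        commands.put(name, handler);
    }

    // Shared parsing and error handling for every command.
    // Hypothetical statement form: "<name> <argument>".
    public String run(String statement) {
        String[] parts = statement.trim().split("\\s+", 2);
        Function<String, String> handler = commands.get(parts[0]);
        if (handler == null) {
            return "error: unknown command '" + parts[0] + "'";
        }
        try {
            return handler.apply(parts.length > 1 ? parts[1] : "");
        } catch (RuntimeException e) {
            return "error: " + parts[0] + " failed: " + e.getMessage();
        }
    }
}
```

For example, after `registry.register("upper", s -> s.toUpperCase())`, the statement `upper tide` is dispatched to the new handler without any change to `run`.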
CCTS Components
Content Capture Engine • Takes in user input via a navigation GUI and generates the CCL or XML
Playback Engine • Translates CCL statements into content
Content Repository Interface • Deploy captured content into any portal repository
CCTS Components Content Capture Workbench • Eclipse-based GUI that allows users to capture and deploy content • Reference implementation of the Capture and CRI APIs • Design pattern that can be used as a reference to integrate any custom GUI with the CCTS API
Suite of Powerful Content Aggregation Tools DataParts reduces the number of data tasks that require a programmer, and makes the remaining tasks easy to accomplish.
EAI Tools • Grid Charts • Messaging Portlets • Integrated Scripting Environment • DataParts
Demo: Sailing Event Web Application Scenario • You are a portal developer for a company managing sailing events • You are assigned the task of creating a portal containing the following information • Race sites • Live weather information • Wind speed for the last 12 hours as a graph • Tide information as a graph • Marine weather