
Aspire Document Processing


matt

Presentation Transcript


  1. Aspire Document Processing

  2. Document Processing – “Aspire”
     • Very High Performance
     • Structured Document Processing Architecture
     • Dynamic configuration and deployment
     • Based on Open Source Technologies
     • Well Supported (wiki, javadoc)
     • Administration interface built-in
     • Vendor Neutral (CMS and search engine)

  3. Top-Level Overview (diagram). Labels: Aspire Document Processing Pipelines; Data Sources; Index; Feeders; Indexing

  4. Components In Aspire (today) (diagram; the grouping of components under categories is not recoverable from the transcript)
     • Aspire Component Manager
     • Pipeline Manager
     • Categories: Feeders, SubJob Extractors, Enhancers, Metadata Manipulation, Output
     • Component labels: RSS, Push XML to REST, Get CCD Metadata, Unload CSV, Date Chooser, Hot Folder, Split Multi-valued data, Error Job Handler, RDB Enhancer, Unload ARC Files, Single Page, Host to Domain, Debug Output, RDB, Fetch URL, RDB Unloader, Text Extraction, JMS, Groovy Scripting, Common Resources, Feed One, Category Tagger, JDBC Connection, Content Control DB, Content Boost

  5. Functions Handled by Aspire
     • Threading
     • Collection Deployment
     • Error handling and notification
       • Including individual sub-job notifications
     • Collection Configuration
     • Component Scripting
     • Job Processing
     • Admin interface, performance, live system status

  6. Benefits
     • Much lower lifecycle cost
     • File processing no longer an ad-hoc collection of Java objects and methods
     • Encourages re-use of components
     • New collections with no programming
       • Just re-configure existing components
     • Flexibility: deploy collections individually
     • Much better visibility into the file processing internals, performance, and queuing
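The idea of building a new collection by re-configuring existing components, rather than writing new code, can be sketched roughly as follows. This is a minimal illustration in plain Java and is not Aspire's API: `PipelineSketch`, the stage names `trimTitle` and `hostToDomain`, and the field-map document model are all hypothetical, and the "host to domain" step (keeping the last two labels of a host name) is only an assumption about what such a component might do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these names come from Aspire itself.
final class PipelineSketch {
    // Registry of reusable stages, keyed by the name a configuration would refer to.
    // A document is modelled as a plain field map; a stage returns the updated document.
    private static final Map<String, UnaryOperator<Map<String, String>>> REGISTRY =
            new LinkedHashMap<>();

    static {
        // Strips surrounding whitespace from the "title" field, if present.
        REGISTRY.put("trimTitle", doc -> {
            doc.computeIfPresent("title", (key, value) -> value.trim());
            return doc;
        });
        // Naive assumption: the domain is the last two labels of the host name.
        REGISTRY.put("hostToDomain", doc -> {
            String host = doc.get("host");
            if (host != null) {
                String[] parts = host.split("\\.");
                int n = parts.length;
                doc.put("domain", n >= 2 ? parts[n - 2] + "." + parts[n - 1] : host);
            }
            return doc;
        });
    }

    // Turns a configured list of stage names into an ordered list of stages.
    static List<UnaryOperator<Map<String, String>>> build(List<String> stageNames) {
        List<UnaryOperator<Map<String, String>>> stages = new ArrayList<>();
        for (String name : stageNames) {
            UnaryOperator<Map<String, String>> stage = REGISTRY.get(name);
            if (stage == null) {
                throw new IllegalArgumentException("Unknown stage: " + name);
            }
            stages.add(stage);
        }
        return stages;
    }

    // Runs a copy of the input document through the configured stages in order.
    static Map<String, String> run(List<String> stageNames, Map<String, String> input) {
        Map<String, String> doc = new LinkedHashMap<>(input);
        for (UnaryOperator<Map<String, String>> stage : build(stageNames)) {
            doc = stage.apply(doc);
        }
        return doc;
    }
}
```

In this sketch a "new collection" is just a different list of stage names passed to `run`; no stage code changes.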

  7. Typical Installation Structure (diagram). Labels: Machine #1; Machine #2; Crawler; Aspire (other feeders and doc processing); Search Engine

  8. Aspire Architecture and Components Details

  9. Top-Level Component Architecture

  10. Aspire and OSGi Components (diagram): an Aspire Component is manufactured by an Aspire Component Factory; an Aspire Component Factory is an OSGi Bundle; an OSGi Bundle is a Java Jar File
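The "manufactured by" relationship on this slide is the ordinary factory pattern, which can be sketched in plain Java. The sketch leaves out OSGi entirely (no bundle, manifest, or activator), and every name in it (`SketchComponent`, `SketchComponentFactory`, `DebugOutputSketch`) is hypothetical rather than taken from Aspire.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a component; not an Aspire interface.
interface SketchComponent {
    String type();
}

// Hypothetical example component.
final class DebugOutputSketch implements SketchComponent {
    @Override
    public String type() {
        return "debugOutput";
    }
}

// Hypothetical factory: in a real deployment this role would be packaged in a
// bundle (a jar); here it is just a class that knows how to create components.
final class SketchComponentFactory {
    private final Map<String, Supplier<SketchComponent>> makers = new HashMap<>();

    // Registers how to build components of the given type.
    void register(String type, Supplier<SketchComponent> maker) {
        makers.put(type, maker);
    }

    // Manufactures a fresh component instance of the requested type.
    SketchComponent create(String type) {
        Supplier<SketchComponent> maker = makers.get(type);
        if (maker == null) {
            throw new IllegalArgumentException("No maker registered for: " + type);
        }
        return maker.get();
    }
}
```

Each call to `create` returns a new instance, so several pipelines could use separately configured copies of the same component type.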

  11. The Contents of a Bundle/Component Factory

  12. Component and Factory Details

  13. Aspire Sample Configurations

  14. Web Site Crawler / Search

  15. Processing CSV Files

  16. RSS Feeds, Single Pages

  17. Aspire Deployment

  18. Deployment
      • Architected to the latest deployment standards
      • Distribution Archetypes
      • Component Repositories
      • Redeploy collections independently
        • In a live running system
      • Redeploy and update components
        • In a live running system
      • Ready for the cloud

  19. Deployment Structure (diagram). Labels: Administrator; Aspire; load/reload configuration; Resources; Feeders & Pipelines; Collection Config (repeated for each collection); Configuration Control; reusable components; Component Repository

  20. Deployment Implications
      • Collections are configured independently
      • Collections use standard components
      • Can be dynamically and remotely deployed
      Diagram labels: Collection Config; load remote configurations; Remote System; Aspire (always running); remote admin control
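One way to picture independent, live redeployment of collections is a registry that swaps one collection's configuration without touching the others. The following is a minimal sketch under that assumption, not Aspire's mechanism: `CollectionRegistrySketch` and its methods are hypothetical names, and a "configuration" is reduced to an ordered list of component names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a live registry of per-collection configurations.
final class CollectionRegistrySketch {
    // Collection name -> immutable, ordered list of component names.
    private final Map<String, List<String>> configs = new ConcurrentHashMap<>();

    // Deploys or redeploys one collection; other collections are untouched.
    void deploy(String collection, List<String> componentNames) {
        configs.put(collection,
                Collections.unmodifiableList(new ArrayList<>(componentNames)));
    }

    // Removes one collection; returns false if it was not deployed.
    boolean undeploy(String collection) {
        return configs.remove(collection) != null;
    }

    // Returns the current component list, or an empty list if not deployed.
    List<String> componentsFor(String collection) {
        return configs.getOrDefault(collection, Collections.<String>emptyList());
    }
}
```

Because each configuration is stored as an immutable snapshot and replaced in a single `put`, a reader sees either the old or the new list for a collection, never a partially updated one.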
