
Digitization Workflow Management System for Massive Digitization Projects

Digitization Workflow Management System for Massive Digitization Projects. The 2nd International Conference on Universal Digital Library 2006 (ICUDL 2006). Mohamed Yakout, Noha Adly, Magdy Nagi. mohamed.yakout@bibalex.org, noha.adly@bibalex.org, magdy.nagi@bibalex.org.

Presentation Transcript


  1. Digitization Workflow Management System for Massive Digitization Projects • The 2nd International Conference on Universal Digital Library 2006 (ICUDL 2006) • Mohamed Yakout, Noha Adly, Magdy Nagi • mohamed.yakout@bibalex.org, noha.adly@bibalex.org, magdy.nagi@bibalex.org • Bibliotheca Alexandrina • November 19, 2006

  2. Goals • Automate, track, and manage the digitization workflow. • Allow flexible definition of digitization workflow Phases. • Support dynamic evolution and deviations, with history tracking. • Integrate flexibly with the LIS and the Library Digital Repository. • Accept external, partially digitized Jobs and start them at the proper Phase within the digitization workflow. • Manage multiple projects simultaneously, covering diverse materials (books, journals, manuscripts, audio, video, slides, etc.).

  3. Related Work • Manual workflow management using several software packages (MS Excel, MS SharePoint, MS Project) • Simple workflow tracking systems with limited capabilities • Software that integrates several digitization activities (digital capturing, image processing, OCR, …) in one package: • DOCWorks from CCS • BookRestorer from i2s • OUPS • Limitations: • Tightly coupled to particular tools; other tools cannot easily be integrated. • No resource management (e.g. workstations and users). • No project or collection management. • Manual file handling between the storage server and clients. • Workflow exceptions, dynamic evolution, and deviations are handled only through manual intervention.

  4. System Data Model

  5. System Data Model The object being digitized • A book by Naguib Mahfouz • Photos of an event • A map of Alexandria • Sheet music by Omar Khayrat

  6. System Data Model All types of materials in the system • Book • Manuscripts • Map • Journals • Audio • Video

  7. System Data Model A task that should be applied within the digitization process • Scanning • Processing • OCRing • Encoding • Publishing • Zipping for archiving

  8. System Data Model The system users with several roles • Digital lab operators • Shift operators • Administrator

  9. System Data Model A logical grouping of Jobs • Nasser • AlexMed • AMEEL

  10. System Data Model The computer used to perform the Phase
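A minimal sketch of how the six entities described on slides 5 to 10 might relate to one another. The slides show them only as diagram boxes, so the class names (`Job`, `JobType`, `Phase` names as strings, `User`, `Project`, `Workstation`), their fields, and the sample Phase order are illustrative assumptions rather than the system's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobType:
    name: str                                   # e.g. "Book", "Map", "Audio"
    # Assumed: each Job Type carries a default order of Phase names.
    phase_sequence: List[str] = field(default_factory=list)


@dataclass
class Project:
    name: str                                   # logical grouping of Jobs, e.g. "AMEEL"


@dataclass
class Workstation:
    hostname: str                               # computer on which a Phase is performed


@dataclass
class User:
    username: str
    role: str                                   # e.g. "operator", "shift operator", "administrator"


@dataclass
class Job:
    title: str                                  # the object being digitized
    job_type: JobType
    project: Project
    current_phase: Optional[str] = None         # None until the Job is assigned to a Phase


# Hypothetical sample data built from names that appear on the slides.
map_type = JobType("Map", ["Scanning", "Processing", "Publishing"])
job = Job("Map of Alexandria", map_type, Project("AlexMed"))
```

Keeping the default Phase order on the Job Type rather than on the Job mirrors the later statement that a Phase Sequence is defined per Job Type.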

  11. System Architecture

  12. System Architecture

  13. System Architecture

  14. System Handlers
      <Phase Name="Book Arabic OCR">
        <PrePhase>
          <Physical Mode="UnRestricted">
            <Folder Name="OTIFF" Create="false" ToDestination="false" NewName="OTIFF" Mode="Restircted">
              <File Name="OriginalFiles" Type="tif" Count="+" ToDestination="false" Compare=""/>
            </Folder>
            . .
          </Physical>
        </PrePhase>
        <PostPhase>
          <Physical Mode="UnRestricted">
            <Folder Name="TXT" Create="false" ToDestination="true" NewName="TXT" Mode="Restircted">
              <File Name="" Type="frf" Count="1" ToDestination="true" Compare=""/>
              <File Name="" Type="art" Count="1" ToDestination="true" Compare=""/>
            </Folder>
          </Physical>
          <Database>
            <Field Name="Font" DisplayName="Font Family: " />
            <Field Name="LrnPage" DisplayName="Learn Page : "/>
            . .
          </Database>
          <ReflectionCall Method="packageName.doSomething" />
        </PostPhase>
      </Phase>
      • XML Phases Definition Handler • Pre-Phase and Post-Phase • Physical section • Database section • Reflection Call
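To make the Physical section more concrete, here is a hedged Python sketch that parses a trimmed copy of the Phase definition and checks a folder listing against its File rules. The slides do not define the attribute semantics, so the reading used here is an assumption: `Count="+"` is taken to mean one or more files of that `Type`, and a number to mean exactly that many. The functions `count_ok` and `check_section` are hypothetical names, and the elided `. .` parts of the original are simply left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example (attribute values kept verbatim).
PHASE_XML = """
<Phase Name="Book Arabic OCR">
  <PrePhase>
    <Physical Mode="UnRestricted">
      <Folder Name="OTIFF" Create="false" ToDestination="false" NewName="OTIFF" Mode="Restircted">
        <File Name="OriginalFiles" Type="tif" Count="+" ToDestination="false" Compare=""/>
      </Folder>
    </Physical>
  </PrePhase>
  <PostPhase>
    <Physical Mode="UnRestricted">
      <Folder Name="TXT" Create="false" ToDestination="true" NewName="TXT" Mode="Restircted">
        <File Name="" Type="frf" Count="1" ToDestination="true" Compare=""/>
        <File Name="" Type="art" Count="1" ToDestination="true" Compare=""/>
      </Folder>
    </Physical>
  </PostPhase>
</Phase>
"""


def count_ok(rule: str, n: int) -> bool:
    """Assumed semantics: "+" means one or more, a number means exactly that many."""
    if rule == "+":
        return n >= 1
    return n == int(rule)


def check_section(phase: ET.Element, section: str, listing: dict) -> list:
    """Return a list of problems found for "PrePhase" or "PostPhase".

    listing maps a folder name to the list of file names found in it.
    """
    problems = []
    for folder in phase.findall(f"./{section}/Physical/Folder"):
        name = folder.get("Name")
        files = listing.get(name)
        if files is None:
            problems.append(f"missing folder {name}")
            continue
        for rule in folder.findall("File"):
            ext = rule.get("Type")
            found = sum(1 for f in files if f.lower().endswith("." + ext))
            if not count_ok(rule.get("Count"), found):
                problems.append(
                    f"{name}: expected {rule.get('Count')} .{ext} file(s), found {found}"
                )
    return problems


phase = ET.fromstring(PHASE_XML)
pre = check_section(phase, "PrePhase", {"OTIFF": ["0001.tif", "0002.tif"]})
post = check_section(phase, "PostPhase", {"TXT": ["book.frf"]})
```

In this run the Pre-Phase check passes, while the Post-Phase check reports the missing `.art` file, which is the kind of file-level information an operator would need before submitting a Job.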

  18. System Architecture

  19. System Architecture

  20. System Architecture

  21. System Architecture

  22. System Modules • Check-In: • Plug-in based for integration • Creates the Job in the system • Assigns the Job to any Phase • Check-Out: • Uses the Java Reflection Call section of the XML Phases Definition • Ingests the Job's digital objects into the repository
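The Reflection Call section names its target as a dotted string (`packageName.doSomething` in the example definition). The system itself relies on Java reflection; the sketch below shows the same idea in Python, purely as an analogy. The function `resolve_method` is a hypothetical name, and `os.path.basename` is only a stand-in for a real check-out plug-in.

```python
import importlib
from typing import Callable


def resolve_method(dotted: str) -> Callable:
    """Split "package.module.function" into module and attribute, then load it."""
    module_name, _, attr = dotted.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Stand-in for a configured check-out handler; the path is made up for illustration.
handler = resolve_method("os.path.basename")
result = handler("/jobs/0042/TXT/book.frf")
```

Because the target is looked up by name at run time, a new repository plug-in can be wired in by editing the Phase definition rather than the workflow code.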

  23. System Architecture

  24. System Modules • Phases Manager • Request a new Job • Download the Job's folders and files • Submit the Job back to the system to continue with other Phases • Reject a Job and recommend another Phase, specifying the reasons • Redirect a Job away from the default Phase Sequence • Provide file-level information to help solve problems

  25. System Modules (Contd) • Reporting • Workflow Tracking • Pending Items • Late Jobs • Operator rates • Build Customized Report • Archiving • On different media of different sizes, and on online storage • Administration

  26. BA Digitization Workflow

  27. Quality Assurance • Supported at two different stages • QA information is maintained at the file level as a Job moves from one Phase to another. • A QA Phase is defined in the Digitization Phase Sequence as the last Phase before Archiving.

  28. Achieving Flexibility Using DWMS • The defined Phase Sequence for a Job Type is a guide rather than a prescription. • A Phase may or may not be part of the Phase Sequence; the operator can assign the Job to any of the available Phases. • Jobs can be forwarded dynamically to another Phase in the Phase Sequence. • Changes in the Phase Sequence affect both current and new Jobs in the system, leading to natural process evolution.
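The idea of a Phase Sequence that guides but does not constrain can be sketched as a Job that normally advances along its default sequence yet can be redirected to any Phase, with every move recorded. This is an illustrative Python model only: the class, its method names, and the sample sequence are assumptions, not the system's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Job:
    name: str
    sequence: List[str]                         # default Phase Sequence for the Job Type
    phase: Optional[str] = None                 # current Phase; None once finished
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def _move(self, action: str, target: Optional[str], note: str = "") -> None:
        # Record every transition so deviations stay traceable.
        self.history.append((action, f"{self.phase} -> {target}", note))
        self.phase = target

    def submit(self) -> None:
        """Follow the default sequence to the next Phase."""
        if self.phase not in self.sequence:
            raise ValueError("current Phase is outside the sequence; use redirect()")
        i = self.sequence.index(self.phase)
        nxt = self.sequence[i + 1] if i + 1 < len(self.sequence) else None
        self._move("submit", nxt)

    def redirect(self, target: str, reason: str) -> None:
        """Deviate from the default sequence (reject or forward) and record why."""
        self._move("redirect", target, reason)


# Hypothetical sequence; only "QA before Archiving" is stated in the slides.
job = Job("Book by Naguib Mahfouz",
          ["Scanning", "Processing", "OCRing", "QA", "Archiving"],
          phase="Scanning")
job.submit()                                    # Scanning -> Processing
job.submit()                                    # Processing -> OCRing
job.redirect("Scanning", "pages 12-14 blurred") # rejected back to an earlier Phase
job.submit()                                    # resumes the default sequence
```

Since `submit` consults the sequence at call time, editing the sequence would change the path of Jobs already in progress as well as new ones, which matches the evolution behaviour described above.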

  29. Job Life Cycle

  30. Future Work • A check-out plug-in for Fedora. • Check-in plug-ins to support various metadata standard formats (MODS, DC, VAR, etc.). • Graphical tools in the software interface to help design and follow the digitization process.

  31. Thank You mohamed.yakout@bibalex.org
