
AGORA 2007 : Adaptative software for layout analysis of document images

AGORA 2007 : Adaptative software for layout analysis of document images. Contact : JY Ramel ( ramel@univ-tours.fr ) N. Journet ( journet@univ-tours.fr ). Context of this work. With the CESR during BVH project During French national research projects



Presentation Transcript


  1. AGORA2007 : Adaptative software for layout analysis of document images Contact : JY Ramel (ramel@univ-tours.fr) N. Journet (journet@univ-tours.fr)

  2. Context of this work • With the CESR during the BVH project • During French national research projects • Construction of automatic indexing tools • Adapted to degraded documents • For segmentation (text/graphic separation) • For automatic information retrieval • For text transcription → Automatic meta-data extraction and indexing

  3. Meta-data production • Manually: author, year, editor…; semantic description of the graphical parts → Iconclass; specific meta-data for experts; keywords • Automatically with AGORA: positions of EoC (dropcaps, portraits, paragraphs, …); transcription of the text parts; specific information about EoC • EoC = Element of Content [Figure labels: CESR website; AGORA software]

  4. First step: EoC separation • 3 types of EoC: Text, Image, Noise [Figure: page image with regions labeled Text, Image, and Noise]

  5. Second step: Recognition of EoC [Figure: page image with regions labeled Title, Margin, Noise, Caption, Labels, Text, Dropcap]

  6. AGORA: Interactive extraction of EoC • The user gives information about the size of characters, about the size of graphics, and about the spaces between letters, between words, … • AGORA proposes a text/graphics segmentation
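As a rough illustration of how user-supplied sizes could drive a first text/graphics proposal, here is a minimal sketch. It is not AGORA's actual algorithm, which the slides do not describe. The `Box` type, the `propose_label` function, the thresholds, and the "undecided" fallback are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of one region of the page, in pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def propose_label(box, max_char_size, min_graphic_size, min_size=3):
    """Propose a label from the user-given character and graphic sizes.

    Assumption: the largest side of the box is compared against the thresholds.
    """
    largest = max(box.width, box.height)
    if largest < min_size:
        return "noise"      # too small to be a character
    if largest <= max_char_size:
        return "text"       # fits the character size given by the user
    if largest >= min_graphic_size:
        return "image"      # at least as large as the graphic size given by the user
    return "undecided"      # between the two thresholds: left for the user to check


boxes = [Box(10, 10, 2, 2), Box(40, 12, 18, 24), Box(100, 200, 400, 350), Box(5, 5, 60, 50)]
labels = [propose_label(b, max_char_size=40, min_graphic_size=150) for b in boxes]
```

Keeping an explicit "undecided" outcome reflects the interactive spirit of the slide: the software proposes, and the user adjusts the sizes until the proposal looks right.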

  7. AGORA: Interactive recognition of EoC • How to associate a label with each EoC? • It is impossible to foresee all user needs • The user has to teach AGORA how to recognize the desired EoC • AGORA philosophy: the user shows examples of EoC → interactive construction of recognition scenarios • A scenario is a set of extraction rules • Manual modification of a scenario is still possible

  8. Global methodology • Automatic insertion of the rules in the scenario • Preview • Example selection: as many examples as necessary (here, 4 examples selected in 3 images) • Obtained results • Manual modification of the rules

  9. Automatic creation of recognition rules • Features shown: distance from the top, distance from the left • Values shown: avg = 0.46, std = 0.41; avg = 0.51, std = 0.07
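The slide reports an average and a standard deviation for positional features of the selected examples. How AGORA turns these statistics into a rule is not stated. The sketch below assumes one plausible reading: each feature yields an interval avg ± k·std, and a block matches the rule when all of its features fall inside their intervals. The names `learn_rule` and `matches`, the factor `k`, and the example values are hypothetical.

```python
from statistics import mean, pstdev


def learn_rule(examples, k=2.0):
    """Build one interval per feature from the user-selected examples.

    examples: list of dicts mapping a feature name to a value normalized to [0, 1].
    Assumption: the accepted interval is avg - k*std .. avg + k*std.
    """
    rule = {}
    for name in examples[0]:
        values = [example[name] for example in examples]
        avg, std = mean(values), pstdev(values)
        rule[name] = (avg - k * std, avg + k * std)
    return rule


def matches(rule, block):
    """Return True when every feature of the block lies inside its interval."""
    return all(low <= block[name] <= high for name, (low, high) in rule.items())


# Four hypothetical examples, echoing the "4 examples selected" of the previous slide.
examples = [
    {"dist_top": 0.10, "dist_left": 0.50},
    {"dist_top": 0.12, "dist_left": 0.52},
    {"dist_top": 0.08, "dist_left": 0.48},
    {"dist_top": 0.10, "dist_left": 0.50},
]
title_rule = learn_rule(examples)
near_top = matches(title_rule, {"dist_top": 0.10, "dist_left": 0.50})
near_bottom = matches(title_rule, {"dist_top": 0.90, "dist_left": 0.50})
```

Under this reading, a feature with a large standard deviation (such as std = 0.41) gives a wide interval that barely constrains the rule, while a small one (std = 0.07) gives a tight, discriminating interval.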

  10. Manual modification of the rules [Figure: screenshot marked ERROR]

  11. AGORA outputs • Set of EoC (here EoC = TEXT) • 1 XML file per EoC • Information about the position and the original image corresponding to the block • Information about the lines in a text block • Information about the words in a line • Information about the characters in the words
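To make the block → lines → words → characters hierarchy concrete, here is a small sketch that parses one such file with Python's standard `xml.etree.ElementTree`. The real AGORA schema is not given in the slides, so every element name, attribute name, and value in `SAMPLE` is a hypothetical stand-in.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for one text EoC; the actual AGORA schema may differ.
SAMPLE = """<eoc label="TEXT" image="page_012.png" x="120" y="340" width="800" height="95">
  <line x="120" y="340" width="800" height="45">
    <word x="120" y="340" width="150" height="45">
      <char x="120" y="340" width="30" height="45"/>
      <char x="152" y="340" width="28" height="45"/>
    </word>
    <word x="290" y="340" width="90" height="45">
      <char x="290" y="340" width="30" height="45"/>
    </word>
  </line>
  <line x="120" y="390" width="640" height="45"/>
</eoc>"""


def summarize(xml_text):
    """Count the lines, words, and characters nested in one EoC file."""
    root = ET.fromstring(xml_text)
    return {
        "label": root.get("label"),
        "image": root.get("image"),
        "lines": len(root.findall("line")),
        "words": len(root.findall("line/word")),
        "chars": len(root.findall("line/word/char")),
    }


summary = summarize(SAMPLE)
```

With one file per EoC, an indexing tool can process blocks independently and still trace each one back to its source image and position.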

  12. AGORA outputs • Extracted label • Extracted images

  13. Strengths of AGORA2007 • An assistant guides the user through the different steps • Complete works are processed automatically (using a scenario) • Constant visualisation of processing results • Easy to use (no specific knowledge is necessary)

  14. AGORA in action with Nicholas
