
A TaLISMAN: Automatic Text and LIne Segmentation of historical MANuscripts

Ruggero Pintus (1), Ying Yang (2), Enrico Gobbetti (1) and Holly Rushmeier (2). Affiliations: (1) CRS4, (2) Yale University. Contact: ruggero.pintus@crs4.it, ying.yang.yy368@yale.edu, enrico.gobbetti@crs4.it, holly.rushmeier@yale.edu


Presentation Transcript


  1. A TaLISMAN: Automatic Text and LIne Segmentation of historical MANuscripts. Ruggero Pintus (1), Ying Yang (2), Enrico Gobbetti (1) and Holly Rushmeier (2). (1) CRS4, (2) Yale University. ruggero.pintus@crs4.it, ying.yang.yy368@yale.edu, enrico.gobbetti@crs4.it, holly.rushmeier@yale.edu

  2. Given a book, we extract the per-page text leading and features. We select the most salient pages and image descriptors, and we compute a rough text segmentation that we use to train an SVM classifier. We then re-run the prediction on all the original features to obtain a fine segmentation. We convert these sparse text positions into a dense text region representation, and finally we extract text blocks and lines.

     Evaluated on a heterogeneous corpus: ~3K pages, ~4K blocks, ~66K lines.

     Robust to:
     - Different writing styles
     - High layout variability
     - One, two or more columns, marginalia, calendars
     - Presence of capital letters, portraits, ornamental bands, graphical contents
     - Aging: holes, spots, ink bleed-through, fading, missing parts, damage

     [Figure panels: Original, Text regions, Text blocks, Text lines]
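The last step of the pipeline above, going from a dense text region to individual text lines, can be illustrated with a minimal sketch. It uses a plain horizontal projection profile, which is a generic stand-in and not necessarily the authors' method. The function name `split_lines`, the threshold `min_fill`, and the toy mask `page_mask` are all assumptions made for this illustration.

```python
from typing import List, Tuple


def split_lines(mask: List[List[int]], min_fill: float = 0.2) -> List[Tuple[int, int]]:
    """Group consecutive text-dense rows of a binary mask into lines.

    mask     -- dense text-region mask, 1 = text pixel, 0 = background (assumed input)
    min_fill -- hypothetical threshold: minimum fraction of text pixels in a row
    Returns a list of (first_row, last_row) index pairs, one per detected line.
    """
    lines: List[Tuple[int, int]] = []
    start = None  # first row of the line currently being tracked, if any
    for y, row in enumerate(mask):
        # Horizontal projection: fraction of text pixels in this row.
        fill = sum(row) / len(row) if row else 0.0
        if fill >= min_fill:
            if start is None:
                start = y  # a new line begins here
        elif start is not None:
            lines.append((start, y - 1))  # the line ended on the previous row
            start = None
    if start is not None:
        lines.append((start, len(mask) - 1))  # line runs to the bottom edge
    return lines


# Toy 7x6 mask with two text lines separated by one empty row.
page_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
]

print(split_lines(page_mask))  # [(1, 2), (4, 6)]
```

A projection profile like this only works when lines are roughly horizontal and the page has a single column. Handling multiple columns and marginalia, as the slide claims, would require splitting the page into text blocks first.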
