CADAL Digital Library

Presentation Transcript

  1. The 2nd International Conference on Universal Digital Library (ICUDL 2006) • CADAL Digital Library • Wu Jiang-Qin, Zhuang Yue-Ting, Pan Yun-he • College of Computer Science, Zhejiang University, China • November 18, 2006

  2. Outline • 1. Introduction • 2. Unified Parallel Search • 3. Multimedia Analysis and Retrieval • 4. Bilingual Services • 5. Chinese Calligraphy Character Retrieval • 6. Conclusion and Future Work

  3. Outline • 1. Introduction • 2. Unified Parallel Search • 3. Multimedia Analysis and Retrieval • 4. Bilingual Services • 5. Chinese Calligraphy Character Retrieval • 6. Conclusion and Future Work

  4. CADAL • The China-US Million Book Digital Library (CADAL) is an international cooperation program between China and the US. • The objective of the CADAL project is to create a free-to-read, searchable collection of one million books, available to everyone over the Internet. • CADAL is an important part of the Universal Digital Library (UDL), which aims at universal access to human knowledge.

  5. The challenges and services (1) • The digital resources for research and education, including digital books and multimedia, can reach 100 terabytes. (The number of digital books was 1,023,425 by October 2006, including ancient Chinese books, Chinese Minguo books, modern Chinese books, Chinese degree dissertations, English books, images, video, etc.) • Active services of unified parallel search across the different types of digital resources

  6. The challenges and services (2) • Various types of media resources, such as images, video, and 3-D models, are included in the CADAL resources. • Services for quick retrieval and structured browsing of multimedia documents, including images and video

  7. The challenges and services (3) • The CADAL resources contain digital books in two languages, Chinese and English. • Bilingual translation services

  8. The challenges and services (4) • Traditional Chinese culture resources are an important part of the CADAL resources. • Services related to traditional Chinese culture resources

  9. Outline • 1. Introduction • 2. Unified Parallel Search • 3. Multimedia Analysis and Retrieval • 4. Bilingual Services • 5. Chinese Calligraphy Character Retrieval • 6. Conclusion and Future Work

  10. Background • Terabytes of digital resources of various types, such as dissertations, ancient books, Minguo books, modern books, Minguo journals, English books, drawings, videos, and illustrations, are available in CADAL. This variety is one of CADAL's distinctive characteristics, and it makes metadata-based resource search a challenge.

  11. Metadata • Dublin Core metadata is used to describe the million digital books in the CADAL project. The other types of multimedia resources are described by their own corresponding metadata. An independent data map is designed for each kind of resource metadata.
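
The idea of one data map per resource type can be illustrated with a minimal Python sketch. The slides do not show CADAL's actual maps, so the local field names (`book_title`, `clip_name`, and so on) and the `to_dublin_core` helper are hypothetical; only the Dublin Core element names (`title`, `creator`, `date`, `language`, `type`) come from the standard.

```python
# Hypothetical per-resource "data maps": local field name -> Dublin Core element.
# The local field names are invented for illustration only.
DATA_MAPS = {
    "book": {"book_title": "title", "author": "creator", "publish_year": "date"},
    "video": {"clip_name": "title", "producer": "creator", "lang": "language"},
}


def to_dublin_core(resource_type, record):
    """Translate a resource-specific record into Dublin Core elements."""
    data_map = DATA_MAPS[resource_type]
    dc = {"type": resource_type}
    for local_field, value in record.items():
        element = data_map.get(local_field)
        if element is not None:  # fields without a mapping are dropped
            dc[element] = value
    return dc


if __name__ == "__main__":
    print(to_dublin_core("book", {"book_title": "Example", "author": "Anon", "pages": 10}))
```

Keeping each map as plain data means a new resource type can be supported by adding one dictionary, without touching the search code that consumes the common elements.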

  12. Unified parallel searching • To meet the requirements of different users and improve the interactive experience, a unified search service across the different types of digital resources is provided for convenient searching.
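
A unified parallel search might be sketched as follows: one query is sent to every repository concurrently and the hits are merged into a single list. This is only an illustration under stated assumptions. The in-memory `REPOSITORIES`, the title-substring matching, and the function names are hypothetical stand-ins, not CADAL's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory repositories, one per resource type.
REPOSITORIES = {
    "book": [{"title": "Bonsai Art"}, {"title": "Garden History"}],
    "image": [{"title": "Bonsai on a table"}],
    "video": [{"title": "Calligraphy lesson"}],
}


def search_one(resource_type, keyword):
    """Case-insensitive title match within a single repository."""
    hits = [r for r in REPOSITORIES[resource_type]
            if keyword.lower() in r["title"].lower()]
    return [{"type": resource_type, **r} for r in hits]


def unified_search(keyword):
    """Query every repository concurrently and merge the results into one list."""
    types = sorted(REPOSITORIES)
    with ThreadPoolExecutor(max_workers=len(types)) as pool:
        # pool.map yields results in the order of `types`, so output is stable.
        per_type = pool.map(lambda t: search_one(t, keyword), types)
        return [hit for hits in per_type for hit in hits]


if __name__ == "__main__":
    for hit in unified_search("bonsai"):
        print(hit)
```

Threads suit this pattern when each repository lookup is I/O-bound (a database or network call), because the slowest repository, not the sum of all of them, then bounds the response time.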

  13. Outline • 1. Introduction • 2. Unified Parallel Search • 3. Multimedia Analysis and Retrieval • 4. Bilingual Services • 5. Chinese Calligraphy Character Retrieval • 6. Conclusion and Future Work

  14. Background • Besides digital books, the digital library contains unstructured multimedia resources such as images, video, and audio, so effective and efficient analysis and retrieval of multimedia resources is a challenging problem in the CADAL digital library. • Here we examine the analysis and retrieval issues related to two primary kinds of multimedia: image and video.

  15. Contents • Content-based image retrieval • Image retrieval by peer indexing • Image annotation • Image search engine • Video analysis system • Video browser (structure and summary) • Metadata-based video retrieval

  16. Content-based image retrieval • Extracting visual features • Color features: color histogram, color moment, color coherence vector, color correlogram • Texture: Tamura textural features and co-occurrence textural features • Relevance feedback • Makes image retrieval results match the user's requirements
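
As a rough illustration of two of the ingredients listed on this slide, the Python sketch below computes a color histogram, compares two histograms, and refines a query from positive and negative examples. The slide does not say which similarity measure or feedback rule the system uses; histogram intersection and a Rocchio-style update are swapped in here as common textbook choices, and the parameter values are assumptions.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram; pixels are (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel to a bin in 0..bins-1, then flatten the 3-D index.
        rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[rb * bins * bins + gb * bins + bb] += 1.0
    return [count / len(pixels) for count in hist]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def refine_query(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: move the query toward positive examples
    and away from negative ones (weights are assumed defaults)."""
    def mean(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos, neg = mean(positives), mean(negatives)
    return [max(0.0, alpha * q + beta * p - gamma * n)
            for q, p, n in zip(query, pos, neg)]


if __name__ == "__main__":
    red = color_histogram([(255, 0, 0)] * 4)
    blue = color_histogram([(0, 0, 255)] * 4)
    print(intersection(red, red), intersection(red, blue))
```

After each feedback round the refined vector replaces the query, so images resembling the user's positive examples rise in the ranking.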

  17. Content-based image retrieval (figure labels: query example, image searching, positive example, negative example, relevance feedback)

  18. Image retrieval by peer index • Peer Index is a new image indexing scheme that describes images through semantically relevant peer images. • In particular, each image is associated with a two-level peer index, including • global peer index: describing the “data characteristics” of this image • personal peer indexes: describing the “user characteristics” of an individual user with respect to this specific image • Both types of peer index are learned interactively and incrementally from user feedback information.
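
The two-level structure can be pictured with a toy Python sketch. It is not the published Peer Index algorithm: the symmetric unit-weight update and the `personal_weight` boost are assumptions chosen to show how a global index and per-user indexes could both grow incrementally from the same feedback events.

```python
from collections import defaultdict


class PeerIndex:
    """Toy two-level peer index: image -> {peer image: weight}."""

    def __init__(self):
        self.global_index = defaultdict(lambda: defaultdict(float))
        # user -> image -> {peer image: weight}
        self.personal = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))

    def feedback(self, user, query_image, relevant_images):
        """Record that `user` judged `relevant_images` relevant to `query_image`."""
        for peer in relevant_images:
            if peer == query_image:
                continue
            # Assumption: the peer relation is symmetric and each event adds 1.0.
            for a, b in ((query_image, peer), (peer, query_image)):
                self.global_index[a][b] += 1.0
                self.personal[user][a][b] += 1.0

    def peers(self, image, user=None, personal_weight=2.0):
        """Rank peers by global weight, boosted by this user's own feedback."""
        scores = dict(self.global_index.get(image, {}))
        if user is not None:
            own = self.personal.get(user, {}).get(image, {})
            for peer, weight in own.items():
                scores[peer] = scores.get(peer, 0.0) + personal_weight * weight
        return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    index = PeerIndex()
    index.feedback("alice", "tiger1", ["tiger2"])
    index.feedback("bob", "tiger1", ["cat1"])
    index.feedback("bob", "tiger1", ["cat1"])
    print(index.peers("tiger1"), index.peers("tiger1", user="alice"))
```

In this sketch the same image can rank differently for different users, which is the point of keeping personal indexes alongside the global one.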

  19. Peer Index-based image retrieval (figure labels: semantic query, semantic relevance feedback)

  20. Image annotation • Automatic semantic annotation of images by machine learning and statistical modeling • Classify the training images, and create a semantic skeleton for each class of training images. • Automatically classify a new image with a Support Vector Machine, and describe it using the semantic skeleton • Select the keywords for the image by statistical methods

  21. Image annotation (figure labels: images, segment, image blobs, classify, visually similar, statistical learning, semantic skeleton, annotate; example annotation: tiger)

  22. Text-based image retrieval • Query text: bonsai

  23. Image search engine • We implemented an image search engine, Octopus, which provides Peer Index and relevance feedback to bridge the gap between semantics and low-level features. It follows a simple, intuitive idea: the semantic concept is hidden in each image and becomes apparent in the relations between that image and other images.

  24. Integrating into CADAL DL (figure labels: WWW, scanner, books, other images, Image Manager, store, CADAL image repository, images, metadata, feature, image retrieval system, CADAL portal, user, browse, retrieve, relevance feedback)

  25. The image retrieval interface

  26. Our target for video • Analyze multimodal information, such as visual, audio, motion, and caption information, to generate structural information and a video summary • Support efficient video browsing and video retrieval based on metadata and structural information

  27. Main idea • Nonlinear browsing: generate structural indexing, such as key frames, shots, and shot groups, from the original video stream • Content compression: analyze the time sequence in the video stream, eliminate redundant data, and generate the summary and the highlight scenes for the original video.

  28. System • Video Fusion Analysis System (VideoFAS) • VideoBrowser (figure labels: CADAL portal, user, video fusion analysis system, video browser, video data, CADAL video repository, metadata database, feature database)

  29. VideoFAS-system interface (figure labels: original video, video shot, similar video shots are clustered together)

  30. VideoFAS-system functions • Basic operation • Importing and Saving • Appending • Separating the video stream into video and audio data • transcoding and compressing

  31. VideoFAS-system functions • Feature extraction • Visual features • Color: color histogram, color moment, color coherence vector, color correlogram • Texture: Tamura textural features and co-occurrence textural features • Shape: contour feature • Audio features • Temporal feature: zero-crossing rate • Frequency features: Mel coefficients, tone, and sub-band statistical features • Target features • Integrate the OpenCV face detection module into the system • Extract the face features
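
Of the audio features listed, the zero-crossing rate is simple enough to show directly. The following Python sketch uses one common definition (the fraction of adjacent sample pairs whose signs differ); the slides do not specify the exact variant or frame length used in VideoFAS.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative, which is one of several conventions.
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(samples) - 1)


if __name__ == "__main__":
    print(zero_crossing_rate([1, -1, 1, -1]))  # alternating signal
    print(zero_crossing_rate([1, 2, 3]))       # no sign change
```

In practice the rate is computed per short frame of audio; noisy or unvoiced segments tend to have a higher rate than voiced speech or tonal music.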

  32. VideoFAS-system functions • Video structuring • shot detection • Cut shot detection • Transition shot detection • key frame extraction • Similar shot grouping • group the shots based on Support Vector Machine
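
Cut shot detection is often done by comparing the histograms of consecutive frames. The Python sketch below shows that baseline under simplifying assumptions: frames are flat lists of gray levels, the distance is L1, and the threshold is arbitrary. It is not the VideoFAS detector, and it does not handle gradual transitions, which need a different test.

```python
def gray_histogram(frame, bins=8):
    """Normalized histogram of a frame given as a flat list of 0..255 gray levels."""
    hist = [0.0] * bins
    for value in frame:
        hist[value * bins // 256] += 1.0
    return [count / len(frame) for count in hist]


def detect_cuts(frames, threshold=0.5):
    """Return the indices i at which a new shot starts at frames[i]."""
    hists = [gray_histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        # L1 distance between two normalized histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if distance > threshold:
            cuts.append(i)
    return cuts


if __name__ == "__main__":
    dark, bright = [10] * 16, [240] * 16
    print(detect_cuts([dark, dark, bright, bright, dark]))
```

Once the cut positions are known, a key frame can be chosen from each shot (for example its middle frame), and the shots can then be grouped by similarity.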

  33. Video shot clustering (figure labels: original video shot sequence; video shot clusters A, B, C, D, E)

  34. VideoFAS-system functions • Video summarization • Summarize by mining non-trivial repeating patterns • Extract frequent and non-trivial shot sequences to generate the video summary (figure: original video shot sequence A B C D A C D E A B C D A B C D)
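
The shot-label sequence on this slide (A B C D A C D E A B C D A B C D) can be used to illustrate repeating-pattern mining. The Python sketch below is a brute-force illustration, not the published algorithm: it counts every contiguous run of shot labels and keeps those that repeat. "Non-trivial" is interpreted here, as an assumption, to mean that no one-shot-longer pattern contains the run and repeats exactly as often.

```python
from collections import Counter


def repeating_patterns(shots, min_support=2, min_length=2):
    """Return (pattern, count) pairs for frequent, non-redundant shot runs."""
    counts = Counter()
    n = len(shots)
    for length in range(min_length, n + 1):
        for start in range(n - length + 1):
            counts[tuple(shots[start:start + length])] += 1
    frequent = {p: c for p, c in counts.items() if c >= min_support}

    def is_redundant(pattern):
        # Assumed notion of "trivial": a one-shot-longer frequent pattern
        # contains this one and occurs exactly as often.
        return any(
            len(q) == len(pattern) + 1
            and frequent[q] == frequent[pattern]
            and pattern in (q[:-1], q[1:])
            for q in frequent
        )

    kept = [p for p in frequent if not is_redundant(p)]
    kept.sort(key=lambda p: (-frequent[p], -len(p), p))
    return [(p, frequent[p]) for p in kept]


if __name__ == "__main__":
    sequence = list("ABCDACDEABCDABCD")
    for pattern, count in repeating_patterns(sequence):
        print("".join(pattern), count)
```

On the slide's sequence this keeps three patterns: CD (4 occurrences), ABCD (3), and ABCDA (2). A summary could then be assembled from one representative occurrence of each kept pattern.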

  35. VideoFAS-system functions • Metadata annotation • Annotate video clips with metadata conforming to the Dublin Core standard • Save the metadata and the video structural information in a database

  36. VideoBrowser-framework (figure labels: original video data, video repository, video catalog, video summary, content service, WWW, user)

  37. VideoBrowser-system interface

  38. VideoBrowser-system interface (figure labels: metadata, media player, video structural information)

  39. System architecture (figure labels: Web, movies, Internet, web server, retrieval service, video data, firewall, switch, online storage, disk array, archive server, annotation, structuring, summarization, tape (offline storage))

  40. Outline • 1. Introduction • 2. Unified Parallel Search • 3. Multimedia Analysis and Retrieval • 4. Bilingual Services • 5. Chinese Calligraphy Character Retrieval • 6. Conclusion and Future Work

  41. Background • As there are both English and Chinese books in CADAL, bilingual services are required so that users can access resources in either language.

  42. Services • The North Technical Center has developed technologies and prototypes for multi-layered bilingual machine translation of English and Chinese books, such as • metadata translation between English and Chinese • accurate translation of proper nouns, such as names of unique individuals, events, or places • selective translation in a full-text context • translation of Old Chinese text • a distributed translation service technique.

  43. Services • An online translation service is integrated into the CADAL digital library. • Users can directly conduct semantic-based multilingual retrieval of required information in the CADAL digital library. • Online translation of the contents of a page. • Translation of the metadata of a digital book.

  44. Bilingual Search

  45. The translation of contents of a page

  46. Outline • 1. Introduction • 2. Unified Parallel Search • 3. Multimedia Analysis and Retrieval • 4. Bilingual Services • 5. Chinese Calligraphy Character Retrieval • 6. Conclusion and Future Work

  47. Background • Since most people are interested in the artistic style of calligraphy characters rather than their meaning, the CADAL digital library provides a Chinese calligraphy character retrieval service that treats the characters as images, without recognizing them as OCR does.