
Semantics for Content Indexing in Digital Libraries

This study discusses the use of semantics in organizing and accessing content in digital libraries, focusing on educational applications and the modeling of the digital library organization. The study also explores personalized frameworks and the decomposition of digital objects.

Presentation Transcript


  1. On the Use of Semantics for Content Indexing in Digital Libraries. Amarnath Gupta, University of California San Diego

  2. The Context • The NSF-sponsored NSDL project funds a number of institutions to build digital libraries (DLs) and DL-related technologies for scientific content • DLESE is an NSDL effort specifically for creating a DL for education in science and engineering • SDSC’s task • To develop and deliver technology that makes scientific digital libraries more organized and more accessible to users in education

  3. An Application • Consider an educational application • Material: digitally born objects • Users • A teacher • A curriculum developer • An author of an electronic textbook • A contributor of digital information to the library • Course and textbook reviewers

  4. Our Approach • Model the application • Develop means to access and update the model • Model the digital library organization • Develop means to access and update the model • Model the method to “connect” them [Figure: architecture diagram; labels: Modeled Application Framework, Personalization Framework, Modeled DL Engine, Modeled DL, DL]

  5. AAAS Science Atlas: Ecosystems & Biological Evolution [Figure: strand map excerpt. Sample node: “Within cells, many of the basic functions of organisms, such as extracting energy from food and getting rid of waste, are carried out. The way in which cells function is similar in all living organisms.”]

  6. Modeling the Application • The AAAS strand diagrams form a set of directed acyclic graphs (DAGs) • Inter-DAG joins are being implemented • We have developed • DQL, a preliminary SQL-like query language for DAGs • An evaluation engine for DAG-pattern queries • A Patricia-tree-based search structure for text searching • In progress • Launching queries initiated from the AAAS system to the DLs
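
The slides do not describe how the Patricia-tree search structure is built. As a rough illustration only, the following Python sketch implements a character-level, path-compressed (radix) trie with prefix search. This is the string-keyed form of the Patricia idea; the classic PATRICIA tree branches on bits instead. The class and method names (`PatriciaTrie`, `insert`, `starts_with`) are hypothetical.

```python
class RadixNode:
    def __init__(self):
        self.children = {}      # first character -> (edge label, child node)
        self.terminal = False   # True if a stored word ends here


def _common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class PatriciaTrie:
    """Path-compressed trie: each edge carries a whole substring."""

    def __init__(self):
        self.root = RadixNode()

    def insert(self, word):
        node = self.root
        while word:
            first = word[0]
            if first not in node.children:           # no shared prefix: new leaf edge
                leaf = RadixNode()
                leaf.terminal = True
                node.children[first] = (word, leaf)
                return
            label, child = node.children[first]
            k = _common_prefix_len(label, word)
            if k < len(label):                        # split the edge at position k
                middle = RadixNode()
                middle.children[label[k]] = (label[k:], child)
                node.children[first] = (label[:k], middle)
                child = middle
            node, word = child, word[k:]
        node.terminal = True

    def starts_with(self, prefix):
        """All stored words beginning with prefix, in sorted order."""
        node, consumed = self.root, ""
        while prefix:
            entry = node.children.get(prefix[0])
            if entry is None:
                return []
            label, child = entry
            k = _common_prefix_len(label, prefix)
            if k == len(prefix):                      # prefix ends on this edge
                node, consumed = child, consumed + label
                break
            if k < len(label):                        # mismatch inside the edge
                return []
            node, consumed, prefix = child, consumed + label, prefix[k:]
        results = []
        self._collect(node, consumed, results)
        return sorted(results)

    def _collect(self, node, acc, out):
        if node.terminal:
            out.append(acc)
        for label, child in node.children.values():
            self._collect(child, acc + label, out)


trie = PatriciaTrie()
for w in ["microscope", "microscopy", "mitochondria", "micro"]:
    trie.insert(w)
print(trie.starts_with("micro"))  # → ['micro', 'microscope', 'microscopy']
```

Sharing prefixes along compressed edges keeps the node count proportional to the number of stored words rather than to their total length.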

  7. The Mindset for DLs • A Digital Library is a heterogeneous collection of digital objects • Consider a web-accessible digital library of digitally born scientific educational material containing • Web pages having educational content • PowerPoint slides • Images, animations, videos • Electronic textbooks • … • Each digital object comes with a set of standard metadata and some domain-specific metadata

  8. The Wish List and the Problem • If we would like to • enable individual authors to contribute their material to a digital library • serve schemaless search requests like • “Find animations of the deletion algorithm in B-trees” • “Find some content illustrating the history of the early use of microscopy in biology” • enable individuals to annotate existing material and “publish” these annotations • What might be a reasonable model to organize the information in a digital library? • How does an individual (or a group) construct a personal connection to a digital library?

  9. Some Principles • The organization scheme should • have a mathematical model that • represents the semantics of objects • facilitate content-based search • apply to textual and media materials • allow multiple annotations of the same digital object or its components • freely combine the metadata information and the content description to enable access • The personalization scheme • should also be structured, publishable, and searchable

  10. Decomposability • A generic digitally born object can often be modeled as a “compound” entity that can be recursively decomposed into its media components, which can be further decomposed into media-specific segments
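
Here is a minimal sketch of recursive decomposition in Python. It assumes a simple tree of typed components; the `Component` class and its fields are hypothetical and do not come from the slides. Each component either has sub-components or is a media-specific leaf segment, and `leaves` walks the tree depth first. The attribute values are borrowed from the slide-11 example.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a recursive decomposition of a digitally born object."""
    kind: str                                    # e.g. "Presentation", "Slide", "Image"
    attrs: dict = field(default_factory=dict)    # media-specific metadata
    parts: list = field(default_factory=list)    # sub-components, in order

    def leaves(self, path=()):
        """Yield (path, component) for each undecomposed component, depth first."""
        here = path + (self.kind,)
        if not self.parts:
            yield here, self
        for part in self.parts:
            yield from part.leaves(here)


# Hypothetical object modeled on the slide-11 example
deck = Component("Presentation", parts=[
    Component("Slide", parts=[
        Component("Title"),
        Component("Image", {"type": "image", "width": 451, "height": 275}),
        Component("Speech", {"type": "audio", "length": 3472}),
    ]),
])
for path, leaf in deck.leaves():
    print("/".join(path), leaf.attrs)
```

This tree captures only the containment hierarchy. The anchors and cross-links formalized on slide 12 would need a graph rather than a tree.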

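  11. Decomposability (example) [Figure: a Presentation decomposes into Title, Category, Author, and Keywords metadata and a sequence of Slides; each Slide decomposes into Title, Text, spreadsheet, Image, and Speech components, with anchors marking the links between levels. Example attributes: Speech type=“audio”, length=3472, Object “file://speech2.wav”; Image type=“image”, width=451, height=275, Reference “http://abc.xyz.com/123.gif”; spreadsheet cells with Row, Col, Value, and fmla="=SUM(C5:C8)"]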

  12. Anchored Graphs • $N_1, \ldots, N_k$ is a partition of the vertex set $V$ • Node set $N_i$ has a distinguished node $n_{ij}$ that has a single inbound link, called the anchor, from another node set • Only one node set may have no anchor • An anchored graph is a model for a recursively decomposable, media-enriched digital object • The set of anchors is the minimal cutset of the graph
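
These conditions can be read operationally if we assume that an anchor is simply a directed edge whose endpoints lie in different node sets. Under that assumption, the Python sketch below (hypothetical function `find_anchors`) checks two things: every node set has at most one such inbound edge, and at most one node set (the root) has none. It does not verify the minimal-cutset property.

```python
from collections import defaultdict


def find_anchors(partition, edges):
    """Return {set_index: (src, dst)} giving each node set's anchor edge.

    partition: list of disjoint vertex sets N_1..N_k (every edge endpoint
    is assumed to appear in one of them); edges: iterable of (src, dst).
    Raises ValueError if a node set has several inbound cross-set links
    or if more than one node set has no anchor.
    """
    owner = {}
    for i, block in enumerate(partition):
        for v in block:
            if v in owner:
                raise ValueError(f"vertex {v!r} appears in two node sets")
            owner[v] = i
    inbound = defaultdict(list)
    for src, dst in edges:
        if owner[src] != owner[dst]:             # edge crosses node sets
            inbound[owner[dst]].append((src, dst))
    anchors = {}
    for i, cross in inbound.items():
        if len(cross) != 1:
            raise ValueError(f"node set {i} has {len(cross)} inbound links")
        anchors[i] = cross[0]                     # dst is the distinguished node
    unanchored = [i for i in range(len(partition)) if i not in anchors]
    if len(unanchored) > 1:
        raise ValueError(f"node sets {unanchored} all lack an anchor")
    return anchors


# Hypothetical presentation: root metadata set, one slide set, one image set
parts = [{"P", "title", "author"}, {"S1", "S1title"}, {"img"}]
links = [("P", "title"), ("P", "author"), ("P", "S1"),
         ("S1", "S1title"), ("S1", "img")]
print(find_anchors(parts, links))  # → {1: ('P', 'S1'), 2: ('S1', 'img')}
```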

  13. Modeling Annotation [Figure: two annotations over the same speech segment with phonemes ph1 to ph6 (labels Ph/an, Ph/i, Ph/c, Ph/foh, Ph/see); one adds syntactic labels (S/S, S/pronoun, S/verb) and word labels (W/I, W/can, W/foresee), the other adds tone labels (T/calm, T/emphatic)] • Any component of a digital object may need to be annotated, perhaps according to an ontology • An annotation is a super-structure on an anchored graph

  14. Ordered Annotation Graph • AG1: For any two vertices $u$ and $v$, $u <_R v$ for some relation $R$. • AG2: There is a path from $u$ to $v$ iff $u <_R v$. • AG3a: Two or more edges of the same type cannot originate or end at a single vertex. • AG3b: For any two vertices $u$ and $v$, there can be more than one edge from $u$ to $v$, but the edges must be of different types. AGs are label-unithreaded. • AG4: There is no vertex of degree zero; an annotation graph is always connected. • AG5: A path of type $t$ from $v_1$ to $v_n$ can be compressed into an edge $(v_1, v_n)$ with tuple $(t, l_1, \ldots, l_n)$, where $l_i$ is the value of edge $e_i$ in the path, if • the path has at least three vertices, • letting $E_{1n}$ be the set of edges $(u, v)$ incident on any of the vertices lying on this path such that $v_1 \le_R u$ and $v \le_R v_n$, and • letting $V_{1n}$ be the set of vertices on the path $v_1 \ldots v_n$, there is no path of type $t' \neq t$ between $v_i$ and $v_j$, where $v_i, v_j \in V_{1n}$, $v_i \neq v_1$, and $v_j <_R v_n$.
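
The following is a partial sketch. It assumes an annotation graph is stored as a list of `(u, v, edge_type)` triples, and the hypothetical `check_annotation_graph` tests AG3a and AG4 only; AG1, AG2, and the AG5 compression rule are not checked. In the usage example, vertices stand for time points and typed edges for phone and word annotations, loosely following the speech example on slide 13.

```python
from collections import Counter, defaultdict


def check_annotation_graph(vertices, edges):
    """Return a list of violated conditions (empty if none are found).

    Only AG3a and AG4 are checked. AG3b (parallel edges of distinct types)
    is a permission, and exact duplicate edges already violate AG3a.
    """
    problems = []
    # AG3a: at most one edge of each type may leave, and one may enter, a vertex
    for (u, t), n in Counter((u, t) for u, _, t in edges).items():
        if n > 1:
            problems.append(f"AG3a: {n} edges of type {t!r} leave {u!r}")
    for (v, t), n in Counter((v, t) for _, v, t in edges).items():
        if n > 1:
            problems.append(f"AG3a: {n} edges of type {t!r} enter {v!r}")
    # AG4: no vertex of degree zero, and the graph is connected (ignoring direction)
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    verts = list(vertices)
    for x in verts:
        if not adj[x]:
            problems.append(f"AG4: vertex {x!r} has degree zero")
    if verts:
        seen, stack = {verts[0]}, [verts[0]]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if not set(verts) <= seen:
            problems.append("AG4: graph is not connected")
    return problems


# Time points 1..3: "phone" edges between neighbours, one "word" edge spanning both
print(check_annotation_graph([1, 2, 3],
                             [(1, 2, "phone"), (2, 3, "phone"), (1, 3, "word")]))  # → []
print(check_annotation_graph([1, 2, 3, 4], [(1, 2, "phone"), (1, 3, "phone")]))
```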

  15. Partially Ordered Annotation Graph • A segmentation of an image-like component is in general an attributed relational graph • A lexicographic DFS traversal of the graph would produce a tree • The properties of the ordered annotation graphs also hold for trees
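
A lexicographic DFS can be sketched as a depth-first search that always takes neighbours in sorted order, so the resulting spanning tree is deterministic. The Python below assumes the segmentation graph is given as an adjacency dictionary of region ids. The node and edge attributes of the attributed relational graph are omitted, and `lex_dfs_tree` and the region names are hypothetical.

```python
def lex_dfs_tree(adj, root):
    """Depth-first spanning tree of root's component, taking neighbours in
    sorted (lexicographic) order. Returns {vertex: parent}, with root -> None."""
    parent = {root: None}
    path = [root]                                  # current DFS path
    stack = [iter(sorted(adj.get(root, ())))]      # pending neighbours per path vertex
    while stack:
        for w in stack[-1]:
            if w not in parent:                    # tree edge: descend into w
                parent[w] = path[-1]
                path.append(w)
                stack.append(iter(sorted(adj.get(w, ()))))
                break
        else:                                      # neighbours exhausted: backtrack
            stack.pop()
            path.pop()
    return parent


# Hypothetical region adjacency graph from an image segmentation
regions = {"r1": {"r2", "r3"}, "r2": {"r1", "r3"},
           "r3": {"r1", "r2", "r4"}, "r4": {"r3"}}
print(lex_dfs_tree(regions, "r1"))  # → {'r1': None, 'r2': 'r1', 'r3': 'r2', 'r4': 'r3'}
```

An explicit stack of neighbour iterators replaces recursion, so large segmentations do not hit Python's recursion limit.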

  16. Large-Scale Brain Maps • Custom high-precision montaging stage • 40 × 30 image panels • 40×/1.3 oil objective • 800 MB full-resolution TIFF [Figure: montaged brain map; scale bar 1 mm]

  17. From Keywords to Keygraphs • "... the N-terminal region of hFis1 is necessary for mitochondrial fission ... an increased level of cellular hFis1 strongly promotes mitochondrial fission ... hFis1 participates in mitochondrial fission through an interaction that recruits DLP1 from the cytosol ... the number of hFis1 molecules on the mitochondrial surface determines fission frequency".

  18. Keygraphs: Restructuring Textual Content. A keygraph is a multi-digraph $G = (V, E)$ over a digital object $D$, where nodes $V$ are 4-tuples $(O_N, C, A, m)$ and edges $E$ are 3-tuples $(O_E, L, n)$, that satisfies the following: • $C$ is a set of class names and $A$ is a set of attribute names that occur in ontology $O_N$, and $m: O_N \to V$ is an attribution function that assigns attribute names from the ontology to a node • $L$ is a set of class labels and $n: O_E \to (V \cup A) \times (V \cup A)$ is a mapping function that assigns labels from the edge ontology $O_E$ to an edge • $G$ is label-acyclic, i.e., for any label $l$ in $L$, the subgraph containing the edges with label $l$ does not form a cycle • $\forall$ node pairs $V_1, V_2$: $\mathit{edge}_l(V_1.A_1, V_2.A_2) \Rightarrow \mathit{edge}_l(V_1, V_2)$ • for any node $v$ or edge $e$ in $G$, there is an evidence function $(\mathcal{E}, \varphi): V \cup E \to S$, where $S$ is a set of segments of $D$ and $\varphi$ is a possibly null predicate over $S$ that must hold for the evidence function
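
The label-acyclicity condition can be checked one label at a time with Kahn's topological-sort algorithm. The Python sketch below assumes keygraph edges are `(source, target, label)` triples. The function names and the example edges, loosely drawn from the hFis1 passage on slide 17, are hypothetical.

```python
from collections import defaultdict


def _has_cycle(pairs):
    """Kahn's algorithm: a digraph is acyclic iff every node can be
    removed in topological order. Parallel edges and self-loops are handled."""
    indeg, succ, nodes = defaultdict(int), defaultdict(list), set()
    for u, v in pairs:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for w in succ[n]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return removed < len(nodes)


def is_label_acyclic(edges):
    """True if, for every label, the edges carrying that label form no cycle."""
    by_label = defaultdict(list)
    for src, dst, label in edges:
        by_label[label].append((src, dst))
    return not any(_has_cycle(pairs) for pairs in by_label.values())


edges = [
    ("hFis1", "mitochondrial fission", "promotes"),
    ("hFis1", "DLP1", "recruits"),
    ("mitochondrial fission", "hFis1", "mentioned_with"),
]
print(is_label_acyclic(edges))  # → True
print(is_label_acyclic(edges + [("mitochondrial fission", "hFis1", "promotes")]))  # → False
```

The example shows the distinction the definition draws. A cycle through two different labels is allowed, but a cycle within a single label is not.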

  19. myLib: Personalizing DLs • DL object: a passage from Antony van Leeuwenhoek (1632-1723): "I then most always saw, with great wonder, that in the said matter there were many very little living animalcules, very prettily a-moving. The biggest sort. . . had a very strong and swift motion, and shot through the water (or spittle) like a pike does through the water. The second sort. . . oft-times spun round like a top. . . and these were far more in number … an unbelievably great company of living animalcules, a-swimming more nimbly than any I had ever seen up to this time. The biggest sort. . . bent their body into curves in going forwards. . . Moreover, the other animalcules were in such enormous numbers, that all the water. . . seemed to be alive." • Concept: Microscopes make it possible to see that living things are made mostly of cells. • Register your annotations on the DL object • anecdote(obj15, discovery(cell)) • anecdote(obj15, history(microscope)) • reference(subject(obj15), hypothesis(‘cell begets cell’)) • example(Object, Concept) • Link your application structure with the DL • associate((discovery(bacteria), discovery(shape(cell(sperm)))), identical(discoverer)) • qualify(association(tu19, tu97), example, [obj74, obj21]) • These can be viewed as a connector graph with a propositional labeling language: the link graph
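
The registered annotations read like ground terms, so a small pattern-matching query over them can be sketched in Python. This is not myLib's implementation. It assumes each annotation is stored as a nested tuple whose first element is the predicate name, and that query variables are marked with a hypothetical `Var` class. The facts are the ones listed on the slide.

```python
class Var:
    """A query variable (hypothetical helper, not part of myLib)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"?{self.name}"


def match(pattern, term, bindings):
    """Match a pattern that may contain Vars against a ground term.
    Returns the extended bindings dict, or None on failure."""
    if isinstance(pattern, Var):
        if pattern.name in bindings:                 # already bound: must agree
            return bindings if bindings[pattern.name] == term else None
        return {**bindings, pattern.name: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None     # constants must be equal


def query(facts, pattern):
    """All variable bindings under which pattern matches some fact."""
    results = []
    for fact in facts:
        b = match(pattern, fact, {})
        if b is not None:
            results.append(b)
    return results


# Annotations from the slide as (predicate, arg, ...) tuples;
# the list [obj74, obj21] becomes a plain tuple.
facts = [
    ("anecdote", "obj15", ("discovery", "cell")),
    ("anecdote", "obj15", ("history", "microscope")),
    ("reference", ("subject", "obj15"), ("hypothesis", "cell begets cell")),
    ("qualify", ("association", "tu19", "tu97"), "example", ("obj74", "obj21")),
]
print(query(facts, ("anecdote", "obj15", Var("X"))))
# → [{'X': ('discovery', 'cell')}, {'X': ('history', 'microscope')}]
```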

  20. Discovery Services • A short course for advanced students • Find a path P through concepts C1, C2, C3 such that least(node(P)) comes after the prerequisite concepts P1, P2 • What part of the syllabus can be covered by all objects in collection C1? • Find a minimal set of examples that can be used to cover the neighborhood of concepts C1, C2 • First find an approximate Steiner tree • Then find the examples • Minimize them by their coverage
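
The last service can be sketched under two assumptions: the concept graph is an unweighted, undirected, connected adjacency dictionary, and each example is tagged with the set of concepts it illustrates. The approximate Steiner tree uses the standard metric-closure heuristic: a minimum spanning tree over shortest paths between the terminal concepts, whose cost is at most twice the optimum. The examples are then chosen with greedy set cover. All function names and data are hypothetical.

```python
from collections import deque


def bfs_path(adj, src, dst):
    """Fewest-edge path from src to dst, visiting neighbours in sorted order."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:                        # rebuild the path by following prev links
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for w in sorted(adj.get(u, ())):
            if w not in prev:
                prev[w] = u
                queue.append(w)
    return None                             # dst unreachable


def approx_steiner_nodes(adj, terminals):
    """Nodes of an approximate Steiner tree joining the (non-empty) terminals.

    Prim's algorithm on the metric closure: repeatedly attach the terminal
    with the shortest path to an already-attached terminal, keeping every
    node on that path. Assumes the graph is connected.
    """
    terminals = list(terminals)
    paths = {(a, b): bfs_path(adj, a, b)
             for a in terminals for b in terminals if a != b}
    attached, nodes = {terminals[0]}, {terminals[0]}
    while len(attached) < len(set(terminals)):
        a, b = min(((a, b) for a in sorted(attached)
                    for b in terminals if b not in attached),
                   key=lambda e: (len(paths[e]), e))
        attached.add(b)
        nodes.update(paths[(a, b)])
    return nodes


def greedy_examples(concepts, examples):
    """Greedy set cover; examples maps example id -> set of concepts it covers.
    Returns (chosen example ids, concepts left uncovered)."""
    uncovered, chosen = set(concepts), []
    while uncovered:
        best = max(sorted(examples), key=lambda ex: len(examples[ex] & uncovered))
        gain = examples[best] & uncovered
        if not gain:
            break                           # no example covers what is left
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


concepts = {
    "cell": {"organism", "microscope", "energy"},
    "organism": {"cell", "waste"},
    "microscope": {"cell"},
    "energy": {"cell", "waste"},
    "waste": {"organism", "energy"},
}
tree = approx_steiner_nodes(concepts, ["microscope", "waste"])
examples = {"ex1": {"cell", "microscope"}, "ex2": {"energy", "waste"}, "ex3": {"cell"}}
print(sorted(tree))                       # → ['cell', 'energy', 'microscope', 'waste']
print(greedy_examples(tree, examples))    # → (['ex1', 'ex2'], set())
```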

  21. Higher-Level Services • Does this textbook have any “holes”? • Find an “alignment” of the concept graphs from the book and the SCGM • Determine whether any concept or TU in the book is missing a prerequisite concept node from the SCGM • Teaching children through “reasoning” • Pick two objects from a collection • Ask the children how they are related • Trace how closely the children’s trajectory between the concepts matches any path between them • Make the system provide “hints” by suggesting missed nodes in the path
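
Both checks can be approximated with simple set operations. The Python sketch below assumes the SCGM is available as a list of `(prerequisite, concept)` edges, and that a child's trajectory and a reference path are lists of concept ids. It looks only at direct prerequisites, and all names are hypothetical.

```python
def find_holes(book_concepts, prerequisites):
    """Map each concept in the book to the sorted list of its direct
    prerequisites (from the reference concept map) that the book lacks."""
    book = set(book_concepts)
    holes = {}
    for pre, concept in prerequisites:
        if concept in book and pre not in book:
            holes.setdefault(concept, set()).add(pre)
    return {c: sorted(p) for c, p in holes.items()}


def hints(child_path, reference_path):
    """Nodes on the reference path that the child's trajectory skipped, in path order."""
    visited = set(child_path)
    return [n for n in reference_path if n not in visited]


scgm = [("cell", "organism"), ("microscope", "cell"), ("energy", "cell")]
print(find_holes(["cell", "organism"], scgm))   # → {'cell': ['energy', 'microscope']}
print(hints(["microscope", "waste"],
            ["microscope", "cell", "energy", "waste"]))  # → ['cell', 'energy']
```

A fuller version would follow prerequisite chains transitively and would first align the book's concept graph with the SCGM, which this sketch takes as already done.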

  22. Summing It Up • We investigate • Principles to construct a semantically oriented digital library • Graph-based models of textual and non-textual content • Anchored graphs • Annotation graphs • Keygraphs • Link graphs • Algebras over these structures • Focusing on educational material in biology at different educational levels

  23. Acknowledgement • The National Science Foundation • The National Archives and Records Administration • Reagan W. Moore • Bertram Ludäscher • Smriti Yamini • Mevlut Erdem Kurul • Tulika Agrawal • Xufei Qian
