Unimpeded Discovery of Digital Content - Intro - Günter Waibel / RLG
Recognition over Recall • “Recognition occurs when you see something familiar, while recall requires that you remember something and are able to articulate it.” • “Most information retrieval depends upon recall skills – the user has to describe what he or she wishes to retrieve.” • “Recognition approaches are likely to be much more effective in large digital libraries of the future […]” (Source: Borgman, Christine L. “Personal digital libraries: Creating individual spaces for innovation.” http://www.sis.pitt.edu/~dlwkshop/paper_borgman.pdf)
Blaise Agüera y Arcas, President and CTO, SandCodex LLC
Overview • SeaDragon is a client/server (and, potentially, peer-to-peer) technology. • It provides a highly generalized environment for viewing and interacting with visual objects of all kinds: images, texts, composites, applets. • It can be thought of as a windowing system or OS front end. • Client-side deployment follows a “free player” model, like Acrobat, but uses no proprietary file formats or protocols. • Special emphasis on JPEG2000.
JPEG2000 • Evolutionary vs. revolutionary use of this new standard. • Evolution: better compression, fewer artifacts, etc. • Aside from metadata (the revolution on the server side), the standard offers critical new facilities for a client-side revolution: decoding at different resolutions rather than only in reading order; random access; quality layers, potentially up to lossless. • Ergo, ONE FILE (metadata, thumbnails, lossless/archival quality, internet-viewable, etc.)
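Decoding at different resolutions can be illustrated with the reduced-image sizes a JPEG2000 decoder can produce from one file. A minimal sketch, assuming an image origin at (0, 0) and a codestream with `levels` wavelet decomposition levels, so that discarding d levels yields an image of ceil(W/2^d) × ceil(H/2^d) pixels. The helper names are hypothetical, and `pick_discard` is only one plausible viewer policy (choose the coarsest resolution that still fills the viewport), not SeaDragon's actual logic.

```python
import math


def reduced_size(width, height, discard):
    """Image size after discarding `discard` resolution levels.

    Assumes the image origin is at (0, 0), so each discarded level
    halves both dimensions, rounding up.
    """
    scale = 2 ** discard
    return math.ceil(width / scale), math.ceil(height / scale)


def resolution_sizes(width, height, levels):
    """Every size available from a codestream with `levels` decomposition
    levels, from full resolution (discard 0) down to the smallest thumbnail."""
    return [reduced_size(width, height, d) for d in range(levels + 1)]


def pick_discard(width, height, levels, view_w, view_h):
    """Hypothetical policy: discard as many levels as possible while the
    reduced image still covers a view_w x view_h viewport."""
    best = 0
    for d in range(levels + 1):
        w, h = reduced_size(width, height, d)
        if w < view_w or h < view_h:
            break
        best = d
    return best


if __name__ == "__main__":
    print(resolution_sizes(6000, 4000, 5))
    print(pick_discard(6000, 4000, 5, 800, 600))
```

For a hypothetical 6000 × 4000 scan with five levels, the available sizes run from 6000 × 4000 down to 188 × 125, and an 800 × 600 viewport is served by discarding two levels (1500 × 1000), so the client need not decode the full-resolution data.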
JPEG2000 • The flexibility of the format makes best-practice guidelines important. • Examples: quality layers, e.g. 0.2, 0.5, 1, 2, - bits per pixel; packet order, e.g. RLCP vs. LRCP. • Relax: a wrong packet order and similar choices can be fixed easily later by batch transcoding. • The only thing that can’t be fixed later is loss of data. If the scanner doesn’t output JPEG2000 natively, keep the original TIFFs on tape, even if the JPEG2000 conversion is lossless, so that no metadata is lost through an oversight. Space on tape is cheap.
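The two example settings can be made concrete. A sketch under stated assumptions: the listed rates are read as cumulative bits per pixel for each quality layer, and the trailing “-” is modeled as a final layer with no rate cap (our interpretation). `layer_byte_budgets` turns rates into byte budgets; `packet_sequence` lists packets as (layer, resolution, component, position) tuples in a given progression order, outermost loop first, assuming for simplicity that every resolution has the same number of precincts. Both are hypothetical helpers, not part of any JPEG2000 library.

```python
from itertools import product


def layer_byte_budgets(width, height, rates_bpp):
    """Cumulative byte budget for each quality layer.

    rates_bpp holds bits per pixel per layer; None stands for a final
    layer with no cap (an assumption about the slide's trailing "-").
    """
    pixels = width * height
    return [None if r is None else round(r * pixels / 8) for r in rates_bpp]


def packet_sequence(order, layers, resolutions, components, positions):
    """Packets in a progression order such as "LRCP" or "RLCP".

    The first letter is the outermost loop. Simplification: every
    resolution is assumed to have the same number of precincts.
    """
    counts = {"L": layers, "R": resolutions, "C": components, "P": positions}
    packets = []
    for indices in product(*(range(counts[key]) for key in order)):
        at = dict(zip(order, indices))
        packets.append((at["L"], at["R"], at["C"], at["P"]))
    return packets


if __name__ == "__main__":
    print(layer_byte_budgets(6000, 4000, [0.2, 0.5, 1, 2, None]))
    lrcp = packet_sequence("LRCP", 2, 3, 1, 1)
    rlcp = packet_sequence("RLCP", 2, 3, 1, 1)
    print(lrcp[:3])  # layer 0 at every resolution first: quality-progressive
    print(rlcp[:3])  # every layer of resolution 0 first: resolution-progressive
    print(sorted(lrcp) == sorted(rlcp))  # same packets, only the order differs
```

The last line is the point behind “relax”: LRCP and RLCP contain the same packets in a different order, so changing the order rearranges existing data rather than discarding any.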
Server-side • We expect to adopt a yearly license model for running our server. • An important component for imaging is the JPIP protocol, part of the JPEG2000 standard (Part 9, ISO/IEC 15444-9). • Although the protocol is standard, the extra metadata packets and imaging data exchanged are specific to our architecture. • None of the primary content on the server side needs to be in proprietary or non-standard formats (unlike MrSID).
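A JPIP client asks for a window of an image at a chosen scale rather than downloading the whole file. The sketch below builds such a request as an HTTP query string. The field names (`target`, `fsiz`, `roff`, `rsiz`, `layers`, `type`) follow JPIP's request syntax as we understand it and should be checked against ISO/IEC 15444-9; the server URL, file name, and helper `jpip_request_url` are hypothetical, and SandCodex's architecture-specific metadata packets are not modeled.

```python
from urllib.parse import urlencode


def jpip_request_url(server, target, frame_size, offset=None,
                     region_size=None, layers=None):
    """Build a JPIP-style GET URL for a region of `target`.

    frame_size: (w, h) of the full image scaled to the resolution wanted.
    offset, region_size: the window within that frame, in frame pixels.
    layers: how many quality layers to request (None = server default).
    """
    query = {"target": target, "fsiz": "%d,%d" % frame_size}
    if offset is not None:
        query["roff"] = "%d,%d" % offset
    if region_size is not None:
        query["rsiz"] = "%d,%d" % region_size
    if layers is not None:
        query["layers"] = str(layers)
    query["type"] = "jpp-stream"  # ask for a precinct-based JPIP stream
    # Keep commas literal for readability; other characters are escaped.
    return server + "?" + urlencode(query, safe=",")


if __name__ == "__main__":
    print(jpip_request_url("http://example.org/jpip", "scan0001.jp2",
                           (1500, 1000), offset=(0, 0),
                           region_size=(800, 600), layers=2))
```

Asking for a 1500 × 1000 frame, an 800 × 600 window, and two quality layers lets the server send only the packets that window needs, which is what makes a single archival file practical to view over the internet.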
Deployment and customization • Server side: “views” created as JPX files (the extended JPEG2000 file format defined in Part 2 of the standard) • Authoring tool: an enhanced version of the client that indexes JPEG2000 imagery, converts content in other formats, and supports heterogeneous layout, look and feel, analogous to HTML • The investment in making these views is the only non-portable investment required to deploy SeaDragon. • “Glue code” to process messages from the client and …
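Indexing imagery for the authoring tool starts with telling JPEG2000 files apart from content that needs conversion. A minimal sketch, assuming classification by file header alone: JP2-family files (JP2, JPX) begin with the 12-byte JPEG2000 signature box followed by a File Type box whose brand names the format, and raw codestreams begin with the SOC and SIZ markers (`FF 4F FF 51`). The function names and the labels they return are hypothetical.

```python
from pathlib import Path

# JPEG2000 signature box: length 12, type "jP  ", content 0D 0A 87 0A.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"
# Raw codestream: SOC marker (FF 4F) followed by SIZ marker (FF 51).
CODESTREAM_START = b"\xff\x4f\xff\x51"


def classify_header(header):
    """Classify the first 24 bytes of a file (labels are hypothetical)."""
    if header.startswith(JP2_SIGNATURE):
        # Next box: 4-byte length, b"ftyp", then the 4-byte brand, e.g. b"jpx ".
        if header[16:20] == b"ftyp":
            return "jp2-family:" + header[20:24].decode("latin-1").strip()
        return "jp2-family"
    if header.startswith(CODESTREAM_START):
        return "j2k-codestream"
    return "needs-conversion"


def index_directory(root):
    """Map every file under `root` to its classification."""
    index = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            with path.open("rb") as f:
                index[str(path)] = classify_header(f.read(24))
    return index
```

Calling `index_directory("scans")` on a hypothetical folder would label TIFF masters as `needs-conversion` and JPX views as `jp2-family:jpx`. A production indexer would go on to read image dimensions and metadata boxes, which this sketch does not attempt.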
Optional customization • Dynamic view creation using scripts • Server-side “glue code” to, e.g., translate messages from the client into SQL queries and create dynamic views based on the query results • Client-side customization (Java or Python) to support UI behavior beyond our standard GUI (raises security issues)
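Server-side glue code of the kind described can be sketched with Python's built-in `sqlite3`. Everything here is hypothetical: the message shape (`{"field": ..., "value": ...}`), the `catalogue` table and its columns, and the square grid layout standing in for a real view. Column names are whitelisted and values passed as bound parameters, because client messages are untrusted input.

```python
import math
import sqlite3

# Hypothetical catalogue columns a client is allowed to search on.
ALLOWED_FIELDS = {"creator", "title", "year"}


def message_to_query(message):
    """Translate a client message into SQL plus parameters."""
    field = message["field"]
    if field not in ALLOWED_FIELDS:
        raise ValueError("unsupported field: %r" % field)
    # The column name comes from the whitelist; the value is a bound parameter.
    sql = "SELECT image_file FROM catalogue WHERE %s = ? ORDER BY image_file" % field
    return sql, (message["value"],)


def grid_layout(files, tile=256, gap=16):
    """Place results on a roughly square grid (a stand-in for a real view)."""
    cols = max(1, math.ceil(math.sqrt(len(files))))
    return [{"file": name,
             "x": (i % cols) * (tile + gap),
             "y": (i // cols) * (tile + gap)}
            for i, name in enumerate(files)]


def handle_message(conn, message):
    """Run the query and build a dynamic view from the matching images."""
    sql, params = message_to_query(message)
    files = [row[0] for row in conn.execute(sql, params)]
    return {"items": grid_layout(files)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catalogue "
                 "(image_file TEXT, creator TEXT, title TEXT, year TEXT)")
    conn.executemany("INSERT INTO catalogue VALUES (?, ?, ?, ?)", [
        ("a1.jp2", "Anon", "Map", "1650"),
        ("a2.jp2", "Anon", "Chart", "1660"),
        ("b1.jp2", "Other", "Sketch", "1700"),
    ])
    for item in handle_message(conn, {"field": "creator", "value": "Anon"})["items"]:
        print(item)
```

Keeping the translation step (`message_to_query`) separate from the layout step makes it easy to swap in a different database or view format without changing how client messages are validated.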
Caching • On a notebook computer, forming the server-side cache structure takes < 0.1 sec/image. • This makes server-side dynamic creation of views practical (though not quite yet for novel views of 15,000 images). • New large composite views, however, can be built very quickly from other views.
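The caveat about 15,000 images follows from the quoted per-image figure. Taking 0.1 s/image as an upper bound, building the cache structure for a novel view of that size could take up to:

```latex
t \le 15\,000 \times 0.1\ \text{s} = 1\,500\ \text{s} = 25\ \text{min}
```

That is too long to wait for on demand. A composite assembled from views that are already cached presumably avoids repeating this per-image work, which would explain why it can be built much faster.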
The world in a grain of sand • Objects needn’t be single JPEG2000 images; they can also be: • Texts • Other views • Applets…
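A view whose objects can be texts, applets, or other views is an instance of the composite pattern. The sketch below models it with Python dataclasses. The class names and fields are hypothetical, and a real view would also carry layout and reference JPEG2000 imagery rather than bare file names. Because a view holds other views by reference, a large composite can be assembled without touching the images inside them.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ImageItem:
    file: str  # e.g. a JPEG2000 file on the server


@dataclass
class TextItem:
    text: str


@dataclass
class AppletItem:
    name: str


@dataclass
class View:
    name: str
    children: List["Item"] = field(default_factory=list)


Item = Union[ImageItem, TextItem, AppletItem, View]


def count_images(item):
    """Count images reachable from `item`, descending into sub-views."""
    if isinstance(item, ImageItem):
        return 1
    if isinstance(item, View):
        return sum(count_images(child) for child in item.children)
    return 0


if __name__ == "__main__":
    maps = View("maps", [ImageItem("map1.jp2"), ImageItem("map2.jp2")])
    letters = View("letters", [ImageItem("letter1.jp2"), TextItem("transcription")])
    # A new composite view reuses the existing views by reference.
    collection = View("collection", [maps, letters, AppletItem("timeline")])
    print(count_images(collection))
```

Here `collection` contains two existing views and an applet; walking it recursively finds the three images without copying either sub-view.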
Where can I see more? • We will soon be on the web at www.sandcodex.com • Email me to be put on our mailing list: blaise@sandcodex.com