Data-Driven Digital Library Applications -- The UC Berkeley Environmental Digital Library

Presentation Transcript


  1. Data-Driven Digital Library Applications -- The UC Berkeley Environmental Digital Library • University of California, Berkeley School of Information Management and Systems • SIMS 257: Database Management

  2. Lecture Outline • Final Project Review • ORDBMS Features • JDBC Access to DBMS • Data-Driven Digital Library Applications • Berkeley’s Environmental Digital Library

  4. Final Project Requirements • See WWW site: • http://sims.berkeley.edu/courses/is257/f04/index.html • Report on personal/group database including: • Database description and purpose • Data Dictionary • Relationships Diagram • Sample queries and results (Web or Access tools) • Sample forms (Web or Access tools) • Sample reports (Web or Access tools) • Application Screens (Web or Access tools)

  5. Final Presentations and Reports • Specifications for final report are on the Web Site under assignments • Reports Due on December 15. • Presentations on December 15, 9:00-12:00

  6. Lecture Outline • Final Project Review • ORDBMS Features • JDBC Access to DBMS • Data-Driven Digital Library Applications • Berkeley’s Environmental Digital Library

  7. Object Relational Data Model • Class, instance, attribute, method, and integrity constraints • OID per instance • Encapsulation • Multiple inheritance hierarchy of classes • Class references via OID object references • Set-Valued attributes • Abstract Data Types

  8. PostgreSQL • All of the usual SQL commands for creating, searching, and modifying classes (tables) are available, with some additions: • Inheritance • Non-Atomic Values • User-defined functions and operators

  9. Inheritance
      CREATE TABLE cities (
          name        text,
          population  float,
          altitude    int    -- (in ft)
      );
      CREATE TABLE capitals (
          state  char(2)
      ) INHERITS (cities);

  10. Non-Atomic Values - Arrays • Postgres allows attributes of an instance to be defined as fixed-length or variable-length multi-dimensional arrays. Arrays of any base type or user-defined type can be created. To illustrate their use, we first create a class with arrays of base types:
      CREATE TABLE SAL_EMP (
          name            text,
          pay_by_quarter  int4[],
          schedule        text[][]
      );
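
When such a class is filled from a client program, array values are written as brace-delimited literals such as {10000,10000,10000,10000}. The following is a minimal Java sketch of formatting Java arrays into that form. The class PgArrayLiteral and its methods are hypothetical helpers for illustration, not part of Postgres or JDBC, and only basic quoting (double quotes, backslash escapes) is handled; null elements are not.

```java
import java.util.StringJoiner;

/** Hypothetical helper: formats Java arrays as Postgres array literals. */
final class PgArrayLiteral {
    private PgArrayLiteral() {}

    /** One-dimensional integer array, e.g. for pay_by_quarter int4[]. */
    static String of(int[] values) {
        StringJoiner sj = new StringJoiner(",", "{", "}");
        for (int v : values) {
            sj.add(Integer.toString(v));
        }
        return sj.toString();
    }

    /** Two-dimensional text array, e.g. for schedule text[][]. */
    static String of(String[][] rows) {
        StringJoiner outer = new StringJoiner(",", "{", "}");
        for (String[] row : rows) {
            StringJoiner inner = new StringJoiner(",", "{", "}");
            for (String s : row) {
                inner.add(quote(s));
            }
            outer.add(inner.toString());
        }
        return outer.toString();
    }

    /** Double-quotes an element, escaping backslashes and double quotes. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

For example, PgArrayLiteral.of(new int[] {10000, 10000, 10000, 10000}) yields {10000,10000,10000,10000}, the literal form Postgres accepts for an int4[] attribute.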

  11. PostgreSQL Extensibility • Postgres is extensible because its operation is catalog-driven • An RDBMS stores information about databases, tables, columns, etc., in what are commonly known as system catalogs. (Some systems call this the data dictionary.) • One key difference between Postgres and a standard RDBMS is that Postgres stores much more information in its catalogs • not only information about tables and columns, but also information about its types, functions, access methods, etc. • These catalog classes can be modified by the user, and since Postgres bases its internal operation on them, Postgres can be extended by users • By comparison, conventional database systems can only be extended by changing hardcoded procedures within the DBMS or by loading modules specially written by the DBMS vendor.

  12. User Defined Functions • CREATE FUNCTION allows a Postgres user to register a function with a database. Subsequently, this user is considered the owner of the function.
      CREATE FUNCTION name ( [ ftype [, ...] ] )
          RETURNS rtype
          AS {SQLdefinition}
          LANGUAGE 'langname'
          [ WITH ( attribute [, ...] ) ]
      CREATE FUNCTION name ( [ ftype [, ...] ] )
          RETURNS rtype
          AS obj_file , link_symbol
          LANGUAGE 'C'
          [ WITH ( attribute [, ...] ) ]

  13. External Functions • This example creates a C function by calling a routine from a user-created shared library. This particular routine calculates a check digit and returns TRUE if the check digit in the function parameters is correct. It is intended for use in a CHECK constraint.
      CREATE FUNCTION ean_checkdigit(bpchar, bpchar)
          RETURNS bool
          AS '/usr1/proj/bray/sql/funcs.so'
          LANGUAGE 'c';
      CREATE TABLE product (
          id         char(8) PRIMARY KEY,
          eanprefix  char(8) CHECK (eanprefix ~ '[0-9]{2} [0-9]{5}')
                             REFERENCES brandname(ean_prefix),
          eancode    char(6) CHECK (eancode ~ '[0-9]{6}'),
          CONSTRAINT ean CHECK (ean_checkdigit(eanprefix, eancode))
      );
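
The C source of ean_checkdigit is not shown in the slide. As an illustration only, the sketch below implements the standard EAN-13 check-digit rule in Java, assuming that the seven digits of eanprefix and the six digits of eancode together form a 13-digit EAN whose last digit is the check digit. The class name EanCheck and the method signature are hypothetical.

```java
/** Hypothetical Java analogue of the ean_checkdigit routine (EAN-13 rule assumed). */
final class EanCheck {
    private EanCheck() {}

    /** Returns true if prefix + code is a 13-digit EAN with a valid check digit. */
    static boolean eanCheckDigit(String prefix, String code) {
        // Keep only the digits, so a prefix such as "40 06381" is accepted.
        String digits = (prefix + code).replaceAll("[^0-9]", "");
        if (digits.length() != 13) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            int d = digits.charAt(i) - '0';
            // Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
            sum += (i % 2 == 0) ? d : 3 * d;
        }
        int expected = (10 - sum % 10) % 10;
        return expected == digits.charAt(12) - '0';
    }
}
```

With the widely used sample number 4006381333931, EanCheck.eanCheckDigit("40 06381", "333931") returns true, while changing the final digit makes it return false.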

  14. Creating new Types • CREATE TYPE allows the user to register a new user data type with Postgres for use in the current database. The user who defines a type becomes its owner. typename is the name of the new type and must be unique within the types defined for this database.
      CREATE TYPE typename (
          INPUT = input_function,
          OUTPUT = output_function,
          INTERNALLENGTH = { internallength | VARIABLE }
          [ , EXTERNALLENGTH = { externallength | VARIABLE } ]
          [ , DEFAULT = "default" ]
          [ , ELEMENT = element ]
          [ , DELIMITER = delimiter ]
          [ , SEND = send_function ]
          [ , RECEIVE = receive_function ]
          [ , PASSEDBYVALUE ]
      )

  15. Rules System • CREATE RULE name AS ON event TO object [ WHERE condition ] DO [ INSTEAD ] [ action | NOTHING ] • Rules can be triggered by any event (select, update, delete, etc.)

  16. Views as Rules • Views in Postgres are implemented using the rule system. In fact there is absolutely no difference between the command
      CREATE VIEW myview AS SELECT * FROM mytab;
    and the two commands
      CREATE TABLE myview (same attribute list as for mytab);
      CREATE RULE "_RETmyview" AS ON SELECT TO myview DO INSTEAD
          SELECT * FROM mytab;

  17. GiST Approach • A generalized search tree. Must be: • Extensible in terms of queries • General (B+-tree, R-tree, etc.) • Easy to extend • Efficient (match specialized trees) • Highly concurrent, recoverable, etc.
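
One way to picture the extensibility requirement is that the tree code stays generic while each data type supplies a small set of key methods. The sketch below is a Java illustration under stated assumptions, not the Postgres GiST API: the interface GistKeyOps and the classes IntRange and IntRangeOps are hypothetical names, and only three key methods (consistent, union, penalty) are shown, for keys that are integer ranges.

```java
import java.util.List;

/** Hypothetical key-method interface that a generic search tree could call. */
interface GistKeyOps<K, Q> {
    /** False only if no entry below this key can match the query. */
    boolean consistent(K key, Q query);

    /** A key that covers every key in the (non-empty) list. */
    K union(List<K> keys);

    /** Cost of inserting newKey under existing; lower is better. */
    double penalty(K existing, K newKey);
}

/** Closed integer range [lo, hi], used here as both key and query. */
final class IntRange {
    final int lo;
    final int hi;

    IntRange(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }
}

/** B+-tree-like behaviour: keys are ranges, queries are range overlaps. */
final class IntRangeOps implements GistKeyOps<IntRange, IntRange> {
    @Override
    public boolean consistent(IntRange key, IntRange query) {
        return key.lo <= query.hi && query.lo <= key.hi;
    }

    @Override
    public IntRange union(List<IntRange> keys) {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (IntRange k : keys) {
            lo = Math.min(lo, k.lo);
            hi = Math.max(hi, k.hi);
        }
        return new IntRange(lo, hi);
    }

    @Override
    public double penalty(IntRange existing, IntRange newKey) {
        IntRange merged = union(List.of(existing, newKey));
        // Growth in range length that the insertion would cause.
        return ((double) merged.hi - merged.lo) - ((double) existing.hi - existing.lo);
    }
}
```

Supporting another data type (for example rectangles, giving R-tree-like behaviour) would then mean writing another implementation of the same interface rather than a new tree.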

  18. Java and JDBC • Java is probably the high-level language used in most software development today • One of the earliest “enterprise” additions to Java was JDBC • JDBC is an API that provides mid-level access to a DBMS from Java applications • Intended to be an open cross-platform standard for database access in Java • Similar in intent to Microsoft’s ODBC

  19. JDBC • Provides a standard set of interfaces for any DBMS with a JDBC driver, using SQL to specify the database operations. • [Diagram: Application; DriverManager; Oracle, ODBC, and Postgres drivers, each in front of its database (Oracle DB, ODBC DB, Postgres DB); Connection; Statement, PreparedStatement, and CallableStatement, each with a ResultSet]

  20. JDBC Simple Java Implementation
      import java.sql.*;
      import oracle.jdbc.*;

      public class JDBCSample {
          public static void main(java.lang.String[] args) {
              try {
                  // this is where the driver is loaded
                  // Class.forName("jdbc.oracle.thin");
                  DriverManager.registerDriver(new OracleDriver());
              } catch (SQLException e) {
                  System.out.println("Unable to load driver Class");
                  return;
              }

  21. JDBC Simple Java Impl.
              try {
                  // All DB access is within the try/catch block...
                  // make a connection to ORACLE on Dream
                  Connection con = DriverManager.getConnection(
                      "jdbc:oracle:thin:@dream.sims.berkeley.edu:1521:dev",
                      "mylogin", "myoraclePW");
                  // Do an SQL statement...
                  Statement stmt = con.createStatement();
                  ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");

  22. JDBC Simple Java Impl.
                  // show the Results...
                  while (rs.next()) {
                      System.out.println(rs.getString("NAME"));
                  }
                  // Release the database resources...
                  rs.close();
                  stmt.close();
                  con.close();
              } catch (SQLException se) {
                  // inform user of errors...
                  System.out.println("SQL Exception: " + se.getMessage());
                  se.printStackTrace(System.out);
              }
          }
      }
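
The example above closes its ResultSet, Statement, and Connection by hand, so they stay open if an exception is thrown before the close calls are reached. A hedged variant using try-with-resources and a PreparedStatement is sketched below. The class name JdbcSketch, the helper thinUrl, and the LIKE filter are hypothetical; the table DIVECUST and column NAME come from the slides. Running the query itself assumes an Oracle JDBC driver on the classpath and a reachable database.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical variant of the slide example; needs a JDBC driver to run queries. */
final class JdbcSketch {
    private JdbcSketch() {}

    /** Builds an Oracle thin-driver URL of the form used in the slides. */
    static String thinUrl(String host, int port, String sid) {
        return "jdbc:oracle:thin:@" + host + ":" + port + ":" + sid;
    }

    /** Returns customer names matching a LIKE pattern (bound as a parameter). */
    static List<String> customerNames(String url, String user, String password,
                                      String pattern) throws SQLException {
        List<String> names = new ArrayList<>();
        // Resources are closed automatically, in reverse order, even on error.
        try (Connection con = DriverManager.getConnection(url, user, password);
             PreparedStatement ps =
                 con.prepareStatement("SELECT NAME FROM DIVECUST WHERE NAME LIKE ?")) {
            ps.setString(1, pattern);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("NAME"));
                }
            }
        }
        return names;
    }
}
```

A call such as JdbcSketch.customerNames(JdbcSketch.thinUrl("dream.sims.berkeley.edu", 1521, "dev"), "mylogin", "myoraclePW", "A%") would return the names beginning with A. Binding the pattern with setString, rather than concatenating it into the SQL text, keeps user input out of the statement itself.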

  23. Lecture Outline • Final Project Review • ORDBMS Features • JDBC Access to DBMS • Data-Driven Digital Library Applications • Berkeley’s Environmental Digital Library

  24. Berkeley DL Project • Object Relational Database Applications • The Berkeley Digital Library Project • Slides from RRL and Robert Wilensky, EECS • Use of DBMS in DL project

  25. Overview • What is a Digital Library? • Overview of Ongoing Research on Information Access in Digital Libraries

  26. Digital Libraries Are Like Traditional Libraries... • Involve large repositories of information (storage, preservation, and access) • Provide information organization and retrieval facilities (categorization, indexing) • Provide access for communities of users (communities may be as large as the general public or as small as the employees of a particular organization)

  27. Traditional Library System • [Diagram: Originators, Libraries, Users]

  28. But Digital Libraries Are Different From Libraries... • Not a physical location with local copies; objects held closer to originators • Decoupling of storage, organization, access • Enhanced Authoring (origination, annotation, support for work groups) • Subscription, pay-per-view supported in addition to “free” browsing. • Integration into user tasks.

  29. A Digital Library Infrastructure Model • [Diagram: Originators, Index Services, Repositories, Network, Users]

  30. UC Berkeley Digital Library Project • Focus: Work-centered digital information services • Testbed: Digital Library for the California Environment • Research: Technical agenda supporting user-oriented access to large distributed collections of diverse data types. • Part of the NSF/NASA/DARPA Digital Library Initiative (Phases 1 and 2)

  31. UCB Digital Library Project: Research Organizations • UC Berkeley EECS, SIMS, CED, IS&T • UCOP/CDL • Xerox PARC’s Document Image Decoding group and Work Practices group • Hewlett-Packard • NEC • SUN Microsystems • IBM Almaden • Microsoft • Ricoh California Research • Philips Research

  32. Testbed: An Environmental Digital Library • Collection: Diverse material relevant to California’s key habitats. • Users: A consortium of state agencies, development corporations, private corporations, regional government alliances, educational institutions, and libraries. • Potential: Impact on state-wide environmental system (CERES)

  33. The Environmental Library - Users/Contributors • California Resources Agency, California Environment Resources Evaluation System (CERES) • California Department of Water Resources • The California Department of Fish & Game • SANDAG • UC Water Resources Center Archives • New Partners: CDL and SDSC

  34. The Environmental Library - Contents • Environmental technical reports, bulletins, etc. • County general plans • Aerial and ground photography • USGS topographic maps • Land use and other special purpose maps • Sensor data • “Derived” information • Collection data bases for the classification and distribution of the California biota (e.g., SMASCH) • Supporting 3-D, economic, traffic, etc. models • Videos collected by the California Resources Agency

  35. The Environmental Library - Contents • As of late 2002, the collection represents over one terabyte of data, including over 183,000 digital images, about 300,000 pages of environmental documents, and over 2 million records in geographical and botanical databases.

  36. Botanical Data: • The CalFlora Database contains taxonomical and distribution information for more than 8000 native California plants. The Occurrence Database includes over 600,000 records of California plant sightings from many federal, state, and private sources. The botanical databases are linked to the CalPhotos collection of California plants, and are also linked to external collections of data, maps, and photos.

  37. Geographical Data: • Much of the geographical data in the collection has been used to develop our web-based GIS Viewer. The Street Finder uses 500,000 TIGER records of S.F. Bay Area streets along with 70,000 records from the USGS GNIS database. California Dams is a database of information about the 1395 dams under state jurisdiction. An additional 11 GB of geographical data represents maps and imagery that have been processed for inclusion as layers in our GIS Viewer. This includes Digital Ortho Quads and DRG maps for the S.F. Bay Area.

  38. Documents: • Most of the 300,000 pages of digital documents are environmental reports and plans that were provided by California state agencies. This collection includes documents, maps, articles, and reports on the California environment including Environmental Impact Reports (EIRs), educational pamphlets, water usage bulletins, and county plans. Documents in this collection come from the California Department of Water Resources (DWR), California Department of Fish and Game (DFG), San Diego Association of Governments (SANDAG), and many other agencies. Among the most frequently accessed documents are County General Plans for every California county and a survey of 125 Sacramento Delta fish species.

  39. Testbed Success Stories • LUPIN: CERES’ Land Use Planning Information Network • California County General Plans and other environmental documents. • Enter at Resources Agency Server, documents stored at and retrieved from UCB DLIB server. • California flood relief efforts • High demand for some data sets only available on our server (created by document recognition). • CalFlora: Creation and interoperation of repositories pertaining to plant biology. • Cloning of services at Cal State Library, FBI

  40. Research Highlights • Documents • Multivalent Document prototype • Page images, structured documents, GIS data, photographs • Intelligent Access to Content • Document recognition • Vision-based Image Retrieval: stuff, thing, scene retrieval • Natural Language Processing: categorizing the web, Cheshire II, TileBar Interfaces

  41. Multivalent Documents • MVD Model • radically distributed, open, extensible • “behaviors” and “layers” • behaviors conform to a protocol suite • inter-operation via “IDEG” • Applied to “enlivening legacy documents” • various nice behaviors, e.g., lenses

  42. Document Presentation • Problem: Digital libraries must deliver digital documents -- but in what form? • Different forms have advantages for particular purposes • Retrieval • Reuse • Content Analysis • Storage and archiving • Combining forms (Multivalent documents)

  43. Spectrum of Digital Document Representations • Adapted from Fox, E.A., et al. “Users, User Interfaces and Objects: Envision, an Electronic Library”, JASIS 44(8), 1993

  44. Document Representation: Multivalent Documents • Primary user interface/document model for UCB Digital Library (Wilensky & Phelps) • Goal: An approach to new document representations and their authoring. • Supports active, distributed, composable transformations of multimedia documents. • Enables sophisticated annotations, intelligent result handling, user-modifiable interface, composite documents.

  45. Multivalent Documents • Valence: 2: The relative capacity to unite, react, or interact (as with antigens or a biological substrate). Webster’s 7th Collegiate Dictionary • [Figure: a scanned page image (“History of The Classical World”, including Table 1) with layers: Cheshire Layer, GIS Layer, Table Layer, OCR Layer, OCR Mapping Layer; connected to Network Protocols & Resources]

  46. MVD availability • The MVD Browser is now available as open source on SourceForge • http://multivalent.sourceforge.net • See also: • http://elib.cs.berkeley.edu

  47. GIS in the MVD Framework • Layers are georeferenced data sets. • Behaviors are • display semi-transparently • pan • zoom • issue query • display context • “spatial hyperlinks” • annotations • Written in Java

  48. GIS Viewer: Features • Annotation and saving • points, rectangles (w. labels and links), vectors • saving of annotations as separate layer • Integration with address, street finding, gazetteer services • Application to image viewing: tilePix • Castanet client
