SRW/U for DSpace Ralph LeVan Research Scientist
What is SRW/U • A Pair of HTTP-based Text Query Protocols • SRW: Search and Retrieve Web Service • SRU: Search and Retrieve URL Service • An alternative to Z39.50
The Weaknesses of Classic Z39.50 • Not popular with the Web community • Connection-based Sessions • Binary Encoding • Transmitted directly over TCP/IP • Complicated
The Strengths of Classic Z39.50 • Result Sets (a.k.a. Statefulness) • Abstraction • Abstract Access Points (Attribute Sets) • Abstract Record Schemas • Explain
SRW: Search and Retrieve Web Service • SOAP (Simple Object Access Protocol) Based • HTTP • XML • Records Described in WSDL (Web Service Description Language) • 3 Services: SearchRetrieve, Scan and Explain
SRW: The Basics • Only one database per request • String (not structure) based queries • Index Sets, not Attribute Sets • One Record Syntax (XML)
The Explain Request • An empty request • E.g. http://alcme.oclc.org/srw/search/SOAR
The Explain Response • A description of the database • A list of the supported indexes • A list of the supported record schemas
The SearchRetrieve Request • String CQL Query • Integer StartRecord • Integer MaximumRecords • String RecordSchema • E.g. http://alcme.oclc.org/srw/search/SOAR?query=dog
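The request parameters above can be assembled into an SRU URL along these lines. This is a minimal sketch, not the author's code: the class name SruRequestUrl and its build method are hypothetical, and the URL parameter names (query, startRecord, maximumRecords, recordSchema) are assumed to follow the lower-camel-case spelling used by SRU. Depending on the SRU version, a server may also expect operation and version parameters, which the slide's example omits and this sketch leaves out as well.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the SRW/U distribution): builds a
// searchRetrieve URL from the four request fields listed on the slide.
class SruRequestUrl {

    // Optional fields may be null and are then left out of the URL.
    static String build(String baseUrl, String cqlQuery, Integer startRecord,
                        Integer maximumRecords, String recordSchema) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("query", cqlQuery);
        if (startRecord != null) params.put("startRecord", startRecord.toString());
        if (maximumRecords != null) params.put("maximumRecords", maximumRecords.toString());
        if (recordSchema != null) params.put("recordSchema", recordSchema);

        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            // URLEncoder.encode(String, Charset) requires Java 10 or later.
            url.append(separator).append(p.getKey()).append('=')
               .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("http://alcme.oclc.org/srw/search/SOAR", "dog", 1, 10, null));
    }
}
```

Encoding the query matters because CQL uses characters such as '=', '"' and spaces that are not safe inside a URL.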
The SearchRetrieve Response • ResultSetReference • String resultSetName • Integer resultSetTimeToLive • Integer numberOfRecords • Records • Diagnostics
CQL: Common Query Language • Loosely based on CCL Search • Boolean & Proximity Operators • Index Sets & Indexes • Truncation Characters ‘*’, ‘#’ & ‘?’ • Example: dc.title="harry potter" or bib1.isbn=123-456-78x
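Because CQL queries are plain strings, a client can compose them with ordinary string handling. The sketch below is illustrative only: CqlBuilder and its methods are hypothetical names, and the quoting rule is a simplification (terms containing whitespace, quotes, or operator characters are wrapped in double quotes, and only embedded double quotes are escaped).

```java
import java.util.regex.Pattern;

// Hypothetical helper for composing simple CQL strings such as the
// example on the slide. It does not cover proximity or relation modifiers.
class CqlBuilder {

    // Characters that, in this simplified sketch, force a term to be quoted.
    private static final Pattern NEEDS_QUOTES = Pattern.compile("[\\s\"=<>/()]");

    // Quote a search term when it is empty or contains special characters.
    static String quote(String term) {
        if (!term.isEmpty() && !NEEDS_QUOTES.matcher(term).find()) {
            return term;
        }
        return "\"" + term.replace("\"", "\\\"") + "\"";
    }

    // Build an "index=term" clause, e.g. dc.title="harry potter".
    static String clause(String index, String term) {
        return index + "=" + quote(term);
    }

    // Join two clauses with the boolean operator "or".
    static String or(String left, String right) {
        return left + " or " + right;
    }

    public static void main(String[] args) {
        String query = or(clause("dc.title", "harry potter"),
                          clause("bib1.isbn", "123-456-78x"));
        System.out.println(query);
    }
}
```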
The Scan Request • String CQL scanClause • Integer maximumTerms • Integer responsePosition • E.g. http://alcme.oclc.org/srw/search/SOAR?operation=scan&scanClause=dog&maximumTerms=3&responsePosition=3
The Scan Response • Terms • A term for searching • Possibly a term for displaying • The number of records retrieved by the term • Diagnostics
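A scan response can be turned into a list of terms with the standard DOM classes. This is a sketch under stated assumptions: the element names term, value, displayTerm and numberOfRecords, and the sample XML in main, are assumptions based on the general shape of SRU scan responses rather than output captured from the servers named in these slides. ScanTerms and Term are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical parser for a scan response; element names are assumed.
class ScanTerms {

    // One scanned term: what to search, what to show, and its record count.
    static final class Term {
        final String value;
        final String displayTerm;   // null when the server sends none
        final int numberOfRecords;  // -1 when the server sends none

        Term(String value, String displayTerm, int numberOfRecords) {
            this.value = value;
            this.displayTerm = displayTerm;
            this.numberOfRecords = numberOfRecords;
        }
    }

    static List<Term> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // match on local names, any prefix
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagNameNS("*", "term");
            List<Term> terms = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element term = (Element) nodes.item(i);
                String count = text(term, "numberOfRecords");
                terms.add(new Term(text(term, "value"), text(term, "displayTerm"),
                                   count == null ? -1 : Integer.parseInt(count)));
            }
            return terms;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse scan response", e);
        }
    }

    // Text of the first descendant with the given local name, or null.
    private static String text(Element parent, String localName) {
        NodeList children = parent.getElementsByTagNameNS("*", localName);
        return children.getLength() == 0 ? null : children.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String sample = "<scanResponse xmlns=\"http://www.loc.gov/zing/srw/\"><terms>"
            + "<term><value>dog</value><numberOfRecords>12</numberOfRecords>"
            + "<displayTerm>Dog</displayTerm></term>"
            + "<term><value>dogs</value><numberOfRecords>7</numberOfRecords></term>"
            + "</terms></scanResponse>";
        for (Term t : parse(sample)) {
            System.out.println(t.value + " (" + t.numberOfRecords + ")");
        }
    }
}
```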
Using SRU • Send the URL and get the response
    // Needs java.io.BufferedReader, java.io.InputStreamReader and java.net.URL
    BufferedReader in = new BufferedReader(
        new InputStreamReader(
            new URL("http://alcme.oclc.org/srw/search/SOAR?query=dog").openStream()));
    String inputLine = null, response;
    StringBuffer content = new StringBuffer();
    // Read the whole response into one string
    while ((inputLine = in.readLine()) != null)
        content.append(inputLine);
    in.close();
    response = content.toString();
Using SRU • Parse the response using String methods
    int i = response.indexOf("<numberOfRecords>"),
        j = response.indexOf("</numberOfRecords>"),
        // 17 is the length of "<numberOfRecords>"
        count = Integer.parseInt(response.substring(i + 17, j));
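Using SRU • Parse the response using DOM classes
    // Needs javax.xml.parsers.*, org.w3c.dom.Document, org.xml.sax.InputSource and java.io.StringReader
    // record holds the XML text to parse
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(new InputSource(new StringReader(record)));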
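Once a Document has been built, the record count can be read from it instead of being cut out with String methods. The sketch below assumes the response carries a numberOfRecords element, as listed on the SearchRetrieve Response slide. The sample XML and its srw: namespace prefix are assumptions for illustration, and RecordCount is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts numberOfRecords from a response via DOM.
class RecordCount {

    static int numberOfRecords(String responseXml) {
        Document document;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // lets us ignore the prefix in use
            document = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(responseXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
        // "*" matches the element whatever namespace it is declared in.
        NodeList nodes = document.getElementsByTagNameNS("*", "numberOfRecords");
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("No numberOfRecords element found");
        }
        return Integer.parseInt(nodes.item(0).getTextContent().trim());
    }

    public static void main(String[] args) {
        // Assumed sample; a real server's response will contain more elements.
        String sample = "<srw:searchRetrieveResponse xmlns:srw=\"http://www.loc.gov/zing/srw/\">"
            + "<srw:version>1.1</srw:version>"
            + "<srw:numberOfRecords>42</srw:numberOfRecords>"
            + "</srw:searchRetrieveResponse>";
        System.out.println(numberOfRecords(sample));
    }
}
```

Matching on the local name keeps working when the server writes the tag with a namespace prefix, a case in which a plain search for the unprefixed tag text would find nothing.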
Using SRW • Get WSDL from server or LOC http://alcme.oclc.org/srw/search/SOAR?wsdl or http://www.loc.gov/z3950/agency/zing/srw/srw-sample-service.wsdl
Using SRW • Convert WSDL to code java org.apache.axis.wsdl.WSDL2Java --server-side --skeletonDeploy true srw-sample-service.wsdl
Using SRW • Write Client
    // Classes below are generated by WSDL2Java from the WSDL
    SRWSampleServiceLocator service = new SRWSampleServiceLocator();
    URL url = new URL("http://alcme.oclc.org/srw/search/SOAR");
    SRWPort port = service.getSRW(url);
    SearchRetrieveRequestType request = new SearchRetrieveRequestType();
    request.setQuery("dog");
    SearchRetrieveResponseType response = port.searchRetrieveOperation(request);
    int postings = response.getNumberOfRecords();
DSpace Implementation • Reads list of Lucene indexes from SRWDatabase.props • Converts CQL queries to Lucene queries • Gets Dublin Core record from database
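The CQL-to-Lucene step can be pictured with a toy translator. This is not the code shipped in the SRW distribution, only a sketch of the idea: CqlToLucene is a hypothetical class, the index-to-field map stands in for the list read from SRWDatabase.props, and the Lucene field names title and author are assumptions. It handles only index=term clauses, bare terms, and the booleans and, or and not. Parentheses, relations other than '=', and proximity are out of scope.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy translator from a small CQL subset to Lucene query syntax.
class CqlToLucene {

    // A token is a run of non-space text in which quoted strings may contain spaces.
    private static final Pattern TOKEN = Pattern.compile("(?:[^\\s\"]+|\"[^\"]*\")+");

    static String convert(String cql, Map<String, String> indexToField) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(cql);
        while (m.find()) {
            String token = m.group();
            String lower = token.toLowerCase(Locale.ROOT);
            if (lower.equals("and") || lower.equals("or") || lower.equals("not")) {
                out.add(lower.toUpperCase(Locale.ROOT)); // Lucene booleans are upper case
                continue;
            }
            // A token that starts with a quote is a bare term, even if it contains '='.
            int eq = token.startsWith("\"") ? -1 : token.indexOf('=');
            if (eq < 0) {
                out.add(token); // bare term: searched in Lucene's default field
                continue;
            }
            String index = token.substring(0, eq);
            String field = indexToField.get(index);
            if (field == null) {
                throw new IllegalArgumentException("Unsupported index: " + index);
            }
            out.add(field + ":" + token.substring(eq + 1));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Hypothetical mapping; the real one comes from SRWDatabase.props.
        Map<String, String> indexes = Map.of("dc.title", "title", "dc.creator", "author");
        System.out.println(convert("dc.title=\"harry potter\" and dc.creator=rowling", indexes));
    }
}
```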
Installation • Get the SRW.war file from http://www.oclc.org/research/software/srw • Start Tomcat (to unpack the .war file) • Edit the SRWServer.props configuration file • Copy the SRWDatabase.props file to your DSpace/config directory • Restart Tomcat • Test with http://yourserver/SRW/search/DSpace
SRWServer.props
    # parameters for the SRW Servlet
    SRW.Home=d:/Apache Tomcat 4.1/webapps/SRW/
    default.database=DSpace
    resultSetIdleTime=300
    db.DSpace.class=ORG.oclc.os.SRW.SRWLuceneDatabase
    db.DSpace.home=d:/dspace/dspace-1.1/
    db.DSpace.configuration=config/SRWDatabase.props
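The file is in standard Java properties format, so its per-database settings can be read with java.util.Properties. The sketch below only illustrates the db.<name>.<setting> key pattern visible on the slide. SrwServerProps is a hypothetical class, and treating the configuration value as a path relative to the database home is an assumption, not documented behaviour of the servlet.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical reader for SRWServer.props-style settings.
class SrwServerProps {

    // Parse properties text (the same format as the file on the slide).
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Look up one setting of one database, e.g. ("DSpace", "class").
    static String databaseSetting(Properties props, String database, String setting) {
        return props.getProperty("db." + database + "." + setting);
    }

    public static void main(String[] args) {
        Properties props = load(
            "# parameters for the SRW Servlet\n"
            + "default.database=DSpace\n"
            + "db.DSpace.class=ORG.oclc.os.SRW.SRWLuceneDatabase\n"
            + "db.DSpace.home=d:/dspace/dspace-1.1/\n"
            + "db.DSpace.configuration=config/SRWDatabase.props\n");
        String db = props.getProperty("default.database");
        System.out.println("class:  " + databaseSetting(props, db, "class"));
        // Assumption: the configuration path is relative to the database home.
        System.out.println("config: " + databaseSetting(props, db, "home")
                                      + databaseSetting(props, db, "configuration"));
    }
}
```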
Examples • http://alcme.oclc.org/srw/search/GSAFD? • http://alcme.oclc.org/srw/search/SOAR? • http://alcme.oclc.org/srw/search/NDL?
Links • http://www.loc.gov/srw • http://www.loc.gov/z3950/srutest.html • http://www.oclc.org/research/software/srw • http://staff.oclc.org/~levan/docs/SRWforDSpace.ppt
Questions & Answers