
NEEO project EC Final review meeting Gateway and portal 23 March 2010

Benoit Pauwels, Université Libre de Bruxelles, Belgium





Presentation Transcript


  1. NEEO project, EC Final review meeting: Gateway and portal, 23 March 2010
     Benoit Pauwels, Université Libre de Bruxelles, Belgium

  2. Plan
     • Overview of technical infrastructure
     • EO as a network of data providers – descriptive metadata
     • EO as a network of data providers – usage statistics
     • Added value services
         • Publication lists
         • Enriched metadata
         • Full-text searching
         • Multilinguality
     • Collaboration with RePEc
     • EO gateway and portal

  3. [Architecture diagram; only its labels survive: Metadata, Logs, Objects; OAI-PMH; HTTP; DIDL / MODS; SWUP; Meresco; Enrichment service; Harvester; Crawler; Metadata; SRU; Lucene; RePEc; RSS/Atom; EO portal; Exporter engine; Other portals; "Homemade - FOSS" (appears twice)]

  4. Descriptive metadata exchange format

  5. Descriptive metadata exchange format
     • DIDL – XML container structure that can hold semantically distinct metadata
         • Descriptive metadata, object files (by reference), splash page, enriched metadata
         • Based on the existing container structure defined by SurfShare
     • MODS (3.2) – granular descriptive metadata
         • Based on the existing metadata structure defined by SurfShare
     • DAI – unambiguous identification of authors
         • National or institution-unique persistent identifier
     • Continuous aim of standardization at a level that surpasses the NEEO project
         • NEEO adaptations fed back to SurfShare

  6. DIDL[1]
       Item[1]
         Descriptor/Identifier (persistent identifier)
         Descriptor/modified
         Item[1..∞] (of type descriptiveMetadata)
           Descriptor/type (« descriptiveMetadata »)
           Descriptor/Identifier (persistent identifier)
           Descriptor/modified
           Component/Resource -- representation by value (XML)
         Item[0..∞] (of type objectFile)
           Descriptor/type (« objectFile »)
           Descriptor/Identifier (persistent identifier)
           Descriptor/modified
           Component/Resource -- representation by ref. (URL)
         Item[0..1] (of type humanStartPage)
           Descriptor/type (« humanStartPage »)
           Component/Resource -- representation by ref. (URL)
     • EO descriptive metadata model
     • Publication is described as a complex (compound) object
         • Persistent identifier
         • Aggregation of 3 types of components
             • descriptiveMetadata (MODS)
             • objectFiles
             • humanStartPage
     • Extensible
         • Additional items can be stored within the complex object
     • MODS contains DAI of EO author
     • Semantic Web - Linked Data – OAI-ORE ready
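The cardinalities on this slide (at least one descriptiveMetadata item, any number of objectFile items, at most one humanStartPage) can be checked mechanically. The following is a minimal sketch in Python; the class and field names are illustrative assumptions, not part of the NEEO application profile, and the model ignores the actual DIDL XML serialization.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """One component of the compound object (hypothetical model, not the DIDL schema)."""
    item_type: str                    # "descriptiveMetadata", "objectFile" or "humanStartPage"
    resource: str                     # inline XML (by value) or a URL (by reference)
    identifier: Optional[str] = None  # persistent identifier, where the slide lists one
    modified: Optional[str] = None    # modification date, where the slide lists one


@dataclass
class Publication:
    """The top-level Item: a persistent identifier plus its aggregated components."""
    identifier: str
    modified: str
    items: List[Item] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return cardinality violations; an empty list means the object is valid."""
        counts = {}
        for item in self.items:
            counts[item.item_type] = counts.get(item.item_type, 0) + 1
        errors = []
        if counts.get("descriptiveMetadata", 0) < 1:
            errors.append("at least one descriptiveMetadata item is required")
        if counts.get("humanStartPage", 0) > 1:
            errors.append("at most one humanStartPage item is allowed")
        return errors
```

Because the object is extensible, the sketch deliberately does not reject item types it does not know.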

  7. Descriptive metadata exchange format
     • Central EO gateway
         • DIDL and MODS application profiles
         • Vocabularies in DIDL and MODS
         • Technical guidelines for project partners
         • All documentation is available in open access
     • Partner solutions: home-made or with external support
         • ARNO: home-made
         • DSpace: home-made, AtMire
         • EPrints: home-made, ECS-University of Southampton
         • Fedora: METS/MODS -> DIDL/MODS
         • DigiTool: METS/MARC -> DIDL/MODS
     • All original partners + 2 new partners

  8. Decentralized registry service
     • Aim: sustainable solution for a big network with many partners
     • Decentralized Admin file
         • Format: XML-RDF | FOAF + NEEO-specific vocabulary
         • The file sits on the local web server of the project partner
         • Content
             • Information on the institution: name, description, ...
             • OAI baseURL + OAI sets to harvest
             • EO authors: DAI, photograph, full name, affiliation
     • The EO gateway fetches the file over HTTP GET and validates it at regular intervals
     • Used for
         • Information in EO portal screens
         • Publication lists (match on DAI)
         • Automated harvesting process
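Once the gateway has read a partner's OAI baseURL and set list from the Admin file, the harvesting requests follow from the OAI-PMH protocol. A minimal sketch, assuming a hypothetical base URL and a `didl` metadata prefix (the prefix the partners actually expose is not given in the slides):

```python
from urllib.parse import urlencode


def list_records_urls(base_url, sets, metadata_prefix="didl", from_date=None):
    """Build one OAI-PMH ListRecords request URL per set named in the Admin file.

    metadata_prefix="didl" is an assumption; from_date enables incremental
    harvesting through the standard OAI-PMH 'from' argument.
    """
    urls = []
    for set_spec in sets:
        params = {
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "set": set_spec,
        }
        if from_date:
            params["from"] = from_date
        urls.append(base_url + "?" + urlencode(params))
    return urls
```

For example, `list_records_urls("http://repo.example.org/oai", ["economics"], from_date="2010-03-01")` yields a single URL restricted to records changed since that date. A real harvester would also have to follow resumption tokens, which this sketch leaves out.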

  9. Usage statistics – EO use case
     • EO use case: present download rates through the EO portal per publication, scholar, institution
     • Normalization of exchange format and communication protocol
         • OAI-PMH exchange of SWUP OpenURL ContextObjects (Scholarly Works Usage Community Profile)
     • Special considerations:
         • Encryption of IP address of requester (MD5)
         • Filtering out robot requests (list of 50 regular expressions)
         • Filtering out double clicks
     • Similar initiatives come together at the Knowledge Exchange workshop, Berlin, 29-30 March 2010
         • JISC (Usage Statistics Review project), Pirus2, SurfSure, Counter, Mesur, OA-Statistik, Economists Online
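The three special considerations above can be sketched as a small cleaning step over raw download events. This is an illustration only: the three robot patterns stand in for the project's list of 50 regular expressions, and the 30-second double-click window is an assumed value that the slides do not specify.

```python
import hashlib
import re

# Stand-ins for the project's list of 50 robot patterns (assumption).
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"bot\b", r"crawler", r"spider")]

# Assumed double-click window in seconds; the slides give no value.
DOUBLE_CLICK_WINDOW = 30


def hash_ip(ip):
    """Replace the requester's IP address by its MD5 hex digest."""
    return hashlib.md5(ip.encode("utf-8")).hexdigest()


def is_robot(user_agent):
    """True if the user agent matches any robot pattern."""
    return any(pattern.search(user_agent) for pattern in ROBOT_PATTERNS)


def clean_events(events):
    """Filter events given as (timestamp_seconds, ip, user_agent, document_id).

    Events must be sorted by timestamp. Robot requests are dropped, and a
    repeated request for the same document from the same hashed IP within
    the window counts as a double click and is dropped as well.
    """
    last_seen = {}
    kept = []
    for timestamp, ip, user_agent, document_id in events:
        if is_robot(user_agent):
            continue
        key = (hash_ip(ip), document_id)
        previous = last_seen.get(key)
        last_seen[key] = timestamp
        if previous is not None and timestamp - previous <= DOUBLE_CLICK_WINDOW:
            continue
        kept.append((timestamp, key[0], document_id))
    return kept
```

Hashing before deduplication means the raw IP address never needs to be stored alongside the usage record.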

  10. Usage statistics – implementation status
     • Central EO Gateway – DoDoCo (Document Download Counter)
         • OAI-PMH harvesting of SWUP ContextObjects into an SQL database
         • Enrich with information on item, scholar, institution
         • Web service level (item, scholar, institution) + date range
         • Technical guidelines for project partners (available in open access)
     • Partners
         • Implementation
             • For all major IR platforms
             • Solution for Combined Log Format web logs
         • Registration through Admin file
         • 7 original + 1 new partner
     • Not enough data available
     • Not visible through EO portal yet, although DoDoCo software is ready
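For partners without a repository-specific implementation, usage events have to be derived from web server logs in Combined Log Format. A minimal parsing sketch follows; the rule that a download is a successful GET of a `.pdf` path is an assumption for illustration, not the project's documented criterion.

```python
import re
from datetime import datetime

# Combined Log Format: host ident user [time] "request" status size "referer" "agent"
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def is_download(record):
    """Assumed rule: a successful GET request for a PDF file is a download."""
    return (
        record["method"] == "GET"
        and record["status"] == 200
        and record["path"].lower().endswith(".pdf")
    )
```

The parsed host, time, referer and agent fields carry the information a usage record needs before it is exposed for harvesting.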

  11. Added value services
     • Publication lists
         • Per DAI of authors who are registered in the Admin file
         • Publications are extracted from the EO gateway over SRU and formatted as:
             • APA+ in HTML
                 • with links to full text in EO partner repository
                 • with links to publisher sites (through OpenURL resolution)
             • APA in PDF
             • APA in RTF
             • RIS
             • BibTeX
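The publication list service boils down to an SRU searchRetrieve request keyed on the author's DAI, followed by a formatting step. The sketch below shows both for the RIS output. The base URL, the `dai` CQL index name and the record dictionary layout are assumptions; only the SRU parameter names and the RIS tags are standard.

```python
from urllib.parse import urlencode


def sru_url(base_url, dai, maximum_records=50):
    """Build an SRU 1.1 searchRetrieve URL for one author.

    The CQL index name 'dai' is hypothetical; the gateway's real index
    names are not given in the slides.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'dai="{dai}"',
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


def to_ris(record):
    """Format one record (a plain dict, layout assumed) as an RIS entry."""
    lines = ["TY  - " + record.get("type", "GEN")]
    for author in record.get("authors", []):
        lines.append("AU  - " + author)
    if "title" in record:
        lines.append("TI  - " + record["title"])
    if "year" in record:
        lines.append("PY  - " + str(record["year"]))
    if "url" in record:
        lines.append("UR  - " + record["url"])
    lines.append("ER  - ")
    return "\n".join(lines)
```

The other output formats (APA in HTML, PDF, RTF, and BibTeX) would be further formatters over the same record dictionaries.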

  12. Added value services
     • Enriched descriptive metadata
         • JEL classification
             • Enrichment service (ES) gets records to be enriched from EO, over SRU
             • ES creates enrichment record(s), using text mining technology
             • ES makes enrichment record(s) available to EO, over OAI-PMH
             • EO harvests enrichment records from ES and integrates them into the original record
             • EO reuses enrichment information in its services: index & present
         • Bibliographic references
             • Through collaboration with RePEc/CitEc
             • Visible through EO portal
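The integration step, where EO folds harvested enrichment records back into the original records, can be pictured as a keyed merge. This is a sketch under assumed data shapes (plain dictionaries with hypothetical `record_id` and `jel` keys); in the real system the enrichment is stored as an additional item in the DIDL object.

```python
def merge_enrichments(records, enrichments):
    """Return copies of the records with JEL codes from enrichment records added.

    records: dict mapping record identifier -> record dict (shape assumed)
    enrichments: list of dicts with 'record_id' and 'jel' keys (shape assumed)
    Enrichments that point at an unknown record are skipped.
    """
    merged = {}
    for record_id, record in records.items():
        copy = dict(record)
        copy["jel"] = list(record.get("jel", []))  # copy so the input is not mutated
        merged[record_id] = copy
    for enrichment in enrichments:
        target = merged.get(enrichment["record_id"])
        if target is None:
            continue
        for code in enrichment["jel"]:
            if code not in target["jel"]:
                target["jel"].append(code)
    return merged
```

After the merge, the JEL codes can be indexed and shown alongside the descriptive metadata that the partner supplied.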

  13. Added value services
     • Full-text search service
         • Process
             • Full-text indexer component in Meresco fetches relevant records from the EO Gateway over SRU
             • Follows links to PDF object files
             • Text is extracted from the PDF and added to the record through SRU Update
             • EO can now index & present
         • Prototype exists
         • Not yet fully deployed in EO portal

  14. Added value services
     • Multilinguality (EN, FR, GE, ES)
         • Complete EO portal interface
         • JEL classification
     • MLIA functionality in EO portal
         • Student thesis – Prof. Bouillon (Univ. of Geneva, multilingual information processing department)
             • (Uncustomized) Systran and Google Translate show equivalent results
         • Contacts with CACAO (also through Europeana)
             • Comes as a complete portal solution, not as an add-in for existing portals like EO
         • Considerations:
             • Lingua franca in economics = EN
             • NEEO = NOT a research project in linguistics; aim: reuse the best existing technology
             • Use “Google Translate” for translation of queries

  15. Collaboration with RePEc
     • Harvesting metadata from RePEc into EO
         • AMF to DIDL/MODS mapping
     • Push metadata from EO to RePEc
         • “RePEc:ner” archive, with separate series for each EO institution
         • According to agreed-upon reviewed ReDIF format
         • Admin file directives in order to limit overlap
     • Contribute to LogEc
     • Reuse CitEc data in EO portal
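Pushing a record to RePEc means serializing it as a ReDIF template under the “RePEc:ner” archive. The sketch below emits a ReDIF-Paper template from a plain dictionary. The series code, the number and the record layout are hypothetical, and the field selection is a small subset chosen for illustration; it is not the agreed-upon format mentioned on the slide.

```python
def to_redif(record, series, number):
    """Serialize one record (dict layout assumed) as a ReDIF-Paper template.

    series and number are hypothetical placeholders for the per-institution
    series code and the paper number within that series.
    """
    lines = ["Template-Type: ReDIF-Paper 1.0"]
    for author in record["authors"]:
        lines.append("Author-Name: " + author)
    lines.append("Title: " + record["title"])
    if record.get("year"):
        lines.append("Creation-Date: " + str(record["year"]))
    if record.get("url"):
        lines.append("File-URL: " + record["url"])
        lines.append("File-Format: application/pdf")  # assumes the object file is a PDF
    lines.append(f"Handle: RePEc:ner:{series}:{number}")
    return "\n".join(lines) + "\n"
```

The handle is what ties the template to its archive and series, which is how one series per EO institution keeps the partners' records apart.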

  16. EO gateway and portal
     • Gateway – metadata store and search engine
         • Choice between Summa, SOLR/Lucene, Meresco
         • Open source solution, based on Lucene search engine
         • Support available from software developers (CQ2 company)
         • Has proven its qualities in the past (DARENet)
     • Portal
         • First version: home-made
         • Final version:
             • Outsourced design to private company
             • HTML, CSS, JavaScript, all images
