Curation in the Cloud: Hull’s Fedora and Hydra perspective

Richard Green. Curation in the Cloud, London, 7/8 March 2012.

Presentation Transcript


  1. Curation in the Cloud: Hull’s Fedora and Hydra perspective
     Richard Green
     Curation in the Cloud, London, 7/8 March 2012

  2. Institutional repository background
     • Hull has been running a Fedora-based institutional repository for several years
     • Originally based on Fedora + Muradora UI
     • More recently (the last 6 months) based on Fedora + Hydra
     • The repository covers a wide range of content – not just OA articles…
     Curation in the Cloud | London | 7/8 March 2012 | 2

  3. (No text on this slide.)

  4. Wide range of content to deal with
     - Exam papers
     - e-Theses & dissertations (ETDs)
     - Journal articles
     - Meeting papers or minutes
     - Policies or procedures
     - Dissertations (undergraduate)
     - Photographs
     - Presentations
     - Books
     - Book chapters
     - Regulations
     - Reports
     - Conference papers or abstracts
     - Learning materials
     - Handbooks
     - Internet publications
     - Newsletter articles
     - Datasets
     - Sound
     - Moving images
     - Guidance documents
     - Licences
     - Posters
     - Events
     - Letters
     - Artwork
     - Diagrams
     - Maps
     - Software
     - etc. (!!!)

  5. Affiliations
     • Hull was instrumental in founding the Fedora UK & Ireland User Group…
        • 20 or so informal members

  6. Affiliations [2]
     • … and is a founder member of the Hydra partnership (with the University of Virginia, Stanford University and Fedora Commons)
     • Fedora does not have an ‘out-of-the-box’ UI. Hydra set out to provide building blocks from which highly functional (full-CRUD) UIs could be built on top of Fedora
     • A growing number of Hydra-using institutions in the US; two or three so far in the UK
     • Hydra “content modelling” is proving useful to non-Hydra Fedora users

  7. At the moment?
     • Just starting to think seriously about opportunities in the cloud
     • This meeting is a timely opportunity to clarify what is still somewhat fuzzy thinking
     • At the moment, we in Hull are considering the use of cloud storage in addition to local storage for our Hydra repository

  8. At the moment? [2]
     • Why the cloud?
        • Could be used to provide near-line capability for rarely used assets which are individually ‘small’ but numerous
        • Potential to store very large but rarely accessed assets (TB range) ‘cheaply’ (cf. high-performance SAN storage)
        • Possibility of leveraging ‘above campus’ services (image manipulation? video streaming? format migration?)

  9. At the moment? [3]
     • WE’RE NOT
        • considering a complete repository infrastructure in the cloud
           • Happier with the software stack locally
        • considering local software with all-cloud storage
           • There are known problems with latency etc.
     • WE ARE
        • considering a hybrid of the two

  10. At the moment? [4]
      • How?
      • In principle, Fedora (and therefore Hydra) allows storage to be mixed and matched: Fedora managed (local file system), external (http accessible), redirected (redirects the user to the appropriate URL)
      • So:
         • use “managed content” for straightforward, small and/or high-access materials;
         • use “external content” for low-access materials or where there is a value-added service.
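As a minimal sketch of the rule of thumb on slide 10, the Python below picks a Fedora datastream control group for each asset. The threshold, the `Asset` fields and the function name are hypothetical; the slides give no figures. It keys the decision on access frequency rather than size, following slides 8 and 16, which suggest that even small items may go to the cloud if they are rarely used.

```python
from dataclasses import dataclass

# Fedora 3 datastream control groups relevant to slide 10:
#   "M" managed content, stored by Fedora on the local file system
#   "E" external content, fetched by Fedora from an http URL (e.g. cloud storage)
#   ("R" redirect, where the client is sent to the URL directly, is not used here)
MANAGED, EXTERNAL = "M", "E"

# Hypothetical threshold, for illustration only.
HIGH_ACCESS_PER_MONTH = 10.0

@dataclass
class Asset:
    downloads_per_month: float
    wants_cloud_service: bool = False  # e.g. streaming or format migration in the cloud

def choose_control_group(asset: Asset) -> str:
    """Managed (local) storage for high-access material; external (cloud)
    storage for low-access material or where a cloud service adds value."""
    if asset.wants_cloud_service:
        return EXTERNAL
    if asset.downloads_per_month >= HIGH_ACCESS_PER_MONTH:
        return MANAGED
    return EXTERNAL
```

For example, a heavily downloaded thesis stays managed, while a rarely opened dataset, or a video to be streamed by an ‘above campus’ service, is stored externally.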

  11. Scale of problem
      • Bulk of repository content is “small” (megabytes)
      • Multimedia content is larger (tens to hundreds of megabytes), and our current offering is “download”; we cannot (yet) stream
      • We know there are multi-TB datasets on campus to be dealt with
         • e.g. Biology has one of 6 TB, growing at 2 TB per quarter
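The Biology figure on slide 11 can be projected forward. A minimal sketch, assuming growth stays linear at 2 TB per quarter (the slide gives only the current rate); the function name is illustrative.

```python
def projected_size_tb(start_tb: float, growth_tb_per_quarter: float, quarters: int) -> float:
    """Size of a dataset after `quarters` quarters of constant (linear) growth."""
    return start_tb + growth_tb_per_quarter * quarters

# Biology dataset from slide 11: 6 TB now, growing 2 TB per quarter.
for years in (1, 3, 5):
    print(f"after {years} year(s): {projected_size_tb(6, 2, 4 * years)} TB")
# after 1 year(s): 14 TB
# after 3 year(s): 30 TB
# after 5 year(s): 46 TB
```

Even on this simple assumption, the one dataset passes 40 TB within five years.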

  12. Potential practical problems
      • High-access materials could generate large download charges
         • Better suited to low-access objects or to obtaining ‘value-added’ services
      • Need a way of predicting costs over long periods (using the LIFE model?)
      • Getting large objects/volumes into the cloud
         • Transfer times for TBs of content are considerable. Use UPS to send a hard drive (or several?)
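The transfer-time concern on slide 12 is simple arithmetic: time = size × 8 / bandwidth. The Python sketch below assumes decimal terabytes, a sustained link rate and an optional efficiency factor; the per-GB download price is a placeholder parameter, not any provider’s tariff, and the function names are illustrative.

```python
def transfer_days(size_tb: float, link_mbit_s: float, efficiency: float = 1.0) -> float:
    """Days to push `size_tb` decimal terabytes through a link of
    `link_mbit_s` megabits per second, at `efficiency` x the line rate."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / 86_400

def monthly_download_cost(gb_downloaded: float, price_per_gb: float) -> float:
    """Download (egress) charge for one month; price_per_gb is a placeholder."""
    return gb_downloaded * price_per_gb
```

At a sustained 100 Mbit/s, `transfer_days(6, 100)` comes to about 5.6 days for the 6 TB Biology dataset; at 1 Gbit/s it is about 13 hours, and both figures double at half line rate. That is the gap a couriered hard drive competes with.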

  13. Potential practical problems [2]
      • Security
         • Hull’s IR has very granular security: categories (public/staff/student), groups (e.g. student modules) and individuals
         • Need to be able to restrict access to cloud-based materials accordingly
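One way to meet the requirement on slide 13 is to keep the cloud copy private and let the repository enforce Hull’s categories, groups and individuals before it hands out any cloud URL. A minimal Python sketch under that assumption: `User`, `AccessPolicy`, `cloud_url_for` and the `make_signed_url` callable are hypothetical names, and generating a real short-lived URL would be left to the chosen provider’s own SDK.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)
class User:
    username: str
    category: str                          # "public", "staff" or "student"
    groups: frozenset[str] = frozenset()   # e.g. student module codes

@dataclass
class AccessPolicy:
    categories: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)
    individuals: set[str] = field(default_factory=set)

    def permits(self, user: User) -> bool:
        """Grant access by category, by a shared group, or by named individual."""
        if "public" in self.categories:
            return True
        return (
            user.category in self.categories
            or bool(self.groups & user.groups)
            or user.username in self.individuals
        )

def cloud_url_for(policy: AccessPolicy, user: User,
                  make_signed_url: Callable[[], str]) -> str:
    """Ask for a (hypothetical) short-lived cloud URL only after the
    repository's own policy has allowed the request."""
    if not policy.permits(user):
        raise PermissionError(f"{user.username} may not access this item")
    return make_signed_url()
```

The design choice is that the cloud store never needs to understand Hull’s permission model; it only has to refuse unsigned requests.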

  14. Potential practical problems [3]
      • Durability
         • “Designed to provide 99.999999999% durability” (Amazon S3 SLA). And the other 0.000000001%? Not a lot, but…
         • Could that mean that for every terabyte you send us, we promise not to corrupt more than ten or so bytes?!?
         • Or that we might lose 1 in 10^11 files, which might not be quite so bad provided it’s not one of your files?
         • LOCKSS-type approach across several providers?
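The two readings of the durability figure on slide 14, and the LOCKSS-style idea of spreading copies across providers, can be checked with a few lines of Python. This is a sketch: it takes the quoted percentage at face value and, for the replication case, assumes that providers fail independently with the same loss fraction, which real providers need not do. The function names are illustrative.

```python
from decimal import Decimal

# Durability as quoted on the slide; the loss fraction is what is left over.
DURABILITY = Decimal("99.999999999") / 100
LOSS = 1 - DURABILITY  # 1E-11, i.e. 0.000000001 %

def bytes_at_risk(total_bytes: int) -> Decimal:
    """First reading on the slide: the loss fraction applies per byte."""
    return total_bytes * LOSS

def expected_files_lost(n_files: int) -> Decimal:
    """Second reading: the loss fraction is a per-file probability,
    so the expected number of lost files is n_files * LOSS."""
    return n_files * LOSS

def all_copies_lost(copies: int) -> Decimal:
    """LOCKSS-style replication: probability that every one of `copies`
    replicas is lost, ASSUMING independent providers with equal LOSS."""
    return LOSS ** copies
```

`bytes_at_risk(10**12)` gives 10, the slide’s “ten or so bytes” per terabyte; `expected_files_lost(10**11)` gives 1; and under the independence assumption, three providers bring the chance of losing every copy of a file down to 1E-33.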

  15. Potential practical problems [4]
      • Management of an institutional cloud
         • Can an institution realistically manage its own cloud space(s)?
            • Managing just the data
            • Maybe managing cloud-based services
         • Is the idea of third-party management (à la DuraSpace) a more appropriate model?

  16. So, in summary…
      • Hull is potentially interested in cloud solutions for:
         • Low-access materials which individually are not big but taken together are (e.g. thousands of images)
         • TB+, low-access objects
         • ‘Above campus’, value-added services (image manipulation, media streaming, format migration, LOCKSS-in-the-Cloud?)
      • Perhaps this sounds like a job for a UK HE-oriented, brokered service akin to DuraCloud’s model?

  17. Contacts and links
      • IR Service owner: Chris Awre (c.awre@hull.ac.uk)
      • Hydra Project Manager for Hull: Richard Green (r.green@hull.ac.uk)
      • Hull Institutional Repository: hydra.hull.ac.uk
      • Fedora website: fedora-commons.org
      • Hydra website: projecthydra.org
      • Fedora UK&I User group: fedora-uki.org.uk
