Toro 1

Presentation Transcript


  1. Toro 1 EMu Hacking at the Peabody Museum

  2. Yale campus

  3. Peabody Collections: Counts & Functional Cataloguing Unit
     • Anthropology: 325,000 (Lot)
     • Botany: 350,000 (Individual)
     • Entomology: 1,000,000 (Individual)
     • Invertebrate Paleontology: 300,000 (Lot)
     • Invertebrate Zoology: 300,000 (Lot)
     • Mineralogy: 35,000 (Individual)
     • Paleobotany: 150,000 (Individual)
     • Scientific Instruments: 2,000 (Individual)
     • Vertebrate Paleontology: 125,000 (Individual)
     • Vertebrate Zoology: 185,000 (Lot / Individual)
     2.7 million database-able units => ~11 million items

  4. Peabody Collections: Functional Units Databased
     • Anthropology: 325,000 (90 %)
     • Botany: 350,000 (1 %)
     • Entomology: 1,000,000 (1 %)
     • Invertebrate Paleontology: 300,000 (55 %)
     • Invertebrate Zoology: 300,000 (20 %)
     • Mineralogy: 35,000 (85 %)
     • Paleobotany: 150,000 (60 %)
     • Scientific Instruments: 2,000 (100 %)
     • Vertebrate Paleontology: 125,000 (60 %)
     • Vertebrate Zoology: 185,000 (95 %)
     940,000 of 2.7 million => 37 % overall
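The overall figure is a weighted total of the per-discipline counts and percentages. A minimal sketch that recomputes it from the figures listed on slides 3 and 4: because those percentages are rounded, the result (903,500 of 2,772,000 units, about 32.6 %) need not match the slide's summary line exactly.

```python
# Figures copied from slides 3 and 4: discipline -> (functional units, percent databased)
COLLECTIONS = {
    "Anthropology": (325_000, 90),
    "Botany": (350_000, 1),
    "Entomology": (1_000_000, 1),
    "Invertebrate Paleontology": (300_000, 55),
    "Invertebrate Zoology": (300_000, 20),
    "Mineralogy": (35_000, 85),
    "Paleobotany": (150_000, 60),
    "Scientific Instruments": (2_000, 100),
    "Vertebrate Paleontology": (125_000, 60),
    "Vertebrate Zoology": (185_000, 95),
}


def databased_totals(collections):
    """Return (total units, databased units, overall percent databased)."""
    total = sum(units for units, _ in collections.values())
    # Integer arithmetic is exact here because every count is a multiple of 100.
    databased = sum(units * pct // 100 for units, pct in collections.values())
    return total, databased, 100 * databased / total


total, done, pct = databased_totals(COLLECTIONS)
print(f"{done:,} of {total:,} units databased ({pct:.1f} %)")
```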

  5. Big events: EMu migration in '05 (all disciplines went live simultaneously); physical move in '00-'02 (primarily neontological disciplines)

  6. The four YPM buildings: Peabody (YPM); Environmental Science Center (ESC); Geology / Geophysics (KGL); 175 Whitney (Anthropology)

  7. VZ: Kristof Zyskowski (Vert. Zool. – ESC), Greg Watkins-Colwell (Vert. Zool. – ESC)

  8. HSI: Shae Trewin (Scientific Instruments – KGL)

  9. VP: Mary Ann Turner (Vert. Paleo. – KGL / YPM)

  10. ANT: Maureen DaRos (Anthro. – YPM / 175 Whitney)

  11. EMu Hacking at Peabody: “hacking” in a laudatory programming sense, not a criminal sense

  12. Mitnick: often we tend to think of “hackers” in this mode

  13. Mitnick, modified: “cracker” is a better moniker

  14. Mitnick, modified with EMu: crackers often have unnamed accomplices…

  15. 3 Vignettes of YPM EMu “hacks”
     • An issue of functionality (background script)
     • An issue of performance (tweaking the catalogue)
     • An issue of user behavior & cost (another script…)

  16. Hack Vignette #1: Multimedia module – JPEG 2000 support

  17. http://www.jpeg.org/jpeg2000
     • non-proprietary compression standard
     • lossless mode (much smaller files)
     • lossy mode (vastly smaller files)
     • potential space/bandwidth savings

  18. http://www.fnordware.com/j2k

  19. JP2 spicebush with J2K and tail target

  20. JP2 spicebush tails with file sizes: 1.54 MB (native TIFF) vs. 15 kB (heavily squeezed JP2)
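To put the two file sizes in perspective, the compression ratio is just their quotient. A minimal sketch, assuming decimal units (1 MB = 1,000,000 bytes, 1 kB = 1,000 bytes); with binary units the ratio comes out slightly higher, roughly 100:1 either way.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """How many times smaller the compressed file is than the original."""
    return original_bytes / compressed_bytes


# Sizes from the slide, read as decimal units (an assumption).
tiff_bytes = 1_540_000  # 1.54 MB native TIFF
jp2_bytes = 15_000      # 15 kB heavily squeezed JP2

ratio = compression_ratio(tiff_bytes, jp2_bytes)
print(f"about {ratio:.0f}:1; the JP2 is {100 / ratio:.1f} % of the TIFF's size")
```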

  21. HERBIS images: 261 kB – <1%; 1,302 kB – 2%; 5,166 kB – 12%; 62,640 kB – 100%

  22. JP2 – no thumbnail: in EMu, oops… no thumbnail

  23. JP2 – script coding: find imagedir -name '*.jp2' -mtime -2 -print; loop over the matches and test which recently loaded JP2 files are missing a thumbnail JPG, or which JP2 files have been modified more recently than their existing thumbnail JPG; build filenames for any qualifying target JPGs; for each one run jasper -f match -F tempfile, then convert tempfile -resize 90x90 target; execute the script several times per hour from cron
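The recipe above can be sketched in Python. This is a minimal sketch, not the YPM script: it assumes a hypothetical thumbnail naming convention (foo.jp2 becomes foo.thumb.jpg, because the slide does not say how EMu names its thumbnails), mirrors `find -mtime -2` with a 48-hour window, and shells out to the same `jasper` and ImageMagick `convert` commands shown on the slide. Writing the intermediate file as PNM (`-T pnm`) is also an assumption.

```python
import os
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# Mirrors `find ... -mtime -2`: only files modified within the last 48 hours.
MAX_AGE_SECONDS = 2 * 24 * 60 * 60


def thumbnail_path_for(jp2: Path) -> Path:
    """HYPOTHETICAL naming convention: foo.jp2 -> foo.thumb.jpg (not EMu's real scheme)."""
    return jp2.with_name(jp2.stem + ".thumb.jpg")


def jp2_needing_thumbnails(image_dir, now=None):
    """Return (jp2, thumbnail) pairs for recent JP2 files whose thumbnail is missing or stale."""
    now = time.time() if now is None else now
    targets = []
    for jp2 in sorted(Path(image_dir).rglob("*.jp2")):  # recursive, like find
        jp2_mtime = jp2.stat().st_mtime
        if now - jp2_mtime >= MAX_AGE_SECONDS:
            continue  # not recently loaded or modified
        thumb = thumbnail_path_for(jp2)
        if not thumb.exists() or thumb.stat().st_mtime < jp2_mtime:
            targets.append((jp2, thumb))
    return targets


def thumbnail_commands(jp2, thumb, tmp):
    """The two commands from the slide: JasPer decodes the JP2, ImageMagick shrinks it."""
    return [
        ["jasper", "-f", str(jp2), "-F", str(tmp), "-T", "pnm"],
        ["convert", str(tmp), "-resize", "90x90", str(thumb)],
    ]


def build_thumbnails(image_dir):
    for jp2, thumb in jp2_needing_thumbnails(image_dir):
        fd, tmp = tempfile.mkstemp(suffix=".pnm")
        os.close(fd)
        try:
            for cmd in thumbnail_commands(jp2, thumb, tmp):
                subprocess.run(cmd, check=True)  # stop on the first failing command
        finally:
            os.remove(tmp)  # always clean up the intermediate file


if __name__ == "__main__" and len(sys.argv) > 1:
    build_thumbnails(sys.argv[1])
```

Run from cron every 20 minutes or so, e.g. `python make_thumbs.py /path/to/imagedir` (script name and path hypothetical). Keeping the selection logic separate from the command execution lets it be tested without JasPer or ImageMagick installed.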

  24. JP2 – prior, without the script: the script wakes up every 20 minutes…

  25. JP2 – now, with the script: it makes the thumbnail…

  26. JP2 – Tiled View: JP2 files now behave just like all other standard multimedia

  27. JP2 – Photoshop opens: double-click and the Photoshop handler kicks in

  28. JP2 – V1: simply generated thumbnails in the background

  29. JP2 – V2: also inserted suitable metadata into records via texload (next version: script to be called directly in validation code at file time)

  30. Hack Vignette #1, Moral #1: EMu is extensible; you may be able to implement significant changes yourself, in whole or in part, without delay

  31. Hack Vignette #2: Catalogue module – performance issues

  32. Default EMu “cron” job configuration (chart: one column per day, Mo to Su; rows for late night, workday and evening; legend: emulutsrebuild, emumaintenance batch, emumaintenance compact). Orange marks time when EMu is busy running background jobs, which interferes with workday work and leaves Sunday processing time idle.

  33. The ecatalogue database is a rate limiter
     File name                     Function
     ~/emu/data/ecatalogue/data    the actual data
     ~/emu/data/ecatalogue/rec     indexing (part)
     ~/emu/data/ecatalogue/seg     indexing (part)
     At YPM, the combined size of these was >10 GB: 4 GB in data and 3 GB each in rec and seg.

  34. Touch many types of records in EMu… (e.g., add a middle name to a Party record, add an author to a Bibliography record, add a collector to a Collecting Events record) …and the automatic changes propagate to numerous records in the ecatalogue database …so ecatalogue can grow a lot and slow EMu to varying degrees between maintenance runs

  35. How to make ecatalogue go faster?

  36. Make it smaller: trim nulls from legacy data? Maybe save 20+%?

  37. Make it smaller: trim nulls from legacy data?
     • Repetitive scripting of texexport & texload jobs
     • Conducting around a million re-imports of records
     • Manually adjusting the nightly cron jobs to accommodate the work
     • Doing the work at night over a month-long period
     • Watching ecatalogue closely to keep it from filling the disk
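Watching ecatalogue during the reloads comes down to checking the size of its three files against the free space on the volume. A minimal monitor along those lines, using the file names from slide 33; the `~/emu` location comes from the slide and may differ per installation, and the safety factor of 2 is a HYPOTHETICAL margin, not a KE recommendation.

```python
import shutil
import sys
from pathlib import Path

# File names from slide 33; the ~/emu prefix may differ per installation.
ECATALOGUE_DIR = Path.home() / "emu" / "data" / "ecatalogue"
PARTS = ("data", "rec", "seg")


def ecatalogue_sizes(table_dir=ECATALOGUE_DIR):
    """Size in bytes of each ecatalogue file that exists."""
    table_dir = Path(table_dir)
    return {p: (table_dir / p).stat().st_size for p in PARTS if (table_dir / p).exists()}


def has_headroom(table_dir=ECATALOGUE_DIR, safety_factor=2.0):
    """True if free space on the volume is at least safety_factor times the table's size.

    The default factor is an arbitrary, HYPOTHETICAL allowance for growth between compactions.
    """
    used = sum(ecatalogue_sizes(table_dir).values())
    free = shutil.disk_usage(table_dir).free
    return free >= safety_factor * used


if __name__ == "__main__" and ECATALOGUE_DIR.exists():
    for part, size in ecatalogue_sizes().items():
        print(f"{part}: {size / 1e9:.2f} GB")
    if not has_headroom():
        print("WARNING: ecatalogue is close to filling the disk", file=sys.stderr)
```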

  38. Starting situation at YPM for ecatalogue (chart: sizes of data, seg and rec, in GB on the y axis)

  39. Delete nulls from AdmOriginalData (chart: sizes of data, seg and rec)

  40. Sites – round 2: constant data, lengthy prefixes… not satisfied with just that, here are some other things to possibly trim!

  41. >55 %! (chart: sizes of data, seg and rec after each step: delete nulls from AdmOriginalData; shorten prefix on AdmOriginalData; selectively delete AdmOriginalData)
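The three trimming steps (delete nulls, shorten a lengthy prefix, selectively delete constant fields) amount to a line-by-line rewrite of AdmOriginalData. A minimal sketch, assuming each line of the field holds one `Label: value` pair; that layout, the prefix `LegacySiteDatabase_` and the label names in the test are HYPOTHETICAL. The texexport/texload round trip that carries the rewritten text back into EMu is not shown.

```python
def trim_original_data(lines, long_prefix="", short_prefix="", drop_labels=frozenset()):
    """Rewrite legacy-data lines, ASSUMING one 'Label: value' pair per line."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # blank line: nothing worth keeping
        label, sep, value = line.partition(":")
        if not sep:
            kept.append(line)  # not a 'Label: value' line; leave it untouched
            continue
        label, value = label.strip(), value.strip()
        if not value:
            continue  # step 1: delete nulls (label with an empty value)
        if label in drop_labels:
            continue  # step 3: selectively delete constant/redundant fields
        if long_prefix and label.startswith(long_prefix):
            label = short_prefix + label[len(long_prefix):]  # step 2: shorten prefix
        kept.append(f"{label}: {value}")
    return kept
```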

  42. Catalogue – round 2 (chart: sizes of data, seg and rec): what ecatalogue AdmOriginalData looks like after scripting

  43. BEFORE: default EMu “cron” job configuration (chart: one column per day, Mo to Su; rows for late night, workday and evening; legend: emulutsrebuild, emumaintenance batch, emumaintenance compact)

  44. AFTER: modified EMu “cron” job configuration (same chart layout and legend; asterisks mark days with a full ecatalogue compaction). All maintenance can now be squeezed into the wee hours of the night, Sunday is put to use, and ecatalogue is fully compacted every other day (asterisks)!
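The modified schedule can be expressed as crontab entries. The sketch below renders a HYPOTHETICAL schedule (times, days and any command arguments are illustrative, not YPM's actual configuration) and checks that no job starts during an assumed 07:00 to 18:59 working day. Day-of-week `0,2,4,6` (Sunday, Tuesday, Thursday, Saturday) approximates compacting every other day. A real EMu crontab entry probably also needs the EMu environment set up before the command runs; that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CronJob:
    minute: int
    hour: int
    weekdays: str  # cron day-of-week field: 0 = Sunday ... 6 = Saturday, "*" = every day
    command: str


# HYPOTHETICAL schedule: every night including Sunday, compaction about every other day.
SCHEDULE = [
    CronJob(0, 1, "*", "emulutsrebuild"),
    CronJob(30, 2, "*", "emumaintenance batch"),
    CronJob(0, 4, "0,2,4,6", "emumaintenance compact"),
]

WORKDAY_HOURS = range(7, 19)  # assumed working day: 07:00 to 18:59


def crontab_lines(jobs):
    """Render jobs as 'minute hour day-of-month month day-of-week command' lines."""
    return [f"{j.minute} {j.hour} * * {j.weekdays} {j.command}" for j in jobs]


def workday_conflicts(jobs):
    """Commands whose START hour falls in the working day (run time is not checked)."""
    return [j.command for j in jobs if j.hour in WORKDAY_HOURS]


if __name__ == "__main__":
    print("\n".join(crontab_lines(SCHEDULE)))
```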

  45. Quick backup: all of YPM EMu can now also be squeezed onto a thumb drive

  46. Hack Vignette #2, Moral #2: know your data; you can put aspects of EMu on a diet, and your computer system is likely to thank you

  47. Hack Vignette #3: EMu sessions – licensing and user behavior

  48. Dreaded email for sysadmins: “WARNING! 2 KE EMu user(s) are currently being denied access because all 10 of your KE EMu licenses are in use. For license upgrades, please contact info@kesoftware.com”

  49. The conversation you dream of but of course never have… Museum Director: “Go license shopping at KE!” Systems Admin: “VISA or MasterCard?”

  50. What do you need?
     • Guaranteed license seat for every potential user?
     • Cover maximal number of expected concurrent users?
     • Minimize expenses by minimizing license seats?
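Choosing among these options starts with knowing how many seats are actually in use at once. A minimal sketch, assuming session records are available as (login, logout) times in minutes (HYPOTHETICAL: the slides do not say where such data comes from): a sweep over login/logout events gives the peak concurrency and the total time during which demand exceeded the license count, such as the 10 seats in the warning email.

```python
def _events(sessions):
    """+1 at each login, -1 at each logout, in time order.

    Sorting puts a logout before a login at the same instant, so a seat
    freed at time t can be reused at time t.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()
    return events


def peak_concurrency(sessions):
    """Largest number of sessions open at the same time."""
    current = peak = 0
    for _, delta in _events(sessions):
        current += delta
        peak = max(peak, current)
    return peak


def time_over_limit(sessions, licenses):
    """Total time during which more sessions wanted a seat than there were licenses."""
    over = 0
    current = 0
    prev_t = None
    for t, delta in _events(sessions):
        if prev_t is not None and current > licenses:
            over += t - prev_t
        current += delta
        prev_t = t
    return over
```

For example, `peak_concurrency(sessions)` sizes the "cover maximal concurrent users" option, while `time_over_limit(sessions, 10)` shows how often a 10-seat license would produce denied logins.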
