
PIPING HOT: Little Bins in big workflows

Alex Garnett, Digital Preservation & Data Curation, SFU Library


Presentation Transcript


  1. PIPING HOT: Little Bins in big workflows • Alex Garnett, Digital Preservation & Data Curation, SFU Library

  2. Thesis: I am a terrible programmer

  3. Thesis: I am a terrible programmer • 20% of you are thinking “no kidding!” • The other 80% of you are thinking “uh huh. Stupid false-modest shmuck.”

  4. Thesis: I am a terrible programmer • 20% of you are thinking “no kidding!” • The other 80% of you are thinking “uh huh. Stupid false-modest shmuck.” • Who needs impostor syndrome when you have a bash shell?

  5. For the record, this is the payoff from all those colonoscopy jokes. Yep.

  6. But how does it apply to libraries? [If MJ Suhonos is here this year, this is his cue to groan audibly]

  7. LIBRARY PROBLEM #1: PDF/A • ProQuest wants PDF/A submissions from now on • “now on” apparently = the past five years’ backlog • We have to convert five years of theses! • This is now also being used at the UofA.

  8. LIBRARY PROBLEM #2: ARCHIVES PROBLEM: LIBRARY HARDER, STARRING BRUCE WILLIS. CRAP, I USED UP THE WHOLE SLIDE ON THE TITLE

  9. Archives needed a GUI tool to be able to create restrictive FTP accounts for donors.

  10. LIBRARY PROBLEM #3: PDF REDACTION (IT’S LIKE THE FIRST ONE BECAUSE NO ONE LIKED THE SEQUEL, DOES ANYONE WANT TO WATCH TEMPLE OF DOOM LATER, OH HELL I’VE DONE IT AGAIN)

  11. We learned we had some poorly redacted PDFs • Blackout meant to obscure text; still selectable

  12. Solution: • Detect offending pages with ghostscript… • (this is the hard part; dumping PDF guts is appalling)

  13. … and then: • Snip offending pages with pdftk • Convert them to images with imagemagick • OCR back into PDF (minus obscured text) with tesseract and fix up the dimensions with gs again • Paste back in with pdftk. • 5 lines, all free tools! Documentation & piping.
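The snip, rasterise, OCR, resize, and paste-back steps on slides 12 and 13 can be sketched roughly as follows. This is not the speaker's actual script (the talk describes about five shell lines that are not reproduced in the transcript); it is a minimal Python sketch that only assembles plausible command lines for the tools named. It assumes the offending page number is already known, so the Ghostscript detection step is left out, and every file name, the 300 dpi density, and the US Letter page size are illustrative assumptions.

```python
import subprocess


def redaction_commands(src, page, total_pages, out, width_pt=612, height_pt=792):
    """Build command lines that re-render one badly redacted page.

    Assumptions (not from the talk): the page number is already known,
    intermediate file names are hypothetical, and the target page size
    defaults to US Letter in points.
    """
    if not 1 <= page <= total_pages:
        raise ValueError("page out of range")

    # Page ranges of the original (handle A) around the replacement (handle B).
    parts = []
    if page > 1:
        parts.append(f"A1-{page - 1}")
    parts.append("B1")
    if page < total_pages:
        parts.append(f"A{page + 1}-end")

    return [
        # 1. Snip the offending page out with pdftk.
        ["pdftk", src, "cat", str(page), "output", "page.pdf"],
        # 2. Rasterise it with ImageMagick so the selectable text is gone
        #    (ImageMagick 7 names this command `magick` instead).
        ["convert", "-density", "300", "page.pdf", "page.png"],
        # 3. OCR the image back into a PDF; tesseract writes page_ocr.pdf.
        ["tesseract", "page.png", "page_ocr", "pdf"],
        # 4. Fit the OCR output back onto a fixed page size with Ghostscript.
        ["gs", "-o", "page_fixed.pdf", "-sDEVICE=pdfwrite",
         f"-dDEVICEWIDTHPOINTS={width_pt}", f"-dDEVICEHEIGHTPOINTS={height_pt}",
         "-dFIXEDMEDIA", "-dPDFFitPage", "page_ocr.pdf"],
        # 5. Paste the cleaned page back into place with pdftk.
        ["pdftk", f"A={src}", "B=page_fixed.pdf", "cat", *parts, "output", out],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `redaction_commands("thesis.pdf", 3, 10, "clean.pdf")` returns the five command lists without touching any files, which makes it easy to inspect them before handing the result to `run_all` on a machine where the tools are installed.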

  14. Takeaway • If you find yourself doing a very bad job of learning PHP and feeling like you have something to prove: it doesn’t have to be this way

  15. Takeaway • If you find yourself doing a very bad job of learning PHP and feeling like you have something to prove: it doesn’t have to be this way • There is a huge amount of useful space you can occupy as a barely-programmer if you’re comfortable using a terminal for problem solving (less so on Windows). Stack Overflow and Google are your friends.

  16. Takeaway • Open-source command line tools are really good these days! They are powerful, they are straightforward, and they are often cutting edge. • There is a huge amount of useful space you can occupy as a barely-programmer if you’re comfortable using a terminal for problem solving (less so on Windows). Stack Overflow and Google are your friends.

  17. Surprise: Everybody gets a free colonoscopy after all! • Thanks! garnett@sfu.ca ; @axfelix
