FITS: The File Information Tool Set

Presentation Transcript


  1. FITS: The File Information Tool Set

  2. Background
     • FITS is part of the second-generation Harvard University Library Digital Repository Service (DRS2), which supports content models and METS/PREMIS object descriptors.
     • Developed Fall 2008
     • First public release Spring 2009: http://fits.googlecode.com

  3. Why?
     • Needed an automatic way to identify and extract metadata for a wide range of file types
     • No single file analysis tool satisfied our needs

  4. Design Goals
     • Act as a wrapper around other open source tools
     • Extensible
     • Needs to be a standalone command line tool and also provide an API
     • Allow priority setting for tools
     • Open source

  5. The Tools
     • Current tools:
        • Jhove 1.5
        • Exiftool
        • National Library of New Zealand Metadata Extractor (NLNZ)
        • DROID
        • FFIdent
        • File Utility
     • 3 categories:
        • File identification (all of them)
        • Metadata extraction (Jhove, Exiftool, NLNZ)
        • Format validation (Jhove)

  6. Process

  7. Features
     • Conflict management
     • Value normalization
        • “inches” vs. “2”
     • Tool prioritization
     • Format tree for understanding more specific format identities
        • PDF/A is a more specific version of PDF
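The features above can be illustrated with a minimal sketch. It is not FITS code: the priority order, the normalization table, the "CONFLICT" status label, and every function name here are assumptions made for illustration (only "SINGLE_RESULT" appears in the example output in this deck).

```python
# Hypothetical sketch of normalization, prioritization and conflict handling.
# None of these names are taken from the FITS code base.

TOOL_PRIORITY = ["Jhove", "Exiftool", "NLNZ"]  # assumed order: earlier = preferred

# Assumed normalization table: tool-specific spellings -> one canonical value.
NORMALIZE = {"2": "inches"}

# Assumed format tree: child format -> more general parent format.
FORMAT_PARENT = {"PDF/A": "PDF"}


def normalize(value):
    """Return the canonical form of a value reported by a tool."""
    value = value.strip()
    return NORMALIZE.get(value, value)


def consolidate(reports):
    """Merge non-empty {tool_name: value} reports for one metadata element.

    Returns (value, status). Status is "SINGLE_RESULT" when all tools agree
    after normalization, otherwise "CONFLICT" (an assumed label), in which
    case the value from the highest-priority tool is returned.
    """
    normalized = {tool: normalize(v) for tool, v in reports.items()}
    ranked = sorted(normalized, key=TOOL_PRIORITY.index)
    agreed = len(set(normalized.values())) == 1
    return normalized[ranked[0]], "SINGLE_RESULT" if agreed else "CONFLICT"


def is_more_specific(fmt, other):
    """True if fmt is a descendant of other in the format tree."""
    parent = FORMAT_PARENT.get(fmt)
    while parent is not None:
        if parent == other:
            return True
        parent = FORMAT_PARENT.get(parent)
    return False
```

With these assumptions, `consolidate({"Jhove": "inches", "Exiftool": "2"})` reports agreement because both values normalize to the same string, while two different heights would be flagged as a conflict and resolved in favor of the higher-priority tool.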

  8. Example Output

     <fits>
       <identification>
         <identity format="Graphics Interchange Format" mimetype="image/gif">
           <tool toolname="Jhove" toolversion="1.5" />
           ...
         </identity>
       </identification>
       <fileinfo>
         <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
         <md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">265c9345ebf93c89d472766fda095de4</md5checksum>
         ...
       </fileinfo>
       <filestatus>
         <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
         <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
       </filestatus>
       <metadata>
         <image>
           <height toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">1024</height>
           ...
         </image>
       </metadata>
     </fits>
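As a rough illustration of consuming output shaped like the example on this slide, the sketch below reads the identity, file size, and validity with Python's standard `xml.etree.ElementTree`. The `summarize` function and the trimmed `SAMPLE` document are assumptions for illustration. The slide shows no XML namespace; if actual output declares one, the element paths would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example, with the "..." elisions left out.
SAMPLE = """<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
</fits>"""


def summarize(xml_text):
    """Pull a few headline facts out of a FITS-style output document."""
    root = ET.fromstring(xml_text)
    identity = root.find("identification/identity")  # first identity block
    return {
        "format": identity.get("format"),
        "mimetype": identity.get("mimetype"),
        "tools": [t.get("toolname") for t in identity.findall("tool")],
        "size": int(root.findtext("fileinfo/size")),
        "valid": root.findtext("filestatus/valid") == "true",
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```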

  9. Configuration
     • All settings are in the fits.xml config file
     • Enable/disable tools (available in the API too)
     • Prevent tools from processing files with specific file extensions
     • Set tool priority
     • Add new tools
     • Use your own consolidator code
     • Report or ignore conflicts
     • Options to display original tool output

  10. Sample Configuration File

     <fits_configuration>
       <!-- Order of the tools determines preference -->
       <tools>
         <!-- exclude-exts attribute is a comma delimited list of file extensions that the tool should not try to process -->
         <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
         <tool class="edu.harvard.hul.ois.fits.tools.fileutility.FileUtility" exclude-exts="dng,wps"/>
         <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
         <tool class="edu.harvard.hul.ois.fits.tools.droid.Droid" exclude-exts="dng"/>
         <tool class="edu.harvard.hul.ois.fits.tools.nlnz.MetadataExtractor" exclude-exts="dng,zip,odb,ott,odg,otg,odp,otp,ods,ots,odc,otc,odi,oti,odf,otf,odm,oth"/>
         <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
         <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.XmlMetadata"/>
         <tool class="edu.harvard.hul.ois.fits.tools.ffident.FFIdent" exclude-exts="dng,wps,vsd"/>
       </tools>
       <output>
         <dataConsolidator class="edu.harvard.hul.ois.fits.consolidation.OISConsolidator"/>
         <display-tool-output>true</display-tool-output>
         <report-conflicts>true</report-conflicts>
         <validate-tool-output>false</validate-tool-output>
         <internal-output-schema>xml/fits_output.xsd</internal-output-schema>
         <external-output-schema>http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd</external-output-schema>
         <fits-xml-namespace>http://hul.harvard.edu/ois/xml/ns/fits/fits_output</fits-xml-namespace>
       </output>
       <!-- file name of the droid signature file to use in tools/droid/ -->
       <droid_sigfile>DROID_SignatureFile_V35.xml</droid_sigfile>
     </fits_configuration>
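To make the effect of tool order and `exclude-exts` concrete, here is a small sketch that reads a configuration of this shape and lists, in preference order, the tools that would be allowed to process a given file. It is not how FITS itself loads its configuration; the `tools_for` function, the shortened `CONFIG` sample, and the case-insensitive extension match are assumptions.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample configuration, keeping three tools.
CONFIG = """<fits_configuration>
  <tools>
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
  </tools>
</fits_configuration>"""


def tools_for(config_xml, filename):
    """Return short tool names, in configured order, not excluded for filename."""
    # Assumption: extensions are compared case-insensitively.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    selected = []
    for tool in ET.fromstring(config_xml).find("tools"):
        raw = tool.get("exclude-exts", "")
        excluded = {e.strip().lower() for e in raw.split(",") if e.strip()}
        if ext not in excluded:
            # Keep only the last segment of the fully qualified class name.
            selected.append(tool.get("class").rsplit(".", 1)[-1])
    return selected


if __name__ == "__main__":
    print(tools_for(CONFIG, "notes.txt"))
    print(tools_for(CONFIG, "photo.dng"))
```

Because document order is preserved, the returned list doubles as the preference order described in the configuration comment.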

  11. Some Limitations...
     • Speed
     • Technical metadata is only returned if the tool that reported it is in the first <identity> block
     • FITS considers a successful identification to be a combination of the format name and MIME type
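The last two limitations interact, which a short sketch can show. It groups tool reports by the (format name, MIME type) pair and treats the first group as the first identity block. The function names are hypothetical, and the assumption that reports arrive in tool-priority order (so that the first group belongs to the preferred tool) is made for illustration only.

```python
def group_identities(reports):
    """Group (tool, format_name, mimetype) reports by (format_name, mimetype).

    Reports are assumed to arrive in tool-priority order, so the first group
    stands in for the first <identity> block. Dicts keep insertion order.
    """
    groups = {}
    for tool, fmt, mime in reports:
        groups.setdefault((fmt, mime), []).append(tool)
    return list(groups.items())


def tools_reporting_metadata(reports):
    """Tools in the first identity group; others' metadata would be dropped."""
    identities = group_identities(reports)
    return identities[0][1] if identities else []


if __name__ == "__main__":
    reports = [
        ("Jhove", "Graphics Interchange Format", "image/gif"),
        ("Exiftool", "GIF", "image/gif"),
        ("DROID", "Graphics Interchange Format", "image/gif"),
    ]
    print(group_identities(reports))
    print(tools_reporting_metadata(reports))
```

In this made-up input, the second tool agrees on the MIME type but uses a different format name, so it lands in a separate identity group and its technical metadata would not be returned.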

  12. Future Plans
     • More tools
        • Apache Tika (text document formats)
        • Jhove 2
        • Aduna Aperture (text, documents, email formats)
        • Mediainfo (audio and video formats)
     • Better audio and video format support as we add object support for them to DRS2

  13. Wrap Up
     • http://fits.googlecode.com
     • http://ots-schemas.googlecode.com
        • Java library for reading and writing METS (limited support), MODS, PREMIS, MIX, TextMD, DocumentMD, and soon AES audio metadata
     • More information on DRS2: http://hul.harvard.edu/ois/systems/drs/enhancements.html
