Explore operating system packages for bioinformatics, including software, config files, and documentation encapsulated in a single file, along with metadata such as version and dependencies. Learn about package managers to ensure integrity and manage installations effectively. Discover the importance of bioinformatics packages for consistency, dependency handling, and system auditing. Access RPM packages for Fedora Core with a focus on applications, libraries, and datasets. Explore available GMOD components and gene expression resources. Visit Biopackages.net for detailed information and RPM downloads.
Biopackages.net • Operating System Packages for Bioinformatics • Allen Day • 2005.05.17
What is a package? • Software, config files, documentation, and/or data encapsulated in a single file • Metadata describing: • Version, license, package “category” • Dependencies • What the package provides
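As a rough illustration of the metadata listed on this slide, a package record could be modeled as below. This is only a sketch: the class name `PackageMeta`, its field names, and the sample values are hypothetical and are not the RPM header format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageMeta:
    """Hypothetical model of the metadata a package carries."""
    name: str
    version: str
    license: str
    category: str
    requires: tuple = ()  # names this package depends on
    provides: tuple = ()  # capabilities this package offers to others

    def satisfies(self, requirement: str) -> bool:
        """True if this package can fulfil a dependency on `requirement`."""
        return requirement == self.name or requirement in self.provides


# Sample values are placeholders for illustration, not real package data.
bioperl = PackageMeta(
    name="perl-bioperl",
    version="placeholder",
    license="placeholder",
    category="Libraries",
    provides=("perl(Bio::Seq)",),
)

print(bioperl.satisfies("perl(Bio::Seq)"))
```

Keeping "provides" separate from the package name lets a dependency be met by any package that offers the capability, not just by one specific file.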
GMOD target audience • Small MODs
Package Dependency Graph • Dependencies • What the package provides • [Figure: dependency graph with nodes chado-Hsa, genome-Hsa-annotation-gene, postgresql-AffxSeq, genome-Hsa-annotation-affymetrix, chado, perl-bioperl, perl-go-perl, postgresql-server, genome-Hsa-nib, ucsc-blat, obo-core]
Dependencies • Build Dependency • Installation Dependency
What is a Package Manager? • Tools to manage installation, upgrade, uninstallation of packages • Verify package integrity (checksums) • Maintain system integrity • Transactional • Allow rollbacks • Dependency checking • Dependency graph recursion • Allow software customization (patches)
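The "dependency graph recursion" bullet can be sketched as a depth-first walk that emits each package only after everything it requires. This is a minimal sketch, not how RPM or yum implement it: the function name `install_order` is hypothetical, and although the package names come from the slides, the edges in `REQUIRES` are assumed for illustration.

```python
def install_order(target, requires):
    """Return packages ordered so each one comes after its dependencies."""
    order, done, in_progress = [], set(), set()

    def visit(pkg):
        if pkg in done:
            return
        if pkg in in_progress:
            # We came back to a package we are still resolving.
            raise ValueError(f"dependency cycle at {pkg}")
        in_progress.add(pkg)
        for dep in requires.get(pkg, ()):
            visit(dep)  # recurse into dependencies first
        in_progress.discard(pkg)
        done.add(pkg)
        order.append(pkg)

    visit(target)
    return order


# Assumed edges: which package requires which is NOT taken from the slides.
REQUIRES = {
    "chado-Hsa": ["chado", "genome-Hsa-nib"],
    "chado": ["postgresql-server", "perl-bioperl"],
}

print(install_order("chado-Hsa", REQUIRES))
```

Tracking the packages currently being resolved is what lets the walk report a circular dependency instead of recursing forever.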
Why bioinformatics packages? • Consistency of installation process • Bioinfo. package installs vary wildly, and commonly lack documentation • Automatic dependency installation • Perl modules especially bad – bioperl has 60+ modules in its dependency tree • Integrity/Auditing of system state • Know an installed package works, which version, how to replicate system setup • Tighter integration with operating system • Daemons, config & log file locations, etc.
What’s available? • RPM packages only right now • Primary focus on Fedora Core 2 • Some RPMs also available for • Fedora Core 3 • RedHat 9 • Cygwin
What’s available? • Three primary foci • Applications • Libraries • Data sets
Applications • Gbrowse • Textpresso • BLAT daemon • NCBI Toolkit (BLAST, etc) • HMMer
What’s available? • Libraries • Bioperl • R & Bioconductor • Squid • EMBOSS
What’s available? • Data sets • Genome & protein sequence • Sequence features • Ontologies • All installed using a common directory structure
What’s available? • UCSC tools (utilities, BLAT system service, CGI scripts) • Bioperl • R / Bioconductor • GMOD apps (Gbrowse, Textpresso, …) • Data packages • Genome sequence (fa, nib, blastdb) • Genome features (Affy probeset alignments, mRNA, etc)
GMOD Components Available • [Diagram: das2-Hsa, apollo-Hsa, cmap-Hsa, genome-Hsa-nib, ucsc-BLAT, gmod-web-Hsa, chado-Hsa, gbrowse, textpresso, turnkey, chado] • ‘Hsa’ can be replaced with the code for your organism • Currently built for ‘Cel’, ‘Hsa’, ‘Sce’
More details… • [Figure: dependency graph with nodes chado-Hsa, genome-Hsa-annotation-gene, genome-Hsa-annotation-affymetrix, postgresql-AffxSeq, chado, perl-go-perl, perl-bioperl, postgresql-server, genome-Hsa-nib, ucsc-blat; further nodes elided as “…”]
Gene Expression Components • DAS/2 for Genotyping, GeneChip Quant/Norm Pipeline • [Diagram: chado-GEC, chado-Hsa, R, Bioconductor]
Resources • http://www.biopackages.net • ~1000 RPMs for Fedora Core 2 and 3 • Available via yum • See site for a configuration example.
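For readers unfamiliar with yum, the general shape of a repository entry can be sketched as follows. This is not the configuration published on the site: the section name `biopackages` and the `baseurl` value are placeholders, and the site's own example should be used for a real setup. The script only builds the text of a `.repo` file in memory.

```python
import configparser
import io

# interpolation=None keeps yum variables such as $releasever untouched.
repo = configparser.ConfigParser(interpolation=None)
repo["biopackages"] = {
    "name": "Biopackages (placeholder entry)",
    # Placeholder URL, NOT the real repository location.
    "baseurl": "http://example.invalid/$releasever/$basearch/",
    "enabled": "1",
}

buf = io.StringIO()
repo.write(buf, space_around_delimiters=False)
repo_text = buf.getvalue()

# A file with this content would normally live under /etc/yum.repos.d/.
print(repo_text)
```

Once a valid entry is in place, packages are installed with `yum install` followed by the package name.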
TODO • Support more architectures • Build for Cygwin & OS X. RPM has been ported to both • Automate package build process • Build farm of multiple architectures, controllable via scheduler (GridEngine) • Automate (if possible) inclusion of new software / data releases
TODO • Build community interest and involvement • Keep adding more packages! • Keep existing packages current!
Acknowledgements • Patrick Alger • Jared Fox • Brian O’Connor • Todd Harris • Lincoln Stein • Stanley Nelson