Biopackages.net: Operating System Packages for Bioinformatics
Allen Day, 2005.05.17
What is a package?
• Software, config files, documentation, and/or data encapsulated in a single file
• Metadata describing:
  • Version, license, package “category”
  • Dependencies
  • What the package provides
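As a rough illustration of the metadata listed above, a package record can be modeled as a small data structure. The field names and example values below are assumptions made for this sketch; they are not the actual RPM header layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageMetadata:
    """Hypothetical record of the metadata a package carries."""
    name: str
    version: str
    license: str
    category: str
    requires: tuple = ()   # dependencies: capabilities this package needs
    provides: tuple = ()   # capabilities this package offers to others


# Example values are illustrative only, not taken from a real package.
bioperl = PackageMetadata(
    name="perl-bioperl",
    version="1.0",
    license="example-license",
    category="example-category",
    requires=("perl",),
    provides=("perl-bioperl",),
)


def satisfies(pkg: PackageMetadata, capability: str) -> bool:
    """Return True if the package provides the requested capability."""
    return capability in pkg.provides
```

A package manager can answer "which package satisfies this dependency?" by checking `provides` rather than matching on the package name alone.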
GMOD target audience • Small MODs
Package Dependency Graph
• Dependencies
• What the package provides
[Diagram: dependency graph with nodes chado-Hsa, genome-Hsa-annotation-gene, postgresql-AffxSeq, genome-Hsa-annotation-affymetrix, chado, perl-bioperl, perl-go-perl, postgresql-server, genome-Hsa-nib, ucsc-blat, obo-core]
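A dependency graph like this one can be walked depth-first to find an order in which packages can be installed. The sketch below uses package names from the slide, but the edges are assumptions: the arrows of the original diagram are not recoverable from this transcript.

```python
# Hypothetical edges: only the package names come from the slide; which
# package depends on which is assumed here for illustration.
DEPENDS_ON = {
    "chado-Hsa": ["chado", "genome-Hsa-annotation-gene"],
    "chado": ["postgresql-server", "perl-bioperl", "perl-go-perl"],
    "genome-Hsa-annotation-gene": ["genome-Hsa-nib"],
    "genome-Hsa-nib": [],
    "postgresql-server": [],
    "perl-bioperl": [],
    "perl-go-perl": ["obo-core"],
    "obo-core": [],
}


def install_order(package, graph, seen=None, order=None):
    """Depth-first walk that lists every dependency before its dependent.

    Assumes the graph is acyclic; already-visited packages are skipped.
    """
    seen = set() if seen is None else seen
    order = [] if order is None else order
    if package in seen:
        return order
    seen.add(package)
    for dep in graph.get(package, []):
        install_order(dep, graph, seen, order)
    order.append(package)  # appended only after all of its dependencies
    return order


order = install_order("chado-Hsa", DEPENDS_ON)
print(order)
```

The requested package always comes last, after everything it transitively depends on.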
Dependencies • Build Dependency • Installation Dependency
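The two kinds of dependency can be kept as separate lists on a package description, so that a build host and an install host each pull in only what they need. This is a minimal sketch; the package contents and the `needed` helper are hypothetical.

```python
# Hypothetical description: the dependency lists are assumed for illustration.
SPEC = {
    "name": "ucsc-blat",
    "build_requires": {"gcc", "make"},  # needed only to compile the package
    "requires": {"glibc"},              # needed wherever the package is installed
}


def needed(spec, phase):
    """Return the packages that must be present for the given phase."""
    if phase == "build":
        return set(spec["build_requires"])
    if phase == "install":
        return set(spec["requires"])
    raise ValueError(f"unknown phase: {phase}")


print(needed(SPEC, "build"))
print(needed(SPEC, "install"))
```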
What is a Package Manager?
• Tools to manage installation, upgrade, uninstallation of packages
• Verify package integrity (checksums)
• Maintain system integrity
  • Transactional
  • Allow rollbacks
• Dependency checking
  • Dependency graph recursion
• Allow software customization (patches)
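Two of these ideas, checksum verification and transactional installs with rollback, can be sketched together. This is not how RPM or yum implement them; the function names and the in-memory "installed" set are assumptions made to keep the example small.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest of a package payload."""
    return hashlib.sha256(data).hexdigest()


def verify(payload: bytes, expected_digest: str) -> bool:
    """Integrity check: the payload must match its recorded checksum."""
    return sha256_of(payload) == expected_digest


def install_all(packages, installed):
    """Install every (name, payload, digest) triple, or none of them."""
    snapshot = set(installed)  # state to roll back to on failure
    try:
        for name, payload, digest in packages:
            if not verify(payload, digest):
                raise ValueError(f"checksum mismatch for {name}")
            installed.add(name)
    except ValueError:
        # Roll back: restore the set exactly as it was before the transaction.
        installed.clear()
        installed.update(snapshot)
        return False
    return True


system = {"postgresql-server"}
payload = b"example payload"
ok = install_all([("chado", payload, sha256_of(payload))], system)
bad = install_all(
    [("perl-bioperl", payload, sha256_of(payload)),
     ("ucsc-blat", payload, "0" * 64)],  # deliberately wrong checksum
    system,
)
print(ok, bad, sorted(system))
```

The second call fails on its last package, so the package that had already been added in that transaction is removed again.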
Why bioinformatics packages?
• Consistency of installation process
  • Bioinformatics package installs vary wildly and commonly lack documentation
• Automatic dependency installation
  • Perl modules are especially bad: bioperl has 60+ modules in its dependency tree
• Integrity/auditing of system state
  • Know that an installed package works, which version it is, and how to replicate the system setup
• Tighter integration with the operating system
  • Daemons, config & log file locations, etc.
What’s available?
• RPM packages only right now
• Primary focus on Fedora Core 2
• Some RPMs also available for
  • Fedora Core 3
  • RedHat 9
  • Cygwin
What’s available?
• Three primary foci
  • Applications
  • Libraries
  • Data sets
Applications
• Gbrowse
• Textpresso
• BLAT daemon
• NCBI Toolkit (BLAST, etc.)
• HMMer
What’s available?
• Libraries
  • Bioperl
  • R & Bioconductor
  • Squid
  • EMBOSS
What’s available?
• Data sets
  • Genome & protein sequence
  • Sequence features
  • Ontologies
• All installed using a common directory structure
What’s available?
• UCSC tools (utilities, BLAT system service, CGI scripts)
• Bioperl
• R / Bioconductor
• GMOD apps (Gbrowse, Textpresso, …)
• Data packages
  • Genome sequence (fa, nib, blastdb)
  • Genome features (Affy probeset alignments, mRNA, etc.)
GMOD Components Available
[Diagram: das2-Hsa, apollo-Hsa, cmap-Hsa, genome-Hsa-nib, ucsc-BLAT, gmod-web-Hsa, chado-Hsa, gbrowse, textpresso, turnkey, chado]
• Substitute your organism’s code for ‘Hsa’
• Currently built for ‘Cel’, ‘Hsa’, ‘Sce’
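The organism substitution amounts to filling a code into a package-name template. The sketch below assumes that the same name pattern applies to all three organism codes; whether every resulting package actually exists in the repository is not confirmed by the slides.

```python
ORGANISMS = ["Cel", "Hsa", "Sce"]  # organism codes listed on the slide

# Name patterns taken from the 'Hsa' packages shown on the slide.
TEMPLATES = ["chado-{org}", "genome-{org}-nib", "gmod-web-{org}"]


def packages_for(org):
    """Return organism-specific package names for one organism code."""
    return [template.format(org=org) for template in TEMPLATES]


for org in ORGANISMS:
    print(org, packages_for(org))
```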
More details…
[Diagram: dependency graph with nodes chado-Hsa, genome-Hsa-annotation-gene, genome-Hsa-annotation-affymetrix, postgresql-AffxSeq, chado, perl-go-perl, perl-bioperl, postgresql-server, genome-Hsa-nib, ucsc-blat, …]
Gene Expression Components
[Diagram labels: DAS/2 for Genotyping, GeneChip Quant/Norm Pipeline, chado-GEC, chado-Hsa, R, Bioconductor]
Resources
• http://www.biopackages.net
• ~1000 RPMs for Fedora Core 2 and 3
• Available via yum
• See the site for a configuration example
TODO
• Support more architectures
  • Build for Cygwin & OS X. RPM has been ported to both
• Automate package build process
  • Build farm of multiple architectures, controllable via scheduler (GridEngine)
  • Automate (if possible) inclusion of new software / data releases
TODO
• Build community interest and involvement
• Keep adding more packages!
• Keep existing packages current!
Acknowledgements
• Patrick Alger
• Jared Fox
• Brian O’Connor
• Todd Harris
• Lincoln Stein
• Stanley Nelson