CADAQUES: Bioinformatics Cluster Solution

Cadaques
A cluster for our bioinformatic needs

Txema Heredia1,2, Ángel Carreño1,2, Carles Perarnau3, Carlos Morcillo-Suarez1,2, and Arcadi Navarro1,2,4

1 IBE Institut de Biologia Evolutiva UPF-CSIC, Barcelona, Spain.
2 Instituto Nacional de Bioinformática, Spain.
3 Unitat de Suport Tecnològic a Projectes de Recerca, UPF, Barcelona, Spain.
4 Institució Catalana de Recerca i Estudis Avançats (ICREA), Catalonia, Spain.

arcadi.navarro@upf.edu

INTRODUCTION
• The amount of biological data available has grown so much that the “friendly” analyses you used to run have become monsters that take several weeks and can no longer be handled by a desktop or laptop computer.
• The number of analyses needed is expected to keep increasing, so we need a way to run them properly.
• That solution is CADAQUES, our cluster system.
• It is widely used. Total time consumed so far: 540,349 hours, i.e. 61.7 computing years in only 2.3 real years!

TECHNICAL SPECIFICATIONS
• 11x IBM XM 21 blades (same technology as Marenostrum) resulting in:
  • 16x Intel Xeon E5345 quad-core CPUs @ 2.33 GHz, which allow up to 64 single-core jobs to run at once
  • 192 GB of memory (4 blades with 32 GB and 4 blades with 16 GB)
  • 30 TB of disk
• Sun Grid Engine queue system

OPEN TO EVERYONE
• The cluster is open to any member of the IBE.
• If you are interested in using it, send an email to txema.heredia@upf.edu and I will create an account for you.
• The scientific director of the cluster is Arcadi Navarro, so pester him if you are in a hurry.

QUEUE SYSTEM
• To run a job on the cluster, you do not execute it directly; you submit it to the queue system, which lets the cluster distribute jobs fairly and efficiently.
• Fairness. Instead of scheduling jobs strictly “first in, first out”, the queue system distributes the job slots among all users, preventing any single user from monopolizing them.
• High allocation system. The queue system tries to allocate jobs according to the cluster resources available at a given moment. This lets less demanding jobs slip in ahead of bigger ones, decreasing your waiting time and increasing the effective use of the cluster's resources.

WHAT DO I NEED?
• A computer (Windows, Linux and Mac are welcome).
• SSH client software (PuTTY for Windows, or the built-in client on Linux and Mac).
• An internet connection.
• Some Linux usage skills. Don’t panic! It’s easy.

SOFTWARE
• Operating system: Linux CentOS 5.0 (RedHat), Rocks Cluster Distribution
• The following bioinformatics software is currently installed:
  • Haplotype Estimation & Analysis: Fast Phase, Phase, Haploview, LDhat
  • Phylogenetics: MrBayes, Paml41
  • Whole Genome Association Analysis: Plink
  • Population Genetics: Clumpp, Structure, Cosi, Simcoal2, ihs, Sweep, Xpehh
  • Sequence Analysis & Manipulation: Hmmer, Staden, TrimAl, GBlocks
  • Microbiology: Dotur, Mothur, S-libshuff, Sons, Treeclimber
  • Sequence Assembly: Caftools, Gap5, Mira
  • Sequence Alignment: Blast, T-coffee
• Programming languages available in the cluster: R, Perl, Bioperl, Python, C, Java, Php, MPI, Open MP
• … but anything can be installed on demand. Feel free to ask!

[Figure: Time gained by using the cluster]

DATA
• 30 TB of disk storage.
• MySQL server to store your databases.
• Currently hosting a series of public databases:
  • HapMap
  • UCSC Genome Browser
  • Ensembl
• Samba server, so you can access your data easily and use it as a remote backup system.
• Web repository.
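As a rough illustration of submitting a job to the queue system instead of running it directly, the Python sketch below builds a Sun Grid Engine job script and hands it to `qsub` on standard input. It is a minimal sketch under stated assumptions, not the cluster's documented procedure: the helper names `build_job_script` and `submit`, the job name, the input file `mydata` and the 4G memory request are hypothetical, and whether `h_vmem` can be requested depends on how a given Grid Engine installation is configured.

```python
import shutil
import subprocess


def build_job_script(command, job_name, memory="2G"):
    """Return the text of a Grid Engine job script that runs `command`.

    Lines starting with '#$' are read by qsub as submission options.
    """
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",       # job name shown in the queue listing
        "#$ -cwd",                 # run in the directory the job was submitted from
        "#$ -S /bin/bash",         # shell used to interpret the script
        f"#$ -l h_vmem={memory}",  # memory request (assumed to be enabled on the site)
        f"#$ -o {job_name}.out",   # file for standard output
        f"#$ -e {job_name}.err",   # file for standard error
        "",
        command,
        "",
    ]
    return "\n".join(lines)


def submit(script_text):
    """Pipe the script to qsub, or report that qsub is not installed here."""
    if shutil.which("qsub") is None:
        return "qsub not found; script was not submitted"
    result = subprocess.run(
        ["qsub"], input=script_text, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Hypothetical example: a Plink association run on a dataset called 'mydata'.
script = build_job_script("plink --file mydata --assoc", "plink_assoc", memory="4G")
print(script)
```

Calling `submit(script)` on a login node would queue the job; the scheduler then decides when and where it runs, which is where the fairness and resource-based allocation described above come in.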
