The NC BioGrid Project aims to advance genomic research in North Carolina by linking computing and data resources across the state. It responds to the exponential growth of genomic data, exemplified by the rapid expansion of GenBank. It also responds to the growing number and diversity of molecular biology databases, such as DDBJ, and to the large-scale computing now used to assemble and analyze genomes, as at Celera Genomics. To support efficient resource sharing, the project builds on grid computing technologies, in particular grid middleware. The consortium brings together North Carolina institutions working on human health, agriculture, and evolutionary biology, so that members can share information and resources and plan joint initiatives.
North Carolina Bioinformatics Grid
Thom H. Dunning, Jr.
HPCC Division, MCNC
Chemistry, University of North Carolina
Genomics: A Compute- & Data-Intensive Science
(* figure from TimeLogic)
Data Explosion: Rapid Growth of GenBank
• Growth of GenBank
  • Number of base pairs increasing dramatically (exponentially)
  • Growth in 2002 due to additions in just 21 days!
[Chart: size of GenBank over time, No. Gbases]
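"Exponentially" has a precise meaning that can be checked against repository statistics. Under the simplifying assumption of a constant growth rate r, the database size N (for example, in Gbases) at time t, the rate estimated from two measurements N1 at t1 and N2 at t2, and the doubling time Td are:

```latex
N(t) = N_0\, e^{r t}, \qquad
r = \frac{\ln\left(N_2 / N_1\right)}{t_2 - t_1}, \qquad
T_d = \frac{\ln 2}{r}
```

A constant doubling time means that each doubling period adds as much sequence as the entire database held at the start of that period.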
Data Explosion: Number and Diversity of Databases
Source: Nucleic Acids Research, 2002, Vol. 30, No. 1, Table 1. Molecular Biology Database Collection
• Major Public Sequence Repositories
  • DNA Data Bank of Japan (DDBJ), http://www.ddbj.nig.ac.jp: All known nucleotide and protein sequences
  • …
• Varied Biomedical Content
  • …
  • VirOligo, http://viroligo.okstate.edu: Virus-specific oligonucleotides for PCR and …
• Total: 333 Databases
Computing Explosion: Assembly and Analysis of Genomic Data
• Celera Genomics: Assembling the Genome
  • Compaq Alpha Clusters
  • Number of processors: ~750
  • Peak performance: 1 teraops
• NuTech Sciences: Mining the Genome
  • IBM p640 System
  • Number of processors: ~5,000
  • Peak performance: 7½ teraops
  • Total memory: 2½ terabytes
  • Total disk storage: 50 terabytes
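The aggregate figures above can be put on a per-processor basis with simple arithmetic. A minimal sketch, assuming decimal units (1 teraop = 1,000 gigaops, 1 TB = 1,000 GB) and taking the approximate (~) processor counts at face value; the helper name `per_processor` is illustrative only:

```python
# Figures as quoted on the slide; processor counts are approximate ("~").
CELERA = {"processors": 750, "peak_teraops": 1.0}
NUTECH = {"processors": 5000, "peak_teraops": 7.5, "memory_tb": 2.5, "disk_tb": 50.0}


def per_processor(total_tera: float, processors: int) -> float:
    """Split a total given in tera-units evenly across processors.

    Returns giga-units per processor, using decimal prefixes (1 tera = 1000 giga).
    """
    return total_tera * 1000 / processors


if __name__ == "__main__":
    print(f"Celera peak per processor: "
          f"{per_processor(CELERA['peak_teraops'], CELERA['processors']):.2f} gigaops")
    print(f"NuTech peak per processor: "
          f"{per_processor(NUTECH['peak_teraops'], NUTECH['processors']):.2f} gigaops")
    print(f"NuTech memory per processor: "
          f"{per_processor(NUTECH['memory_tb'], NUTECH['processors']):.2f} GB")
    print(f"NuTech disk per processor: "
          f"{per_processor(NUTECH['disk_tb'], NUTECH['processors']):.2f} GB")
```

Both systems work out to roughly 1.3 to 1.5 gigaops of peak performance per processor, so the difference in total capacity comes almost entirely from processor count.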
Genomics: Meeting the Information Challenge
[Diagram: Data Storage, Network, Grid Middleware, Computers]
North Carolina Research and Education Network
[Map: NCREN nodes, including Asheville, Boone, Charlotte, Cullowhee, Elizabeth City, Fayetteville, Greensboro, Greenville, Morehead City, Pembroke, Rocky Mount, RTP (with the RTP RPoP), Wilmington, Winston-Salem, Duke, MCNC, NCCU, NCSU, NCSU Centennial Campus, UNC-CH, and Qwest]
• NCREN3
  • Increased bandwidth
  • Increased reliability
  • Increased resiliency
Grid Technologies
• Major New Computing Technology
  • Under development since the mid-1990s
• Distinguishing Characteristics
  • "Middleware" to support efficient resource sharing in a distributed, heterogeneous computing and data storage environment
  • Focus on use of large-scale computing and data storage
• Some Major Grid Efforts
  • NASA IPG: Testbed linking selected NASA centers
  • DataGrid: International Grid being developed for high-energy physics (CERN)
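To make "resource sharing in a distributed, heterogeneous environment" concrete, here is a toy sketch of one thing grid middleware does: matching jobs to whichever resource, at whichever site, has enough free capacity. It is an illustration only, assuming a simple first-fit policy over CPUs and memory; the `Resource`, `Job`, `first_fit`, and `schedule` names and the site names are hypothetical and do not reflect the API or scheduling policy of Globus, Legion, or any other grid toolkit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """A shareable compute resource at some site (illustrative model only)."""
    name: str
    site: str
    free_cpus: int
    free_mem_gb: float


@dataclass
class Job:
    """A unit of work with its resource requirements."""
    name: str
    cpus: int
    mem_gb: float


def first_fit(job: Job, pool: list[Resource]) -> Optional[Resource]:
    """Reserve capacity on the first resource that can hold the job, else return None."""
    for res in pool:
        if res.free_cpus >= job.cpus and res.free_mem_gb >= job.mem_gb:
            res.free_cpus -= job.cpus
            res.free_mem_gb -= job.mem_gb
            return res
    return None


def schedule(jobs: list[Job], pool: list[Resource]) -> dict[str, Optional[str]]:
    """Place jobs in order; map job name -> resource name (None if nothing fits)."""
    placement: dict[str, Optional[str]] = {}
    for job in jobs:
        res = first_fit(job, pool)
        placement[job.name] = res.name if res is not None else None
    return placement


if __name__ == "__main__":
    pool = [
        Resource("cluster-a", "site-1", free_cpus=8, free_mem_gb=16.0),
        Resource("smp-b", "site-2", free_cpus=64, free_mem_gb=256.0),
    ]
    jobs = [
        Job("blast-search", cpus=4, mem_gb=8.0),
        Job("assembly", cpus=32, mem_gb=128.0),
        Job("annotation", cpus=8, mem_gb=16.0),
    ]
    print(schedule(jobs, pool))
```

Here the small search job fits on the cluster, while the assembly and annotation jobs overflow to the larger shared machine at another site. Real middleware adds authentication, data staging, and remote execution on top of this kind of matching.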
Grid Technologies (cont'd)
• Some Major Grid Efforts (cont'd)
  • GriPhyN: Research in Grid technologies for physics applications (Argonne, Florida)
  • e-Science Grid: Major effort in UK to develop a Grid infrastructure for science and engineering research
  • BIRN: Data Grid focused on neuroimaging data (UCSD, SDSC)
North Carolina Genomics and Bioinformatics Consortium
• Goal
  • Provide a venue for Consortium members to share information and resources, plan strategic initiatives, and form alliances
• Distributed Across North Carolina
  • Concentrated in the Research Triangle, but extends across all of North Carolina
• Diverse Goals and Expertise
  • Human health, including animal models; agriculture and forestry; basic research in evolutionary biology; tool development
Overall NC BioGrid Architecture
• Applications: Grid-aware and Grid-enabled bioinformatics applications (BioApp #1, BioApp #2, BioApp #3, …)
• Grid Middleware: Globus, Legion, …
• Network: NCREN3
• Computing and Data Resources: NCSC plus Members' Computing Centers
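The point of the layering is that an application depends only on the middleware interface, not on which site or machine does the work. A minimal sketch of that separation, assuming a hypothetical `GridMiddleware` interface with a single `submit` method and a toy round-robin implementation; none of these names are from Globus or Legion, the member-site name is made up, and tasks run locally rather than over the network:

```python
from abc import ABC, abstractmethod
from itertools import cycle
from typing import Callable


class GridMiddleware(ABC):
    """Middleware layer: applications talk only to this interface (hypothetical)."""

    @abstractmethod
    def submit(self, task: Callable[[str], float], data: str) -> tuple[str, float]:
        """Run task(data) on some resource and return (site, result)."""


class RoundRobinMiddleware(GridMiddleware):
    """Toy middleware that hands tasks to sites in turn.

    Tasks execute locally here; a real grid would stage the data and run
    the task remotely across the network layer.
    """

    def __init__(self, sites: list[str]) -> None:
        self._sites = cycle(sites)

    def submit(self, task: Callable[[str], float], data: str) -> tuple[str, float]:
        site = next(self._sites)
        return site, task(data)


def gc_content(seq: str) -> float:
    """Toy analysis kernel: fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


class BioApp:
    """A grid-enabled application: it knows the middleware, not the sites."""

    def __init__(self, middleware: GridMiddleware) -> None:
        self.middleware = middleware

    def run(self, sequences: list[str]) -> list[tuple[str, float]]:
        return [self.middleware.submit(gc_content, s) for s in sequences]


if __name__ == "__main__":
    app = BioApp(RoundRobinMiddleware(["NCSC", "member-site-1"]))
    print(app.run(["GGCC", "ATAT", "GCAT"]))
```

Swapping in a different `GridMiddleware` implementation changes where work runs without touching `BioApp`, which is the property the layered diagram is meant to convey.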
NC BioGrid Project
• Two Phases
  • Testbed Phase: test existing middleware, resolve issues, prepare detailed plan (12-18 months)
  • Production Phase: create and operate NC BioGrid
• Funding for Testbed from MCNC
• Project Manager
  • Phil Emer, MCNC, Chief Architect/NC BioGrid
• Project Oversight
  • MCNC Board of Directors
  • HPCC Advisory Board
  • NC BioGrid Technical Advisory Group