Introduction to Computational Biosciences and Bioinformatics Alex Ropelewski ropelews@psc.edu Pittsburgh Supercomputing Center National Resource for Biomedical Supercomputing These materials were developed with funding from the US National Institutes of Health grant #2T36 GM008789 to the Pittsburgh Supercomputing Center
Developing Bioinformatics Programs • Guest Lecturers: • Alex Ropelewski – Bioinformatics/CS (NRBSC/PSC) • Hugh Nicholas – Bioinformatics/Chemistry (NRBSC/PSC) • Troy Wymore – Structural Biology/Chemistry (NRBSC/PSC) • Ricardo Gonzalez Mendez – Phylogenetics/Chemistry (UPR Medical Sciences) • Evaluation: • Jimmy Torres – Evaluation (UPR) • Internship program: • marc.psc.edu
Computational Biosciences: The application of computer science, engineering, physical science and mathematics to the study of how plants, animals and humans function.
Computational Bioscience Fields • Bioinformatics • Structural biology • Genetic databases • Image processing • Quantitative ecology • Physiological modeling • Medical informatics • Scientific visualization • Medical imaging • Biomedical instrumentation • Biomathematics • Signal processing • Telemedicine • Biomedical engineering • Other related areas
Computational Bioscience Employment • Computational bioscience jobs: • Span several economic sectors, including agriculture, pharmaceuticals, software, medicine, academia, and government. • Are among the fastest-growing jobs. • Command salaries above the national average. • Are more recession-proof than average.
Computational Biosciences Job Growth (chart) Source: Engineers, Life and Physical Scientists and Related Occupations. Occupational Outlook Handbook, 2008-09 Edition. Department of Labor, Bureau of Labor Statistics
Computational Biosciences Salaries (chart) Source: National Occupational Employment and Wage Estimates, Department of Labor, Bureau of Labor Statistics, May 2007
Computational Biosciences • Interdisciplinary skills are required • The field requires knowledge in the following areas: • Biology • Chemistry • Computer Science • Mathematics • Statistics • Physics • Engineering
Computational Biosciences Required Skill Sets • Agricultural and food scientists need “…the ability to apply statistical techniques, and the ability to use computers to analyze data and to control biological and chemical processing.” • Biological scientists “…usually study allied disciplines such as mathematics, physics, engineering and computer science. Computer courses are beneficial for modeling and simulating biological processes, operating some laboratory equipment and performing research in the emerging field of bioinformatics” • “Computer skills are essential for prospective environmental scientists and hydrologists. Students who have some experience with computer modeling, data analysis and integration, digital mapping, remote sensing and Geographic Information Systems will be the most prepared to enter the job market” • Medical scientists “in addition to required courses in chemistry and biology undergraduates should study allied disciplines such as mathematics, engineering, physics, and computer science…” Engineers, Life and Physical Scientists and Related Occupations. Occupational Outlook Handbook, 2008-09 Edition. Department of Labor, Bureau of Labor Statistics
Computational Biosciences Required Skill Sets • “Developments in the field of Chemistry that involve life sciences will expand, resulting in more interaction among biologists, engineers, computer specialists and chemists.” Chemistry majors “usually study biological sciences; mathematics; physics; and increasingly computer science. Computer courses are essential because employers prefer job applicants who are able to apply computer skills to modeling and simulation tasks and operate computerized laboratory equipment. This is increasingly important as combinatorial chemistry and advanced screening techniques are more widely applied. Courses in statistics are useful because chemists… need the ability to apply basic statistical techniques.” “Chemists should experience employment growth in pharmaceutical and biotechnology research as recent advances in genetics open new avenues of treatment for diseases…. Job growth for chemists is expected to be strongest in pharmaceutical and biotechnology firms.” Engineers, Life and Physical Scientists and Related Occupations. Occupational Outlook Handbook, 2008-09 Edition. Department of Labor, Bureau of Labor Statistics
Who Employs Computational Bioscientists? • Government • NIH (many institutes, including NLM/NCBI and NCI) and CDC • DOE (national labs) • Department of Defense (including Army Corps of Engineers) • Agriculture, Veterans Affairs, NSF • Government Contractors (such as Computercraft, SRA) • Pharmaceuticals & Biotechnology (Bayer, Schering-Plough, Amgen, Merck, Eli Lilly, etc.) • Hospitals (particularly research hospitals) • Agriculture (Monsanto, Pioneer, etc.) • Academia
Bioinformatics and Sequence Analysis • Bioinformatics: The interdisciplinary science of using computational approaches to analyze, classify, collect, represent and store biological data, with the goal of accelerating and enhancing the understanding of DNA, RNA and protein sequences. • Sequence Analysis: The process of applying computational methods to a biological molecule represented as a character string, with the goal of inferring information about the structure, function, or evolutionary history of the sequence.
What is a Sequence? • A sequence is a way to represent a protein, DNA, or RNA molecule as a character string. Phospholipase A2 - Bos taurus (Bovine). MRLLVLAALLTVGAGQAGLNSRALWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPV DDLDRCCQTHDNCYKQAKKLDSCKVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNC DRNAAICFSKVPYNKEHKNLDKKNC
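Because a sequence is simply a character string, ordinary string operations can already answer basic questions about it. The minimal Python sketch below joins the three lines of the bovine phospholipase A2 sequence above into one string and counts each residue. The names `PLA2_BOVINE` and `composition` are illustrative and do not come from any library. For this sequence, the sketch reports 145 residues, 14 of which are cysteines (C).

```python
from collections import Counter

# Bovine phospholipase A2 sequence from the slide above; the three
# displayed lines are concatenated into a single string.
PLA2_BOVINE = (
    "MRLLVLAALLTVGAGQAGLNSRALWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPV"
    "DDLDRCCQTHDNCYKQAKKLDSCKVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNC"
    "DRNAAICFSKVPYNKEHKNLDKKNC"
)


def composition(seq: str) -> dict[str, int]:
    """Count how many times each letter (residue) occurs in a sequence."""
    return dict(Counter(seq.upper()))


if __name__ == "__main__":
    counts = composition(PLA2_BOVINE)
    print("length:", len(PLA2_BOVINE))
    for residue, n in sorted(counts.items()):
        print(residue, n)
```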
Molecular Alphabet • DNA/RNA Sequences: Letters represent nucleotide bases: • A - Adenine • C - Cytosine • G - Guanine • T - Thymine (DNA) • U - Uracil (RNA) • X or N - Unknown base
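A program can use these alphabets to sanity-check its input. The sketch below uses a hypothetical function, `nucleic_acid_type`, that looks only at which letters occur. A T implies DNA, a U implies RNA, and a sequence containing neither is reported as ambiguous. This is a heuristic, not a rule: a short protein written only with A, C, G and T would also pass as DNA.

```python
# Alphabets from the slide above; N and X both mark an unknown base.
DNA_ALPHABET = set("ACGTNX")
RNA_ALPHABET = set("ACGUNX")


def nucleic_acid_type(seq: str) -> str:
    """Guess whether a string is DNA or RNA from the letters it contains."""
    letters = set(seq.upper())
    is_dna = letters <= DNA_ALPHABET   # every letter is a valid DNA symbol
    is_rna = letters <= RNA_ALPHABET   # every letter is a valid RNA symbol
    if is_dna and is_rna:
        return "DNA or RNA"            # no T or U present, e.g. "GCAN"
    if is_dna:
        return "DNA"
    if is_rna:
        return "RNA"
    return "not a nucleic acid sequence"


if __name__ == "__main__":
    for example in ["GATTACA", "GAUUACA", "GCAN", "MRLLVL"]:
        print(example, "->", nucleic_acid_type(example))
```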
Molecular Alphabet • Protein Sequences: Letters represent amino acids: • A - Alanine • R - Arginine • N - Asparagine • D - Aspartic acid • C - Cysteine • E - Glutamic acid • Q - Glutamine • G - Glycine • H - Histidine • I - Isoleucine • L - Leucine • K - Lysine • M - Methionine • F - Phenylalanine • P - Proline • S - Serine • T - Threonine • W - Tryptophan • Y - Tyrosine • V - Valine • B - Asparagine or aspartic acid • Z - Glutamine or glutamic acid • J - Leucine or isoleucine • X - Any amino acid • U - Selenocysteine • O - Pyrrolysine
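The protein alphabet can be stored as a lookup table. The sketch below converts one-letter codes into full names and rejects any character outside the alphabet. `AMINO_ACIDS` and `expand_protein` are hypothetical names used only for illustration. The ambiguity codes B, Z, J and X are accepted as valid letters, because the slide lists them as part of the alphabet.

```python
# One-letter codes from the slide, including ambiguity codes (B, Z, J, X)
# and the less common residues selenocysteine (U) and pyrrolysine (O).
AMINO_ACIDS = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartic acid",
    "C": "Cysteine", "E": "Glutamic acid", "Q": "Glutamine", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline", "S": "Serine",
    "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
    "B": "Asparagine or aspartic acid", "Z": "Glutamine or glutamic acid",
    "J": "Leucine or isoleucine", "X": "Any amino acid",
    "U": "Selenocysteine", "O": "Pyrrolysine",
}


def expand_protein(seq: str) -> list[str]:
    """Translate one-letter codes into names; raise on letters outside the alphabet."""
    names = []
    for position, letter in enumerate(seq.upper(), start=1):
        if letter not in AMINO_ACIDS:
            raise ValueError(f"unknown residue {letter!r} at position {position}")
        names.append(AMINO_ACIDS[letter])
    return names


if __name__ == "__main__":
    print(expand_protein("MRLL"))
```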
Why study families of sequences? • Family members share a common function and structure, and are related through evolution. (Figure: Aldehyde Dehydrogenase Family Members)
The Goal CURATED FAMILY: • All related sequences sharing a common function (Homologous Sequences) • All substantial motifs • Evolutionary history • Structural information • Experimental information
The Process (workflow diagram) • Components: Initial Query, Sequence Libraries, Profile & PSSM, Hidden Markov Model, Local Patterns, Multiple Sequence Alignment, Classification Libraries, Structural Libraries, Homology Modeling, Evolutionary Analysis • Result: CURATED DATASET
The Toolkit • GenBank • BLAST • ClustalW • MEME • EMBL • FASTA • T-Coffee • MAST • UniProt • Smith-Waterman • MSA • HMMER • Pfam • Needleman-Wunsch • ProbCons • Profile-ss • PDB • FigTree • PHYLIP • Notung • Python • Biopython • GeneDoc
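Two of the toolkit entries, Needleman-Wunsch and Smith-Waterman, are dynamic-programming alignment algorithms rather than databases or programs. The minimal sketch below computes a Needleman-Wunsch (global) alignment score using assumed, illustrative scoring values: match +1, mismatch -1, and a linear gap penalty of -1. Real tools typically use substitution matrices such as BLOSUM62 and affine gap penalties, and can also recover the alignment itself. Only the score is computed here.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b.

    Simple match/mismatch scoring with a linear gap penalty; the default
    values are illustrative, not the defaults of any particular tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap            # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap            # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair   # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap              # a[i-1] against a gap
            left = score[i][j - 1] + gap            # b[j-1] against a gap
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATACA"))
```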
The Toolkit Which is the proper tool for the task?
What is an Information Library? • A compilation of prior experimental knowledge about biologically relevant molecules, stored in a computer system. • The power of bioinformatics lies in the ability to leverage and apply this prior experimental knowledge to additional biological problems. • From a biologist's perspective, there are different ways to organize this prior experimental knowledge: • Sequence • Structure • Family/Domain • Species • Taxonomy • Function/Pathway • Disease/Variation • Publication/Journal • And many other ways
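The same set of records can be organized in several ways simply by building one index per attribute. The sketch below illustrates this with two hypothetical toy records. Only the first record echoes an example from these slides, and the second is a placeholder rather than real database content. The `build_index` helper is likewise illustrative.

```python
from collections import defaultdict

# Hypothetical toy records, for illustration only (not real database entries).
RECORDS = [
    {"id": "record_1", "name": "Phospholipase A2", "species": "Bos taurus",
     "family": "Phospholipase A2"},
    {"id": "record_2", "name": "placeholder protein", "species": "Homo sapiens",
     "family": "Aldehyde dehydrogenase"},
]


def build_index(records: list[dict[str, str]], field: str) -> dict[str, list[str]]:
    """Map each value of `field` to the ids of the records that have that value."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record["id"])
    return dict(index)


if __name__ == "__main__":
    # The same records, organized two different ways.
    print(build_index(RECORDS, "species"))
    print(build_index(RECORDS, "family"))
```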
What Question Are You Trying To Answer?
Sequence Libraries – Results: Searching with a Single Sequence
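Searching a sequence library with a single query amounts to comparing the query with every library entry and ranking the hits. The sketch below does this with an exhaustive Smith-Waterman local alignment score. The scoring values (match +2, mismatch -1, linear gap -2), the function names and the two-entry toy library are all illustrative assumptions. Production tools such as BLAST and FASTA use faster heuristics, substitution matrices and statistical significance estimates, none of which are modeled here.

```python
def smith_waterman_score(query: str, target: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between query and target (linear gap penalty)."""
    previous = [0] * (len(target) + 1)   # DP row for query[:i-1]
    best = 0
    for i in range(1, len(query) + 1):
        current = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            pair = match if query[i - 1] == target[j - 1] else mismatch
            # Local alignment: scores never drop below 0 (start a new alignment).
            current[j] = max(0,
                             previous[j - 1] + pair,   # align the two residues
                             previous[j] + gap,        # query residue vs gap
                             current[j - 1] + gap)     # target residue vs gap
            best = max(best, current[j])
        previous = current
    return best


def search_library(query: str, library: dict[str, str]) -> list[tuple[str, int]]:
    """Score the query against every library entry and return the best hits first."""
    hits = [(name, smith_waterman_score(query, seq)) for name, seq in library.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    toy_library = {"seq_a": "TTTGATTACATTT", "seq_b": "CCCCCCCC"}  # hypothetical entries
    for name, score in search_library("GATTACA", toy_library):
        print(name, score)
```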
Sequence Libraries – Results: Searching with an Abstract Representation
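An abstract representation such as a profile or PSSM (position-specific scoring matrix) summarizes a family alignment column by column. A search can then reward the residues the family prefers at each position. The minimal sketch below makes several assumptions: an ungapped alignment of equal-length sequences, a pseudocount of 1, and a uniform background distribution. It scores every window of a target with log2 odds. The function names and toy data are hypothetical.

```python
import math


def build_pssm(aligned: list[str], alphabet: str,
               pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM from equal-length, ungapped sequences.

    Assumes a uniform background frequency of 1/len(alphabet) for every letter.
    """
    background = 1.0 / len(alphabet)
    pssm = []
    for column in range(len(aligned[0])):
        letters = [seq[column] for seq in aligned]
        total = len(letters) + pseudocount * len(alphabet)
        scores = {}
        for letter in alphabet:
            frequency = (letters.count(letter) + pseudocount) / total
            scores[letter] = math.log2(frequency / background)
        pssm.append(scores)
    return pssm


def best_window(pssm: list[dict[str, float]], target: str) -> tuple[int, float]:
    """Slide the PSSM along the target and return (start, score) of the best window.

    Returns (-1, -inf) if the target is shorter than the PSSM; letters that are
    not in the PSSM's alphabet raise KeyError.
    """
    width = len(pssm)
    best_start, best_score = -1, float("-inf")
    for start in range(len(target) - width + 1):
        window = target[start:start + width]
        score = sum(column[letter] for column, letter in zip(pssm, window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGT", "ACGA", "ACGT"]   # hypothetical family
    pssm = build_pssm(toy_alignment, "ACGT")
    print(best_window(pssm, "TTTTACGTTTT"))
```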
Multiple Sequence Alignment
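Once sequences are aligned, each column can be read as one position shared across the family. A simple first analysis is a per-column majority residue and a consensus sequence. The sketch below treats gaps ('-') like any other symbol and writes 'X' (the protein alphabet's "any amino acid") wherever no residue exceeds the threshold. The 0.5 default threshold, the function names and the three-sequence toy alignment are illustrative assumptions.

```python
from collections import Counter


def column_summary(alignment: list[str]) -> list[tuple[str, float]]:
    """For each column, return (most common symbol, fraction of rows sharing it)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("need at least one sequence, all of the same aligned length")
    summary = []
    for column in zip(*alignment):            # iterate over alignment columns
        symbol, count = Counter(column).most_common(1)[0]
        summary.append((symbol, count / len(column)))
    return summary


def consensus(alignment: list[str], threshold: float = 0.5) -> str:
    """Majority symbol where its fraction exceeds threshold, otherwise 'X'."""
    return "".join(symbol if fraction > threshold else "X"
                   for symbol, fraction in column_summary(alignment))


if __name__ == "__main__":
    toy_alignment = ["MKV-L", "MRV-L", "MKVAL"]   # hypothetical aligned sequences
    print(column_summary(toy_alignment))
    print(consensus(toy_alignment))
```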
Homology Modeling – Why It Works (Figure: Structural Superposition of Aldehyde Dehydrogenase Family Members)
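How closely superposed family members agree is commonly summarized as a root-mean-square deviation (RMSD) over equivalent atoms, for example C-alpha atoms. The sketch below assumes the two structures are already superposed and that the atoms are already paired in order. Finding the optimal superposition (for example with the Kabsch algorithm) is not shown, and the `rmsd` function name is illustrative.

```python
import math

Coordinate = tuple[float, float, float]


def rmsd(coords_a: list[Coordinate], coords_b: list[Coordinate]) -> float:
    """RMSD between two already-superposed, already-paired coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates: the second set is the first shifted 1 unit along x.
    print(rmsd([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
               [(1.0, 0.0, 0.0), (2.0, 1.0, 1.0)]))
```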
Job Opportunities in Bioinformatics • This course will teach you many of the essential skills requested in these job postings. • Let’s look at actual job postings asking for bioinformatics expertise: • Not all jobs will be labeled “bioinformatics” or “sequence analysis”; many are in a related computational bioscience field. • Note the specific skills required.
Summer Internship-Computational Biology • Qualifications: To be eligible for a Computational Biology Summer Scientific Internship, students must have completed their undergraduate sophomore year (by June 2009). • Be majoring in a biology, chemistry or computer science program. • Candidates should have completed at least one programming course before the start of the internship. • All interns must have current authorization to work for any employer within the United States. • Experience with MATLAB, SQL, C++ and/or Perl is desired. http://jobview.monster.com/getjob.aspx?JobID=78206043&JobTitle=Summer+Internship-Computational+Biology&q=computational+biology&cy=us&lid=316&re=0&pg=1&dv=1&AVSDM=2008-12-18+14%3a20%3a00&seq=2&fseo=1&isjs=1&re=1000
Molecular Biology Lab Technician Responsibilities: • Performing a wide variety of complex procedures and techniques which may include, but are not limited to: PCR amplification of gene sequences, design of synthetic genes, gene cloning, recombinant protein expression and analysis, DNA ligation, bacterial and yeast transformations, microbial cell culture and basic manipulations, gel electrophoresis, DNA analysis, DNA isolation, DNA purification, DNA sequencing, DNA gel purification, RNA isolation and analysis, protein purification and analysis, western blots, PAGE, column chromatography. • Calibrating and operating a variety of laboratory instruments used to perform tests, analyses, and other laboratory procedures which may include, but are not limited to: spectrophotometers, PCR thermocyclers, microcentrifuges, electroporators, hybridization ovens, autoclaves, ultracentrifuges, radioactive counters, cytofluograph, light/tissue culture microscopes, imaging systems, DNA sequencers, high voltage electrophoresis apparatus, gel chromatography equipment, balances, Bioanalyzer, incubators, water baths, etc. • Writing report(s) of findings, stating methods and procedures, including any modifications employed, specimens and materials involved, and results of experiments. • Interpreting, evaluating, and discussing the results of each experiment with the supervisor as part of the planning process for subsequent studies. • Reviewing and researching the proper handling and methods of utilization for different biological and chemical materials in regular use in the laboratory. • Preparing various media, stock supplies, and other reagents for use in the laboratory. • Responsible for quality control of the laboratory, including upkeep and routine maintenance of equipment. • Orders and maintains laboratory supplies and equipment, including chemicals, biological materials, liquid nitrogen, etc.
May be required to handle biological materials, radioactive materials, carcinogens, toxins, infectious agents, recombinant DNA materials, acids, biological wastes, etc. • Candidate must possess sound practical knowledge of computer-based tools for sequence, genome and protein analysis. Lab management duties will include database and inventory management of molecular biology reagents such as vectors, clones, primers and cell lines. May be required to perform computer analysis of data and use computer and computer software to prepare data for publication. • Performs other duties as assigned by the supervisor, such as monitoring chemical and radiological hazards, conducting and reporting operational radiation surveys, etc. Qualifications: • B.S. degree in molecular biology with a minimum of two to three years related experience. • Professional knowledge of the theories, principles, and methods of a scientific specialty or subspecialty and related disciplines, such as biology, microbiology, chemistry, molecular biology, cell biology, immunology, virology, genetics, biochemistry, etc. • Skill in calibrating and operating a variety of laboratory instruments used to perform tests, analyses, and other laboratory procedures. • Experience in recombinant protein expression ideally in Pichia pastoris, Saccharomyces, and/or E. coli preferred, but not required. • Computer based sequence/protein analysis experience also preferred. • Fermentation experience preferred. • Excellent written communication skills to prepare summaries, graphs, and other material for manuscripts or presentations. http://jobview.monster.com/getjob.aspx?JobID=78681392&JobTitle=Molecular+Biology+Lab+Technician&brd=1&pg=1&pp=100&dv=1&q=bioinformatics&cy=us&lid=316&lv=12&re=6&AVSDM=2009-01-15+15%3a05%3a00&seq=57&fseo=1&isjs=1&re=1000
Bioinformatics Assembly Analyst Responsibilities: • assembling genome sequence data using a variety of tools and parameters and performing the experiments needed to evaluate sequencing strategies • using existing software and databases to analyze genomic data and correlating assemblies and sequences with a variety of genetic and physical maps and other biological information • identifying problems and serving as point of contact for various groups to propose and implement solutions • proposing and implementing upgrades to existing tools and processes to enhance analysis techniques and quality of results • developing and implementing scripts to manipulate, format, parse, analyze, and display genome sequence data; and developing new strategies for analysis and presentation of results. Requirements: • a bachelor's degree in biology or related field • at least three years of experience in DNA sequencing and sequence analysis. • Must possess solid knowledge of sequencing software and public sequencing databases. • Knowledge of bioinformatics tools helpful. http://sh.webhire.com/servlet/av/jd?ai=631&ji=2285147&sn=I
Bioinformatics: Position Requirements: • Proven critical-thinking skills and demonstrated ability to manage and interpret large biological data sets • Demonstrated knowledge of genetics and/or molecular biology (e.g., successful college-level coursework) • Proficiency in computer-based DNA and protein sequence analysis using public biological databases • Experience using the Macintosh OS X and Windows operating systems • Working knowledge of the Linux or Unix operating system • Basic knowledge of relational database structure and terminology • Proficiency with Microsoft Excel or other spreadsheet applications • Strong interpersonal and communication skills with a confident and cooperative service-oriented attitude • Fluency in both written and spoken English. Highly qualified candidates will possess: • Demonstrated experience in analyzing gene expression (microarray) data • Experience in analyzing data from whole-genome association studies (e.g., SNP data) • Laboratory research experience in genetics, molecular biology, or biochemistry • The ability to be self-motivated and work independently, with minimal supervision. Education Requirements: • Bachelor’s or Master’s Degree in biology, bioinformatics, or a related field http://jobview.monster.com/getjob.aspx?JobID=78680407&JobTitle=Bioinformatics+Data+Analyst+at+NIH+%2f+NHGRI&brd=1&q=bioinformatics&cy=us&lid=316&re=130&AVSDM=2009-01-15+14%3a31%3a00&pg=1&seq=2&fseo=1&isjs=1&re=1000
Bioinformatics Analyst: Responsibilities: • The Bioinformatics Analyst will process sequence data and apply quality control measures for generating high-quality raw sequence and assembled data from next-generation sequencing technologies. • Will perform whole-genome alignments using existing alignment tools, including BLAST, MUMmer and PatternHunter. • Perform mapping and post-mapping analysis with short reads using third-party and internally developed tools. • Responsible for receiving, processing and managing sequence data. • Evaluate new methodologies and tools and improve data processing and quality control protocols. • Develop suitable metrics for reporting the completeness and quality of the sequence delivered to the customers. Requirements: • B.S. in biology, computer science, bioinformatics or related field, or equivalent combination of education and experience • A minimum of 2 years of experience in genomics and bioinformatics-related work. • Proficiency in Unix and experience in one or more of these programming languages (Perl, SQL, Jython and Java) is required. • Familiar with the use of commonly used sequence analysis tools and genomic databases • Willing to multi-task and respond to new challenges as required. • Excellent communication skills. • Hands-on experience in a research or production environment http://jobview.monster.com/getjob.aspx?JobID=78527133&JobTitle=Bioinformatics+Analyst&brd=1&q=bioinformatics&cy=us&lid=316&re=130&AVSDM=2009-01-09+12%3a56%3a00&pg=1&seq=11&fseo=1&isjs=1&re=1000
Business Systems Analyst: Responsibilities • The ideal candidate should be a highly motivated team player with a strong understanding of informatics solutions to biology and chemistry, especially in the area of data visualization/statistical analysis, and with a proven record of building/integrating effective tools that help scientists in their daily work. • Actively work with scientists/computational biologists in a disease area to understand their needs • Define proper data analysis solution(s) to meet their scientific needs • Perform rapid prototyping to refine the requirements with proper documentation • Work with internal and external software teams, where appropriate, to design/implement proper solutions to meet scientists' needs • Work either as a team member or lead a team to deliver data analysis platforms to scientists/computational biologists • Work effectively with different NITAS groups to ensure a globally consistent implementation scheme. Requirements: • Bachelor's degree in computer science, biology, bioinformatics or comparable qualification • At least 3-5 years of hands-on experience in data analysis in a drug discovery, scientific or biotech environment • Strong communication and interpersonal skills • Proven capabilities interacting with scientists and being customer-service oriented • Ability to work independently and/or as part of a team • Familiarity with scientific LIMS such as ActivityBase, and data visualization/analysis tools such as Spotfire • Solid understanding of relational databases and familiarity with Oracle and/or SQL Server • Good understanding of the fundamentals of software engineering.
Bioinformatics: Responsibilities: • Work in a team environment with software developers, bioinformaticians, molecular biologists, and neuroscientists to design, implement, and support novel software tools for the analysis of sequence and gene expression data using custom and public databases. • Responsible for designing and developing a web-accessible database of sequence data, and providing computational contributions to other scientific projects. • This position requires someone with software development experience who thrives in a team environment, is detail-oriented, and enjoys collaborative research projects. Requirements: • BS/MS/PhD in Computer Science, Computational Biology, Biology or related field. • Minimum two years of Java development. • Experience working closely with scientists in a wet-lab environment. • Demonstrated experience with algorithm design and implementation is essential. • A good understanding of molecular biology, genetics, neuroscience, genomics or computational biology. • Knowledge of Ajax and SQL is a plus. • Experience in web page development using HTML, XML, JavaScript and other web technologies such as JSP, Struts, and Tomcat is preferred. • Detail-oriented • Enjoys collaborative research projects http://jobview.monster.com/getjob.aspx?JobID=78304360&JobTitle=Bioinformatics+Specialist+I&brd=1&pg=1&pp=25&dv=1&q=bioinformatics&cy=us&lid=316&re=132&lv=12&AVSDM=2009-01-12+02%3a15%3a00&seq=11&fseo=1&isjs=1&re=1000
Research Associate: Responsibilities: • Develop and optimize methods and reagents for human blood sample preparation and nucleic acid amplification for molecular diagnostic assays on SEQUENOM’s instrument system • Be responsible for routine clinical sample handling and processing, as well as plan and execute validation and verification studies • Serve as assay development specialist and a resource within the company • Perform experimental setups requiring DOEs (Design of Experiment) and data analysis requiring computational capabilities using appropriate statistical software • Analyze, report and present study data in team meetings • Achieve objectives as set forth by supervisor and/or project timeline • Analyze data and prepare technical reports, summaries, protocols and quantitative analysis and ensure experiments are well documented • Maintain familiarity with current scientific literature Requirements: • B.S. in Molecular Biology/Biochemistry, or closely related field, with a minimum of 1 year of full-time laboratory experience. Experience in DNA/RNA extraction for clinical diagnostics applications is a plus • Knowledge of human nucleic acid extraction, amplification and detection technologies and hands-on experience with at least one of these technologies • Knowledge of DOE and statistical data analysis tools like Excel, JMP or other statistics software • Detail-oriented, with good analytical and troubleshooting capabilities • Proficiency in organization, documentation, and communication • A team player, willing to help/support others to ensure the success of projects • Proficient computer skills including Microsoft Office software http://jobview.monster.com/getjob.aspx?JobID=78050634&JobTitle=Research+Associate+%282+positions%29&q=computational+biology&cy=us&lid=316&pg=3&dv=1&pp=25&re=4&AVSDM=2008-12-10+21%3a03%3a00&seq=13&fseo=1&isjs=1&re=1000
Summary • Wide variety of jobs • Biology, especially molecular biology and genetics • Some statistics • Computer skills: • UNIX • Bioinformatics Tools • Database (SQL) • Some Programming • Web • Bioinformatics can be a rewarding career path