Explore the web of genetic and protein interactions and the integration of complex biological data into comprehensive models. The talk addresses evolutionary constraints, robustness against experimental error, and the efficacy of tools, and it argues for models of complex relations between system variables that still allow efficient algorithms.
Challenges for computer science as a part of Systems Biology • Benno Schwikowski • Institute for Systems Biology • Seattle, WA
Towards integrative models Species Conditions/time Genes • DNA • Sequence • Genomic locus • Domain content • Intron/exon structure • Regulatory motifs • Chemical modifications • SNPs - Splice variants - Accessibility • Variation • mRNA • Abundance - Regulatory information - Initiation/termination signals • Protein interaction • Interaction partner • Direct/indirect - Affinity • Effect • Protein - Abundance - State • Localization • 3D structure • Functional characterization • Half-life • Active sites • Biochemical function - Cellular role
Challenge: Integrative models • …Across genes and proteins: Many genes involved (e.g., multifactorial diseases) • …Across model systems: Lack of experimental platforms in target system • …Across levels of biological organization (e.g., gene regulatory processes involving phosphorylation) • …Across experiments: Robustness against errors in mass spectrometry, mRNA measurements • …Across timescales
Challenge: Capturing evolutionary constraints • DNA, RNA, Proteins, Modules, Organelles, Cells, Organs, Individuals, Populations, Ecologies • "Nothing in biology makes sense except in the light of evolution." Theodosius Dobzhansky
Challenge: Choosing experiments • Machine Learning: Determine the most likely classification/parameterization on the basis of a randomly sampled dataset • Active Learning: Allow an algorithm to query selected data points, using the results of previous queries.
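The contrast between the two bullets can be illustrated with a toy problem. The sketch below is not from the talk: it assumes a hypothetical "experiment" (`make_oracle`) that reports whether a value lies at or above a hidden threshold in [0, 1]. `passive_learning` draws query points at random, while `active_learning` chooses each query from the results of the previous ones (here, simple interval halving). All function names are illustrative.

```python
import random


def make_oracle(threshold):
    """Hypothetical experiment: True when x is at or above the hidden threshold."""
    def oracle(x):
        return x >= threshold
    return oracle


def passive_learning(oracle, n_queries, rng):
    """Estimate the threshold from randomly sampled query points."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = rng.random()
        if oracle(x):
            hi = min(hi, x)   # threshold is at or below x
        else:
            lo = max(lo, x)   # threshold is above x
    return (lo + hi) / 2, hi - lo


def active_learning(oracle, n_queries):
    """Choose each query using earlier answers: always test the midpoint
    of the interval that is still consistent with all results so far."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = (lo + hi) / 2
        if oracle(x):
            hi = x
        else:
            lo = x
    return (lo + hi) / 2, hi - lo
```

Both functions return an estimate and the width of the remaining uncertainty interval. In this toy setting the active learner halves the interval with every query, whereas the random sampler typically narrows it much more slowly for the same number of experiments.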
Challenge: Relations between system variables can be quite complex Yuh, Bolouri, Davidson, Science, 1998
Challenge: Develop models that allow extremely efficient algorithms AGTCGTACGTGAC... AGTAGACGTGCCG... ACGTGAGATACGT... GAACGGAGTACGT... TCGTGACGGTGAT...
CLUSTALW (1.74) multiple sequence alignment

Cotton    ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
Pea       GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
Tobacco   TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
Ice-plant TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
Turnip    ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
Wheat     TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
Duckweed  TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
Larch     TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

Cotton    CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
Pea       C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
Tobacco   AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
Ice-plant ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
Turnip    CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
Wheat     GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------
Duckweed  ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
Larch     TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

Cotton    ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
Pea       GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
Tobacco   GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
Ice-plant GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
Turnip    CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
Wheat     CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
Duckweed  TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
Larch     CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

Cotton    T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
Pea       TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
Tobacco   CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
Ice-plant TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
Larch     TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
Turnip    TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
Wheat     GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
Duckweed  CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG
Challenge: Developing models that allow extremely efficient algorithms AGTCGTACGTGAC... AGTAGACGTGCCG... ACGTGAGATACGT... GAACGGAGTACGT... TCGTGACGGTGAT... • ACGT, ACGT, ACGT, ACGG • Parsimony score: 1 J. Comp. Biol. 2002
An Exact Algorithm (generalizing Sankoff and Rousseau 1975) • $W_u[s]$ = best parsimony score for the subtree rooted at node $u$, if $u$ is labeled with string $s$ • $W_u[s] = \sum_{v:\ \text{child of } u} \min_t \left( W_v[t] + d(s, t) \right)$ • $4^k$ entries per table • Leaf sequences: AGTCGTACGTG, ACGGGACGTGC, ACGTGAGATAC, GAACGGAGTAC, TCGTGACGGTG • [Figure: tree with a table at each node; leaf tables hold 0 or $+\infty$ (e.g. ACGG: $+\infty$, ACGT: 0), internal tables hold finite scores (e.g. ACGG: 1, ACGT: 0; ACGG: 2, ACGT: 1)] J. Comp. Biol. 2002
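The table-filling idea on this slide can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it reads the recurrence as a sum over the children of u of the minimum over strings t of W_v[t] + d(s, t), takes d to be Hamming distance, sets a leaf entry to 0 when the string occurs in the leaf's sequence and to infinity otherwise, and represents a tree as nested Python lists with sequence strings at the leaves. The names `node_table` and `best_parsimony` are hypothetical.

```python
from itertools import product

INF = float("inf")


def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))


def leaf_table(sequence, k, kmers):
    """Assumed leaf rule: 0 if the k-mer occurs in the sequence, else infinity."""
    present = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    return {s: (0 if s in present else INF) for s in kmers}


def node_table(tree, k, kmers):
    """Compute W_u for a node. A tree is a sequence string (leaf)
    or a list of subtrees (internal node)."""
    if isinstance(tree, str):
        return leaf_table(tree, k, kmers)
    child_tables = [node_table(child, k, kmers) for child in tree]
    # W_u[s] = sum over children v of min over t of (W_v[t] + d(s, t))
    return {
        s: sum(min(w[t] + hamming(s, t) for t in kmers) for w in child_tables)
        for s in kmers
    }


def best_parsimony(tree, k):
    """Return a best root label and its parsimony score."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    root = node_table(tree, k, kmers)
    best = min(root, key=root.get)
    return best, root[best]
```

A call such as `best_parsimony([["TTACGTTT", "ACGTAA"], "CCACGGCC"], 4)` evaluates all 4^k labels at every node; the tree topology in that call is made up for illustration. Each internal node compares every pair of k-mers, which is where the 4^(2k) factor in the running time comes from.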
What are good challenges to tackle? • Biological/medical questions asked • Experimental technologies to acquire a lot of relevant data • Available datasets with a formalized notion of “data quality”
Memory complexity: $O(k \cdot 4^{2k})$ per node • Time complexity: total time $O(nk(4^{2k} + l))$ • $n$: number of species, $k$: motif length, $l$: average sequence length J. Comp. Biol. 2002
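To get a feel for how these bounds scale, the small sketch below plugs numbers into the expressions as read from this slide, assuming n is the number of species, k the motif length, and l the average sequence length. It ignores constant factors, so the results are only relative operation counts, and the function names are illustrative.

```python
def table_entries(k):
    """Number of candidate labels (k-mers over A, C, G, T) per node."""
    return 4 ** k


def pair_evaluations(k):
    """Number of (s, t) label pairs compared at a node: 4^(2k)."""
    return 4 ** (2 * k)


def total_time_estimate(n, k, l):
    """The slide's total-time expression n * k * (4^(2k) + l), without constants."""
    return n * k * (4 ** (2 * k) + l)
```

The expression grows linearly in the number of species and in the sequence length but exponentially in the motif length, so the approach stays practical only for short motifs.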
Technology-based challenges: Universal DNA Tag Systems • Existing applications in high-throughput technologies • Universal DNA arrays • Padlock probes • LYNX mRNA technology
Formalization • Define: weight(A/T) = 1, weight(C/G) = 2 • weight(AACTTG) = 1+1+2+1+1+2 = 8 • melting temperature(AACTTG) = 2·weight • l-u code problem: Given two integers, l < u, find the largest set of tags such that • Each tag has weight u • Each string of weight l occurs at most once J. Comp. Biol. 2000 & 2003
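The weight definition above is easy to check by hand or in code. The following minimal sketch implements only what the slide defines (A/T count 1, C/G count 2, melting temperature taken as twice the weight); the function names are illustrative, and the code-construction problem itself is not attempted here.

```python
# Per-base weights as defined on the slide: A/T = 1, C/G = 2.
BASE_WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(sequence):
    """Sum of the per-base weights of a DNA string."""
    return sum(BASE_WEIGHT[base] for base in sequence.upper())


def melting_temperature(sequence):
    """Melting temperature approximated as 2 * weight, as on the slide."""
    return 2 * weight(sequence)


print(weight("AACTTG"))               # 1+1+2+1+1+2 = 8
print(melting_temperature("AACTTG"))  # 2 * 8 = 16
```

Because the melting temperature is a fixed multiple of the weight, constraining tag weights is a way of constraining melting temperatures.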
Challenge: Visualization • Andrea Weston et al. @ ISB & Cytoscape
Challenge: Visualization • Cytoscape, pre-release 2.0
A computer scientist’s perspective • “Biology is so digital, and incredibly complicated […] I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level.” Donald Knuth, 7 Dec 1993