This presentation covers minimum spanning trees (MST), including the classic algorithms of Kruskal and Prim, along with the history and variants of the MST problem. It then turns to sequence analysis: alignment methods and tools such as FASTA and falign, algorithms for finding maximum-sum or maximum-average regions under length constraints, and the MAVG tool for locating the k best average regions in a sequence.
Never-ending stories
Kun-Mao Chao (趙坤茂)
Dept. of Computer Science and Information Engineering, National Taiwan University, Taiwan
E-mail: kmchao@csie.ntu.edu.tw
WWW: http://www.csie.ntu.edu.tw/~kmchao
Minimum spanning trees (MST)
• Input: a weighted graph G = (V, E)
• Output: a minimum-weight subset of E that forms a tree on V
• Two famous textbook algorithms:
  • Kruskal’s algorithm (1956): O(|E| log |E|)
  • Prim’s algorithm (1957): O(|E| log |V|)
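As an illustration of the first textbook algorithm, here is a minimal sketch of Kruskal’s method. It assumes vertices are numbered 0..n-1 and edges are given as (weight, u, v) tuples; the function name and the input layout are illustrative choices, not taken from the slides. Sorting the edges is the O(|E| log |E|) step.

```python
def kruskal(num_vertices, edges):
    """Return (total_weight, tree_edges) for a connected weighted graph.

    edges is a list of (weight, u, v) tuples with vertices in 0..num_vertices-1.
    """
    parent = list(range(num_vertices))  # union-find forest

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    tree_edges = []
    for weight, u, v in sorted(edges):  # lightest edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            tree_edges.append((u, v, weight))
            total += weight
    return total, tree_edges


if __name__ == "__main__":
    sample = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 1, 3), (3, 2, 3)]
    print(kruskal(4, sample))
```

If the input graph is disconnected, this sketch returns a minimum spanning forest instead of a tree.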
The history of MST
• Borůvka’s algorithm (1926): O(|E| log |V|)
• Jarník’s algorithm (1930): O(|E| log |V|), rediscovered by
  • Prim (1957)
  • Dijkstra (1959)
Improvements
• Yao (1975): O(|E| log log |V|)
• Cheriton and Tarjan (1976): O(|E| log log |V|)
• ...
• Karger, Klein and Tarjan (1995): randomized, expected O(|E|)
• Chazelle (2000): O(|E| α(|E|, |V|))
• Pettie and Ramachandran (2002): an optimal MST algorithm, whose running time lies between Ω(|E|) and O(|E| α(|E|, |V|))
Some variants of weighted spanning trees
• The Minimum Routing Cost Spanning Tree Problem (MRCT): minimize the sum, over all pairs of vertices, of the cost of the path between the pair in the tree.
  • NP-hard (Johnson, Lenstra and Rinnooy Kan, 1978)
  • 2-approximation (Wong, 1980)
  • 1.5-approximation (Wu, Chao and Tang, 1997)
  • PTAS (Wu, Lancia, Bafna, Chao, Ravi and Tang, 1998)
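To make the MRCT objective concrete, the sketch below evaluates the routing cost of a given spanning tree. It only computes the objective and does not solve the NP-hard optimization problem. It assumes the sum runs over unordered pairs of vertices (summing over ordered pairs would double the value), that vertices are numbered 0..n-1, and that the tree is given as (u, v, weight) tuples; these conventions are assumptions made for illustration. It uses the observation that an edge separating the tree into parts of sizes s and n - s lies on exactly s(n - s) paths.

```python
from collections import defaultdict


def routing_cost(num_vertices, tree_edges):
    """Sum of tree path costs over all unordered vertex pairs.

    tree_edges is a list of (u, v, weight) tuples forming a spanning tree
    on vertices 0..num_vertices-1.
    """
    adjacency = defaultdict(list)
    for u, v, weight in tree_edges:
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))

    # Iterative DFS from vertex 0, recording each vertex's parent edge.
    parent = {0: None}
    parent_weight = {0: 0}
    order = []
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, weight in adjacency[u]:
            if v not in parent:
                parent[v] = u
                parent_weight[v] = weight
                stack.append(v)

    # Visit vertices children-first so subtree sizes are complete when used.
    size = [1] * num_vertices
    cost = 0
    for u in reversed(order):
        p = parent[u]
        if p is not None:
            cost += parent_weight[u] * size[u] * (num_vertices - size[u])
            size[p] += size[u]
    return cost


if __name__ == "__main__":
    print(routing_cost(3, [(0, 1, 1), (1, 2, 2)]))
```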
Chao, K.-M., Pearson, W. R. and Miller, W., 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.
FASTA’s last stage
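The idea of aligning within a diagonal band can be sketched as follows. This is a simplified illustration, not the method of the cited paper: it assumes a global alignment, a linear gap penalty, and hypothetical scoring parameters, and it returns only the score. The dynamic program fills only the cells (i, j) with lo <= j - i <= hi, so the work is proportional to the band area rather than the full matrix.

```python
def banded_alignment_score(a, b, lo, hi, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b using only cells (i, j)
    of the DP matrix that satisfy lo <= j - i <= hi."""
    m, n = len(a), len(b)
    if not (lo <= 0 <= hi and lo <= n - m <= hi):
        raise ValueError("band must contain both corners of the DP matrix")

    neg_inf = float("-inf")
    prev = {}  # scores of the previous row, keyed by column index
    for i in range(m + 1):
        cur = {}
        for j in range(max(0, i + lo), min(n, i + hi) + 1):
            if i == 0 and j == 0:
                cur[j] = 0
                continue
            best = neg_inf
            if i > 0 and j > 0 and (j - 1) in prev:  # align a[i-1] with b[j-1]
                step = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, prev[j - 1] + step)
            if i > 0 and j in prev:  # a[i-1] against a gap
                best = max(best, prev[j] + gap)
            if j > 0 and (j - 1) in cur:  # b[j-1] against a gap
                best = max(best, cur[j - 1] + gap)
            cur[j] = best
        prev = cur
    return prev[n]


if __name__ == "__main__":
    print(banded_alignment_score("ACGT", "AGT", -1, 0))
```

A narrow band is a trade-off: it is fast, but it can only find alignments whose path stays inside the band.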
Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.
Band; arbitrary boundary lines
Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396.
Robust measures
Hardison, R. C., Chao, K.-M., Adamkiewicz, M., Price, D., Jackson, J., Zeigler, T., Stojanovic, N. and Miller, W., 1993, Positive and Negative Regulatory Elements of the Rabbit Embryonic ε-Globin Gene Revealed by an Improved Multiple Alignment Program and Functional Analysis, DNA Sequence, 4: 163-176.
Hardison, R. C., Chao, K.-M., Schwartz, S., Stojanovic, N., Ganetsky, M. and Miller, W., 1994, Globin Gene Server: A Prototype E-Mail Database Server Featuring Extensive Multiple Alignments and Data Compilation for Electronic Genetic Analysis, Genomics, 21: 344-353.
Multiple alignment applications
Chao, K.-M., Hardison, R. C. and Miller, W., 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.
YAMA (Yet Another Multiple Aligner)
Chao, K.-M. and Miller, W., 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134.
falign: somewhere between FASTA and BLAST
Chao, K.-M., Zhang, J., Ostell, J. and Miller, W., 1995, A Local Alignment Tool for Very Long DNA Sequences, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 11: 147-153.
falign + constrained sequence alignment
Chao, K.-M., Zhang, J., Ostell, J. and Miller, W., 1997, A Tool for Aligning Very Similar DNA Sequences, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 13: 75-80.
Fast algorithms for very similar sequences
Chao, K.-M., 1998, “On Computing all Suboptimal Alignments,” Information Sciences, 105: 189-207.
Suboptimal alignments
Chao, K.-M., 1999, “Calign: Aligning Sequences with Restricted Affine Gap Penalties,” Bioinformatics, 15: 298-304.
cDNA vs. genomic sequences
Lin, Y.-L., Jiang, T. and Chao, K.-M., 2002, “Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis,” Journal of Computer and System Sciences (JCSS), accepted. (Work done in October 2001.)
Algorithms for locating a maximum-sum or maximum-average region with length constraints
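To illustrate the maximum-sum variant of this problem, the sketch below finds a heaviest segment whose length lies between a lower and an upper bound. It uses a standard technique (prefix sums plus a sliding-window minimum kept in a deque), which runs in linear time but is not necessarily the algorithm of the cited paper. The function name and the half-open (start, end) convention are assumptions made for illustration.

```python
from collections import deque


def max_sum_segment(values, min_len, max_len):
    """Return (best_sum, start, end) so that values[start:end] has the
    largest sum among segments with min_len <= end - start <= max_len."""
    n = len(values)
    if not (1 <= min_len <= max_len) or min_len > n:
        raise ValueError("need 1 <= min_len <= max_len and min_len <= len(values)")

    # prefix[k] is the sum of values[:k], so a segment sum is a difference.
    prefix = [0]
    for x in values:
        prefix.append(prefix[-1] + x)

    best = None
    window = deque()  # candidate start indices, prefix values strictly increasing
    for end in range(min_len, n + 1):
        newest_start = end - min_len
        # Drop candidates that are no better than the newly allowed start.
        while window and prefix[window[-1]] >= prefix[newest_start]:
            window.pop()
        window.append(newest_start)
        # Drop starts that would make the segment longer than max_len.
        while window[0] < end - max_len:
            window.popleft()
        segment_sum = prefix[end] - prefix[window[0]]
        if best is None or segment_sum > best[0]:
            best = (segment_sum, window[0], end)
    return best


if __name__ == "__main__":
    print(max_sum_segment([2, -5, 3, 4, -1, 6], 2, 3))
```

The maximum-average variant needs a different approach, because an average cannot be written as a simple difference of two prefix values.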
Lin, Y.-L., Huang, X., Jiang, T. and Chao, K.-M., 2003, “MAVG: Locating Non-Overlapping Maximum Average Segments in a Given Sequence,” Bioinformatics, January issue. (Work done in April 2002.)
A tool for locating k-best average regions
Huang, X. and Chao, K.-M., 2003, “A Generalized Global Alignment Algorithm,” Bioinformatics, February issue. (Work done in May 2002.)
GAP3: chaining local alignments
Part III: Your stories (to be continued)