An optimal algorithm for identifying a maximum-density segment Hsueh-I Lu (Institute of Information Science, Academia Sinica) http://www.iis.sinica.edu.tw/~hil/ Microsoft Office XP is needed to see all the animation effects. Maximum-Density Segment @ EE.NTU
What do algorithm people do? Inventing efficient recipes to solve combinatorial problems
A famous combinatorial problem • The Factorization Problem • Input: a number N • Output: • “yes” if N is a prime number; • A factorization of N if N is not a prime number. • For example, • N = 323264989793317. • Output = 18672511 * 17312347.
OPEN QUESTION Is there an efficient recipe for the Factorization Problem?
Why Factorization? The security of many encryption schemes is based upon the assumption that the factorization problem is difficult.
RSA encryption –– 1978 Rivest Shamir Adleman
RSA factorization challenges
US$10,000 –– RSA-576 • 1881988129206079638386972394616504398071635633794138270076335642298885971523466548531906060650474304531738801130339671619969232120573403187955065699621305168759307650257059
RSA-576 factored on December 3, 2003 • 398075086424064937397125500550386491199064362342526708406385189575946388957261768583317 • 472772146107435302536223071973048224632914695302097116459852171130520711256363590397527 • At about the same time, Adi Shamir gave two talks at NTU (Dec. 4, 2003)
US$20,000 –– RSA-640 • 3107418240490043721350750035888567930037346022842727545720161948823206440518081504556346829671723286782437916272838033415471073108501919548529007337724822783525742386454014691736602477652346609
US$200,000 –– RSA-2048 • 25195908475657893494027183240048398571429282126204032027777137836043662020707595556264018525880784406918290641249515082189298559149176184502808489120072844992687392807287776735971418347270261896375014971824691165077613379859095700097330459748808428401797429100642458691817195118746121515172654632282216869987549182422433637259085141865462043576798423387184774447920739934236584823824281198163815010674810451660377306056201619676256133844143603833904414952634432190114657544454178424020924616515723350778707749817125772467962926386356373289912154831438167899885040445364023527381951378636564391212010397122822120720357
Short of cash? www.rsasecurity.com/rsalabs/challenges/factoring/
RSA 2003 (April ’03)
2002 Turing Award (June ’03)
The awarded paper • Only 7 pages. • “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Communications of the ACM 21, 120-126, 1978.
“PRIMES is in P” • Agrawal, Kayal, and Saxena • August 6, 2002
PRIMES is in P • The PRIMES problem: • Input: a number N. • Output: • “yes” if N is a prime number. • “no” if N is not a prime number. • Only 9 pages! • Running time is O(n^12), where n is the number of digits.
NEW YORK TIMES, Aug. 8, 2002 • Previous algorithmic results that caught the attention of the New York Times • 1984, Karmarkar’s algorithm for solving linear programs. • 1979, Khachiyan’s algorithm for solving linear programs.
The latest version (v.3) of AKS’s paper • The running time is now improved from O(n^12) to O(n^7.5).
What do algorithm people do? • Looking for important/interesting combinatorial problems • Coming up with efficient recipes to solve them exactly or approximately.
Bioinformatics • A gold mine of combinatorial problems
An example: My results
Finding a DNA segment with maximum GC-density in linear time • WABI • J. Comput. Sys. Sci. • ESA • SIAM J. Computing
DNA Sequences • [Chargaff and Vischer, 1949] • DNA consists of A, G, T, and C • Adenine (A) • Guanine (G) • Cytosine (C) • Thymine (T)
[Vischer, Zamenhof, Chargaff, 1949] • Evidence against the then widely believed %A = %G = %T = %C.
Erwin Chargaff, 1905-2002 • Observing • %A ~ %T • %G ~ %C • “A comparison of the molar proportions reveals certain striking, but perhaps meaningless, regularities”
Double Helix • [Watson and Crick, Nature, April 25, 1953] • Biologist (age 23, fresh Ph.D.) + Physicist (age 35, still a Ph.D. student) • 900 words, 2 pages
1962 Nobel Prize in Physiology or Medicine • Crick, Watson, and Wilkins
DNA’s picture • [Alexander Rich, 1973] • Structural biologist at MIT. • Picture of DNA at atomic resolution.
Celebrating 50 years of Double Helix (April 25, 1953 – 2003)
Francis Crick 1916-2004 • Passed away on July 28, 2004 • Photo taken in 1993 in Paris
Maurice Wilkins 1916-2004 • Passed away on Oct 5, 2004
GC-content • Non-uniformity of nucleotide composition • 25% - 75% in genomes of all organisms • 40% - 50% in typical mammalian genomes • 30% - 60% in human chromosomes • The underlying causes are still unknown.
GC-content • GC-content is positively correlated with • gene length, • gene density, • patterns of codon usage, • recombination rate within chromosomes, • …
The Problem • Input: • an n-bit string S, • an integer L. • Output: • a substring S[i, j] of S with maximum density over all substrings of S with at least L bits.
Example • S = 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 • L = 1, 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 • L = 2, 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 • L = 3, 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1
density of each segment in O(1) time • prefix-sum(i) = S[1]+S[2]+…+S[i], • all n prefix sums are computable in O(n) time. • sum(i, j) = prefix-sum(j) – prefix-sum(i-1) • density(i, j) = sum(i, j) / (j-i+1)
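The prefix-sum formulas above can be sketched in a few lines of Python. The function names `prefix_sums` and `density` are illustrative assumptions, not names from the talk; positions are 1-based to match the slides, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def prefix_sums(bits):
    """Return prefix[0..n], where prefix[i] = S[1] + ... + S[i] and prefix[0] = 0."""
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)
    return prefix

def density(prefix, i, j):
    """Density of S[i, j] (1-based, inclusive), computed in O(1) time."""
    return Fraction(prefix[j] - prefix[i - 1], j - i + 1)

# The 16-bit example string S from the slides.
S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
P = prefix_sums(S)          # one O(n) pass
print(density(P, 8, 11))    # S[8..11] = 1 0 1 1, prints 3/4
```

After the single O(n) pass, every density query costs two array lookups and one division.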
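Good partners • Find the best ending position g(i) for each i = 1, 2, …, n: among all j with j - i + 1 ≥ L, g(i) is one maximizing density(i, j). [Figure: positions i, i + L, and g(i) marked along S.]
Previous Work • [Huang, CABIOS ’94] • O(nL) time. • Key observation: no need to examine substrings of 2L or more bits, since such a substring splits into two substrings of at least L bits each, one of which is at least as dense. [Figure: g(i) lies within 2L positions of i.]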
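A minimal sketch of an O(nL) search built on this observation is given below. It is an illustration under stated assumptions, not Huang's published code: the function name `max_density_segment` and the tie-breaking rule (first segment found wins) are choices made here.

```python
from fractions import Fraction

def max_density_segment(bits, L):
    """Return (density, i, j), 1-based inclusive, over segments with >= L bits.

    Only lengths L .. 2L-1 are tried: a segment with 2L or more bits splits
    into two pieces of at least L bits, and one piece is at least as dense.
    """
    n = len(bits)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= n")
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)
    best = None
    for i in range(1, n - L + 2):                        # start positions
        last = min(n, i + 2 * L - 2)                     # length at most 2L-1
        for j in range(i + L - 1, last + 1):             # end positions
            d = Fraction(prefix[j] - prefix[i - 1], j - i + 1)
            if best is None or d > best[0]:
                best = (d, i, j)
    return best

S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
print(max_density_segment(S, 3))  # density 3/4 at S[8, 11]
```

Each start position examines at most L end positions, which gives the O(nL) bound.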
Recent Progress • [Lin, Jiang, Chao, J. Computer and System Sciences (JCSS), 2002] • O(n log L) time. • Techniques: • Right-skew decomposition. • Jumping tables that allow binary search. [Figure: positions i + L and g(i) marked along S.]
Our results • Reducing the running time to O(n).
Reviewing Lin, Jiang, and Chao’s Algorithm
Right-Skew Substring • S[i, j] is right-skew if for each k = i,…, j-1 • density[i, k] ≤ density[k+1, j]. • S = 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 • 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 • 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 • 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1
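The definition can be checked directly, as in the sketch below. The function name `is_right_skew` is an assumption made here; the check compares the two densities by cross-multiplication so that no division is needed.

```python
def is_right_skew(bits):
    """True if every proper prefix is no denser than the remaining suffix."""
    n, total = len(bits), sum(bits)
    left = 0                                   # number of 1s in the prefix
    for k in range(1, n):                      # prefix has k bits, suffix n - k
        left += bits[k - 1]
        # left / k <= (total - left) / (n - k), written without division
        if left * (n - k) > (total - left) * k:
            return False
    return True

print(is_right_skew([0, 0, 1]))        # True: density rises toward the right
print(is_right_skew([1, 0, 0]))        # False: the prefix 1 is denser than 0 0
print(is_right_skew([0, 1, 0, 1, 1]))  # True
```

A single bit is trivially right-skew, because it has no proper prefix to test.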
Right-Skew Decomposition • Partition S into substrings S_1, S_2, …, S_k such that • each S_i is a right-skew substring of S • density(S_1) > density(S_2) > … > density(S_k) • [Lin, Jiang, Chao] • Unique • Computable in linear time.
An example • 1 1 1 0 1 1 0 1 0 1 1 0 0 • Segment densities: 1 > 2/3 > 3/5 > 1/3
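One way to compute such a decomposition in linear time is a right-to-left scan that merges a segment into its right neighbour while the neighbour is at least as dense. The sketch below follows that idea; it is an illustration rather than the code of Lin, Jiang, and Chao, and the function name `right_skew_decomposition` is an assumption.

```python
def right_skew_decomposition(bits):
    """Return the segments as 1-based inclusive (start, end) pairs, left to right."""
    stack = []  # entries (start, end, ones); the top is the leftmost segment so far
    for i in range(len(bits), 0, -1):
        start, end, ones = i, i, bits[i - 1]
        while stack:
            s2, e2, ones2 = stack[-1]
            # merge while density(current) <= density(right neighbour)
            if ones * (e2 - s2 + 1) <= ones2 * (end - start + 1):
                stack.pop()
                end, ones = e2, ones + ones2
            else:
                break
        stack.append((start, end, ones))
    return [(s, e) for s, e, _ in reversed(stack)]

S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
print(right_skew_decomposition(S))  # [(1, 1), (2, 13), (14, 16)]
```

Every segment is pushed once and popped at most once, so the scan takes O(n) time. For the 16-bit string S the densities are 1 > 1/2 > 1/3. On the 13 bits as printed in the example, this sketch returns a final segment 0 0 of density 0; with one more trailing 1 (an assumed 14th bit) it returns segments of densities 1 > 2/3 > 3/5 > 1/3.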
Why RS-decomposition? • It suffices to search for g(i) among the boundaries j_1, j_2, …, j_k of the RS-decomposition of S[i, n]. • The boundaries’ “potential” of being a good partner is bitonic: • density[i, j_1], density[i, j_2], …, density[i, j_k] is first monotonically increasing, then monotonically decreasing.
Illustration • [Figure: positions i + L and g(i) marked along S.]
Preprocessing steps • RS-decomposition of S[i, n] for each i. • Jumping table that enables binary search among the boundaries.
First preprocessing: all RS-decompositions • The RS-decomposition of each S[i, n] • Linear time for each i = 1, …, n. • All n RS-decompositions • [Lin et al.] O(n^2) time → O(n) time.
Key: nested structures • Example string: 1 1 1 0 1 1 0 1 0 1 1 0 0
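The nesting can be made concrete with one pointer per position: if p[i] is the end of the first segment in the RS-decomposition of S[i, n], the rest of that decomposition is the RS-decomposition of S[p[i] + 1, n]. The sketch below stores all n decompositions this way. The names `first_segment_ends` and `decomposition_from` are assumptions, and the code is an illustration of the idea, not the authors' implementation.

```python
def first_segment_ends(bits):
    """Return p[1..n]; p[i] = end of the first right-skew segment of S[i, n]."""
    n = len(bits)
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)
    p = [0] * (n + 2)                      # index 0 and n + 1 are unused padding
    for i in range(n, 0, -1):
        end = i
        while end < n:
            nxt = p[end + 1]               # end of the segment right after S[i, end]
            cur_ones, cur_len = prefix[end] - prefix[i - 1], end - i + 1
            nxt_ones, nxt_len = prefix[nxt] - prefix[end], nxt - end
            if cur_ones * nxt_len <= nxt_ones * cur_len:   # density(cur) <= density(next)
                end = nxt                  # absorb the next segment
            else:
                break
        p[i] = end
    return p

def decomposition_from(p, i, n):
    """Read off the RS-decomposition of S[i, n] by following the pointers."""
    segments = []
    while i <= n:
        segments.append((i, p[i]))
        i = p[i] + 1
    return segments

S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
p = first_segment_ends(S)
print(decomposition_from(p, 7, len(S)))  # [(7, 11), (12, 13), (14, 16)]
```

The pointer array takes O(n) space in total, and the merge steps are the same ones performed by a single right-to-left scan, so building it costs O(n) time overall.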