An optimal algorithm for identifying a maximum-density segment

Presentation Transcript


  1. An optimal algorithm for identifying a maximum-density segment. Hsueh-I Lu (呂學一), Institute of Information Science, Academia Sinica. http://www.iis.sinica.edu.tw/~hil/ Microsoft Office XP is needed to see all the animation effects. Maximum-Density Segment @ EE.NTU

  2. What do algorithm people do? Inventing efficient recipes to solve combinatorial problems

  3. A famous combinatorial problem • The Factorization Problem • Input: a number N • Output: • “yes” if N is a prime number; • A factorization of N if N is not a prime number. • For example, • N = 323264989793317. • Output = 18672511 * 17312347.
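The input/output contract on this slide can be sketched with plain trial division. This is only an illustration of the problem statement, not an efficient recipe: the function name `factor_or_prime` is hypothetical, and the loop takes time proportional to the square root of N, which is exponential in the number of digits.

```python
from math import isqrt


def factor_or_prime(n: int):
    """Return "yes" if n is prime, otherwise a pair (p, q) with p * q == n.

    Hypothetical helper for illustration; it uses naive trial division.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    # Any composite n has a divisor no larger than its integer square root.
    for p in range(2, isqrt(n) + 1):
        if n % p == 0:
            return (p, n // p)
    return "yes"


if __name__ == "__main__":
    print(factor_or_prime(97))   # a small prime
    print(factor_or_prime(91))   # a small composite
    # Checking the slide's factorization needs only one multiplication.
    print(18672511 * 17312347 == 323264989793317)
```

Verifying a claimed factorization is a single multiplication, while finding one with this loop takes millions of divisions even for the 15-digit example.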

  4. OPEN QUESTION Is there an efficient recipe for the Factorization Problem?

  5. Why Factorization? The security of many encryption schemes is based upon the assumption that the factorization problem is difficult.

  6. RSA encryption –– 1978 Rivest Shamir Adleman

  7. RSA factorization challenges

  8. US$10,000 –– RSA-576 • 1881988129206079638386972394616504398071635633794138270076335642298885971523466548531906060650474304531738801130339671619969232120573403187955065699621305168759307650257059

  9. RSA-576 factored on December 3, 2003 • 398075086424064937397125500550386491199064362342526708406385189575946388957261768583317 • 472772146107435302536223071973048224632914695302097116459852171130520711256363590397527 • Around the same time, Adi Shamir gave two talks at NTU (Dec. 4, 2003)

  10. US$20,000 –– RSA-640 • 3107418240490043721350750035888567930037346022842727545720161948823206440518081504556346829671723286782437916272838033415471073108501919548529007337724822783525742386454014691736602477652346609

  11. US$200,000 –– RSA-2048 • 25195908475657893494027183240048398571429282126204032027777137836043662020707595556264018525880784406918290641249515082189298559149176184502808489120072844992687392807287776735971418347270261896375014971824691165077613379859095700097330459748808428401797429100642458691817195118746121515172654632282216869987549182422433637259085141865462043576798423387184774447920739934236584823824281198163815010674810451660377306056201619676256133844143603833904414952634432190114657544454178424020924616515723350778707749817125772467962926386356373289912154831438167899885040445364023527381951378636564391212010397122822120720357

  12. Short of cash? www.rsasecurity.com/rsalabs/challenges/factoring/

  13. RSA 2003 (April ’03)

  14. 2002 Turing Award (June ’03)

  15. The award-winning paper • Only 7 pages. • “A Method for Obtaining Digital Signatures and Public Key Cryptosystems”, Communications of the ACM 21, 120-126, 1978.

  16. “PRIMES is in P” • Agrawal, Kayal, and Saxena • August 6, 2002

  17. PRIMES is in P • The PRIMES problem: • Input: a number N. • Output: • “yes” if N is a prime number. • “no” if N is not a prime number. • Only 9 pages! • Running time is O(n^12), where n is the number of digits.

  18. NEW YORK TIMES, Aug. 8, 2002 • Previous algorithmic results that caught the attention of the New York Times • 1984, Karmarkar’s algorithm for solving linear programs. • 1979, Khachiyan’s algorithm for solving linear programs.

  19. The latest version (v.3) of AKS’s paper • The running time is now improved from O(n^12) to O(n^7.5).

  20. What do algorithm people do? • Looking for important/interesting combinatorial problems • Coming up with efficient recipes to solve them exactly or approximately.

  21. Bioinformatics • A gold mine of combinatorial problems

  22. An example: My results

  23. Finding a DNA segment with Max GC-density in linear time • WABI → J. Comput. Sys. Sci. • ESA → SIAM J. Computing

  24. DNA Sequences • [Chargaff and Vischer, 1949] • DNA consists of A, G, T, C • Adenine • Guanine • Cytosine • Thymine

  25. [Vischer, Zamenhof, Chargaff, 1949] • Evidence against the widely believed %A = %G = %T = %C.

  26. Erwin Chargaff, 1905-2002 • Observing • %A ~ %T • %G ~ %C • “A comparison of the molar proportions reveals certain striking, but perhaps meaningless, regularities”

  27. Double Helix • [Watson and Crick, Nature, April 25, 1953] • Biologist (age 23, fresh Ph.D.) + Physicist (age 35, still a Ph.D. student) • 900 words, 2 pages

  28. 1962 Nobel Prize in Physiology or Medicine • Crick, Watson, and Wilkins

  29. DNA’s picture • [Alexander Rich, 1973] • Structural biologist at MIT. • DNA’s picture in atomic resolution.

  30. Celebrating 50 years of Double Helix (April 25, 1953 – 2003)

  31. Francis Crick 1916-2004 • Passed away on July 28, 2004 • Photo taken in 1993 in Paris

  32. Maurice Wilkins 1916-2004 • Passed away on Oct 5, 2004

  33. GC-content • Non-uniformity of nucleotide composition • 25% - 75% in genomes of all organisms • 40% - 50% in typical mammalian genomes • 30% - 60% in human chromosomes • The underlying causes are still unknown.

  34. GC-content • GC-content is positively correlated with • gene length, • gene density, • patterns of codon usage, • recombination rate within chromosomes, • …

  35. The Problem • Input: • an n-bit string S, • an integer L. • Output: • a substring S[i, j] of S with maximum density over all substrings of S with at least L bits.

  36. Example • S = 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 • L = 1: maximum density 1, e.g. S[1, 1] = 1 • L = 2: maximum density 1, e.g. S[4, 5] = 1 1 • L = 3: maximum density 3/4, e.g. S[8, 11] = 1 0 1 1
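The problem statement can be checked directly on the example string with a brute-force search over all substrings of at least L bits. This is a minimal sketch for checking small cases, not the algorithm of the talk; the name `max_density_bruteforce` is hypothetical, positions are 1-based as on the slides, and the running time is quadratic.

```python
from fractions import Fraction


def max_density_bruteforce(bits, L):
    """Return (density, i, j) for a densest substring S[i, j] with at least L bits.

    Positions are 1-based; returns None when L exceeds the string length.
    """
    n = len(bits)
    best = None
    for i in range(n):
        ones = 0
        for j in range(i, n):
            ones += bits[j]
            length = j - i + 1
            if length >= L:
                d = Fraction(ones, length)  # exact arithmetic, no rounding
                if best is None or d > best[0]:
                    best = (d, i + 1, j + 1)
    return best


if __name__ == "__main__":
    S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
    for L in (1, 2, 3):
        print(L, max_density_bruteforce(S, L))
```

Ties are broken in favor of the first segment found, so other segments with the same density may exist.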

  37. Density of each segment in O(1) time • prefix-sum(i) = S[1] + S[2] + … + S[i], • all n prefix sums are computable in O(n) time. • sum(i, j) = prefix-sum(j) - prefix-sum(i-1) • density(i, j) = sum(i, j) / (j-i+1)
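The prefix-sum formulas above translate almost line for line into code. The sketch below assumes 1-based positions with prefix-sum(0) = 0; the helper names `build_prefix_sums` and `density` are illustrative only.

```python
from fractions import Fraction
from itertools import accumulate


def build_prefix_sums(bits):
    """prefix[i] = S[1] + ... + S[i], with prefix[0] = 0; O(n) time."""
    return [0] + list(accumulate(bits))


def density(prefix, i, j):
    """Density of S[i, j] (1-based, inclusive) in O(1) time."""
    return Fraction(prefix[j] - prefix[i - 1], j - i + 1)


if __name__ == "__main__":
    S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
    prefix = build_prefix_sums(S)
    print(density(prefix, 8, 11))   # density of S[8, 11] = 1 0 1 1
    print(density(prefix, 1, 16))   # density of the whole string
```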

  38. Good partners • Finding the best ending position g(i) for each i = 1, 2, …, n: among the ending positions that give a segment of at least L bits, g(i) is the one maximizing density(i, g(i)).

  39. Previous Work • [Huang, CABIOS ’94] • O(nL) time. • Key observation: no need to examine substrings longer than 2L.
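The key observation leads to a simple O(nL) search: a segment with at least 2L bits can be cut into two pieces of at least L bits each, and one of the pieces is at least as dense as the whole, so lengths L through 2L-1 suffice. The code below is a sketch in that spirit rather than a transcription of Huang's program; `max_density_bounded` is a hypothetical name.

```python
from fractions import Fraction
from itertools import accumulate


def max_density_bounded(bits, L):
    """Return (density, i, j), examining only segment lengths L .. 2L-1.

    Positions are 1-based. Runs in O(nL) time using prefix sums.
    """
    n = len(bits)
    prefix = [0] + list(accumulate(bits))
    best = None
    for i in range(1, n - L + 2):
        last = min(n, i + 2 * L - 2)          # longest allowed end: length 2L-1
        for j in range(i + L - 1, last + 1):  # shortest allowed end: length L
            d = Fraction(prefix[j] - prefix[i - 1], j - i + 1)
            if best is None or d > best[0]:
                best = (d, i, j)
    return best


if __name__ == "__main__":
    S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
    print(max_density_bounded(S, 3))
```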

  40. Recent Progress • [Lin, Jiang, Chao, Journal of Computer and System Sciences (JCSS), 2002] • O(n log L) time. • Techniques: • Right-skew decomposition. • Jumping tables that allow binary search.

  41. Our results • Reducing the running time to O(n).

  42. Reviewing Lin, Jiang, and Chao’s Algorithm

  43. Right-Skew Substring • S[i, j] is right-skew if for each k = i, …, j-1 • density[i, k] ≤ density[k+1, j]. • S = 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 • For example, S[2, 5] = 0 0 1 1 is right-skew, whereas S[1, 3] = 1 0 0 is not.
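The right-skew condition can be tested straight from its definition by trying every split point k. The sketch below compares the two densities by cross-multiplication to stay in integer arithmetic; `is_right_skew` is a hypothetical helper name.

```python
from itertools import accumulate


def is_right_skew(bits):
    """True if every proper prefix is no denser than the remaining suffix."""
    n = len(bits)
    prefix = [0] + list(accumulate(bits))
    total = prefix[n]
    for k in range(1, n):
        left_sum, left_len = prefix[k], k
        right_sum, right_len = total - prefix[k], n - k
        # left_sum / left_len <= right_sum / right_len, cross-multiplied
        if left_sum * right_len > right_sum * left_len:
            return False
    return True


if __name__ == "__main__":
    print(is_right_skew([0, 0, 1, 1]))  # prefixes are never denser than suffixes
    print(is_right_skew([1, 0, 0]))     # prefix "1" is denser than suffix "0 0"
```

A single bit is trivially right-skew because it has no split point.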

  44. Right-Skew Decomposition • Partition S into substrings S_1, S_2, …, S_k such that • each S_i is a right-skew substring of S • density(S_1) > density(S_2) > … > density(S_k) • [Lin, Jiang, Chao] • Unique • Computable in linear time.

  45. An example • S = 1 1 1 0 1 1 0 1 0 1 1 0 0 • Densities of the parts: 1 > 2/3 > 3/5 > 1/3
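One way to compute a right-skew decomposition in linear time is a stack of parts that merges a new part into its left neighbor whenever the neighbor is no denser. This is a sketch under that assumption, not necessarily the procedure used by Lin, Jiang, and Chao; it relies on the fact that joining two adjacent right-skew parts, the left one no denser than the right one, yields a right-skew part. The name `right_skew_decomposition` is hypothetical.

```python
from fractions import Fraction


def right_skew_decomposition(bits):
    """Return the parts as (sub-list, density) pairs, densities strictly decreasing."""
    stack = []  # each entry is (number of ones, length) of one part
    for b in bits:
        ones, length = b, 1
        # Merge while the part on the left is no denser than the current part.
        while stack and stack[-1][0] * length <= ones * stack[-1][1]:
            prev_ones, prev_len = stack.pop()
            ones += prev_ones
            length += prev_len
        stack.append((ones, length))
    parts, start = [], 0
    for ones, length in stack:
        parts.append((bits[start:start + length], Fraction(ones, length)))
        start += length
    return parts


if __name__ == "__main__":
    S13 = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
    for part, d in right_skew_decomposition(S13):
        print(part, d)
    # Assumption: the same string with one extra trailing 1.
    for part, d in right_skew_decomposition(S13 + [1]):
        print(part, d)
```

On the 13 bits as they appear in this transcript, the last part is 0 0 with density 0. With one extra trailing 1 (an assumption about the original slide, not something the transcript states) the part densities come out as 1 > 2/3 > 3/5 > 1/3, matching the listed values.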

  46. Why RS-decomposition? • It suffices to search for g(i) among the boundaries of RS-decomposition of S[i, n]. • The boundaries’ “potential” of being a good partner is bitonic. • density[i, j_1], density[i, j_2], …, density[i, j_k] is first monotonically increasing, then monotonically decreasing.

  47. Illustration (figure: positions i, i+L, and g(i))

  48. Preprocessing steps • RS-decomposition of S[i, n] for each i. • Jumping table that enables binary search among the boundaries.

  49. First preprocessing: All RS-decompositions • The RS-decomposition of each S[i, n] • Linear time for each i = 1, …, n. • All n RS-decompositions • [Lin et al.] O(n^2) time → O(n) time.

  50. Key: nested structures • S = 1 1 1 0 1 1 0 1 0 1 1 0 0
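The nesting can be made concrete with one pointer per position: p[i] is the right end of the first part in the RS-decomposition of S[i, n], and the rest of that decomposition is the decomposition of S[p[i]+1, n]. The sketch below fills the pointers from right to left, which represents all n decompositions in O(n) total time. The names `right_skew_pointers` and `decomposition_from` are hypothetical, and the code is an illustration of the nested structure rather than a transcription of the published algorithm.

```python
from itertools import accumulate


def right_skew_pointers(bits):
    """p[i] (1-based) = last position of the first part of the RS-decomposition of S[i, n]."""
    n = len(bits)
    prefix = [0] + list(accumulate(bits))
    p = [0] * (n + 2)
    for i in range(n, 0, -1):
        p[i] = i
        while p[i] < n:
            nxt = p[i] + 1  # start of the next part, already decomposed
            a_sum, a_len = prefix[p[i]] - prefix[i - 1], p[i] - i + 1
            b_sum, b_len = prefix[p[nxt]] - prefix[nxt - 1], p[nxt] - nxt + 1
            if a_sum * b_len > b_sum * a_len:  # current part is strictly denser: stop
                break
            p[i] = p[nxt]  # otherwise absorb the next part
    return p


def decomposition_from(p, i, n):
    """List the parts (start, end) of the RS-decomposition of S[i, n] by following pointers."""
    parts = []
    while i <= n:
        parts.append((i, p[i]))
        i = p[i] + 1
    return parts


if __name__ == "__main__":
    S = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
    p = right_skew_pointers(S)
    print(decomposition_from(p, 1, len(S)))
    print(decomposition_from(p, 5, len(S)))
```

Each pass through the inner loop removes one part from the decomposition being built, so the total work over all i is linear.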
