A Note on Useful Algorithmic Strategies
Kun-Mao Chao (趙坤茂)
Department of Computer Science and Information Engineering, National Taiwan University, Taiwan
WWW: http://www.csie.ntu.edu.tw/~kmchao
Greedy Algorithm
• A greedy method always makes a locally optimal (greedy) choice.
• The greedy-choice property: a globally optimal solution can be reached by making locally optimal (greedy) choices.
• Optimal substructure: an optimal solution contains optimal solutions to its subproblems.
Huffman Codes (1952)
David Huffman (August 9, 1925 – October 7, 1999)
Huffman Codes
Expected number of bits per character = 3 × 0.1 + 3 × 0.1 + 2 × 0.3 + 1 × 0.5 = 1.7 (vs. 2 bits per character with a simple fixed-length scheme)
An example
Sequence: GTTGTTATCGTTTATGTGGC
By Huffman coding: 0111011100010010111100010110101001
20 characters; 34 bits in total (1.7 bits per character)
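The greedy construction behind this example can be sketched in Python: repeatedly merge the two lightest subtrees, which pushes every symbol in them one level deeper. This is a minimal sketch, not the slides' own code; the function name `huffman_code_lengths` and the tie-breaking counter are assumptions made for illustration. It computes only code lengths, which is enough to check the bit total, and the exact 0/1 assignment may differ from the one used above.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text.

    Hypothetical helper for illustration. A text with a single distinct
    symbol yields length 0 here; a real encoder would special-case that.
    """
    counts = Counter(text)
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Greedy choice: merge the two subtrees with the smallest weights.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


sequence = "GTTGTTATCGTTTATGTGGC"
lengths = huffman_code_lengths(sequence)
total_bits = sum(lengths[ch] for ch in sequence)
print(lengths, total_bits)
```

In the sequence, A and C each occur 2 times, G 6 times and T 10 times, which matches the frequencies 0.1, 0.1, 0.3 and 0.5 in the expected-length formula.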
Divide-and-Conquer
• Divide the problem into smaller subproblems.
• Conquer each subproblem recursively.
• Combine the solutions to the child subproblems into the solution for the parent problem.
Merge Sort (Invented in 1938; Coded in 1945)
John von Neumann (December 28, 1903 – February 8, 1957)
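Merge sort follows the divide, conquer and combine steps directly. The following is a minimal Python sketch, not taken from the slides; the function name `merge_sort` is chosen here for illustration.

```python
def merge_sort(items):
    """Return a new sorted list containing the elements of items."""
    if len(items) <= 1:
        return list(items)

    # Divide: split the input into two halves.
    mid = len(items) // 2
    # Conquer: sort each half recursively.
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Combine: merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([9, 2, 5, 3, 7, 11, 8, 10, 13, 6]))
```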
Dynamic Programming
• Dynamic programming is a class of solution methods for solving sequential decision problems with a compositional cost structure.
• Richard Bellman was one of the principal founders of this approach.
Richard Ernest Bellman (1920–1984)
Two key ingredients
• Two key ingredients for an optimization problem to be suitable for a dynamic-programming solution:
1. optimal substructures: Each substructure is optimal. (Principle of optimality)
2. overlapping subproblems: Subproblems are dependent. (Otherwise, a divide-and-conquer approach is the choice.)
Three basic components
• The development of a dynamic-programming algorithm has three basic components:
• The recurrence relation (for defining the value of an optimal solution);
• The tabular computation (for computing the value of an optimal solution);
• The traceback (for delivering an optimal solution).
Fibonacci numbers
The Fibonacci numbers are defined by the following recurrence:
$F_0 = 0$, $F_1 = 1$, $F_i = F_{i-1} + F_{i-2}$ for $i > 1$.
Leonardo of Pisa (c. 1170 – c. 1250)
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, ...
How to compute F10?
[Figure: recursion tree. F10 calls F9 and F8; F9 calls F8 and F7; F8 calls F7 and F6; ……]
Tabular computation
• The tabular computation can avoid recomputation.
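A minimal Python sketch of the tabular computation for the Fibonacci recurrence (the function name `fib_table` is an assumption for illustration). Each entry is computed once from the two entries before it, so F10 needs only a linear number of additions instead of an exponentially growing recursion tree.

```python
def fib_table(n):
    """Return the list [F_0, F_1, ..., F_n], filled in bottom-up."""
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        # Each entry reuses two already-computed entries.
        table[i] = table[i - 1] + table[i - 2]
    return table[: n + 1]


print(fib_table(10))
```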
Longest increasing subsequence (LIS)
• A subsequence of a sequence S is obtained by deleting zero or more numbers from S. The longest increasing subsequence problem is to find a longest increasing subsequence of a given sequence of distinct integers a1a2…an.
• e.g. for 9 2 5 3 7 11 8 10 13 6: 3 7, 7 10 13 and 7 11 are increasing subsequences; 3 5 11 13 is not, because 3 appears after 5 in the sequence.
• We want to find a longest one.
A naive approach for LIS
• Let L[i] be the length of a longest increasing subsequence ending at position i.
• $L[i] = 1 + \max_{0 \le j < i} \{ L[j] \mid a_j < a_i \}$ (use a dummy $a_0$ smaller than every element, with $L[0] = 0$)

a_i:    9   2   5   3   7  11   8  10  13   6
L[i]:   1   1   2   2   3   4   4   5   6   3

• The maximum length is 6. The subsequence 2, 3, 7, 8, 10, 13 is a longest increasing subsequence.
• This method runs in O(n^2) time.
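The recurrence for L[i], together with a traceback, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `lis_quadratic` and the `prev` array are introduced here for illustration, and indices are 0-based, so the dummy a0 is replaced by initialising every length to 1. When several longest increasing subsequences exist, the traceback may return a different one from the one named in the slides.

```python
def lis_quadratic(a):
    """Return (L, one longest increasing subsequence) in O(n^2) time."""
    n = len(a)
    length = [1] * n   # length[i]: longest increasing subsequence ending at i
    prev = [-1] * n    # prev[i]: index of the previous element, for traceback
    for i in range(n):
        for j in range(i):
            if a[j] < a[i] and length[j] + 1 > length[i]:
                length[i] = length[j] + 1
                prev[i] = j
    if n == 0:
        return length, []
    # Traceback from a position where the maximum length is attained.
    end = max(range(n), key=length.__getitem__)
    result = []
    while end != -1:
        result.append(a[end])
        end = prev[end]
    result.reverse()
    return length, result


lengths, subsequence = lis_quadratic([9, 2, 5, 3, 7, 11, 8, 10, 13, 6])
print(lengths, subsequence)
```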
An O(n log n) method for LIS
• Define BestEnd[k] to be the smallest ending number among all increasing subsequences of length k found so far.
• When processing ai, let BestEnd[k'] be the smallest element that is larger than ai, and set BestEnd[k'] ← ai. If no such element exists, append ai as a new BestEnd entry.
• Each column below shows BestEnd after processing the element at the top:

a_i:          9   2   5   3   7  11   8  10  13   6
BestEnd[1]:   9   2   2   2   2   2   2   2   2   2
BestEnd[2]:   -   -   5   3   3   3   3   3   3   3
BestEnd[3]:   -   -   -   -   7   7   7   7   7   6
BestEnd[4]:   -   -   -   -   -  11   8   8   8   8
BestEnd[5]:   -   -   -   -   -   -   -  10  10  10
BestEnd[6]:   -   -   -   -   -   -   -   -  13  13

• For each position, we perform a binary search to update BestEnd. Therefore, the running time is O(n log n).
• The subsequence 2, 3, 7, 8, 10, 13 is a longest increasing subsequence.
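The BestEnd update can be sketched in Python with the standard-library `bisect_left`, which performs the binary search. This is a minimal sketch, assuming distinct integers as in the problem statement; the function name `lis_length` is chosen here for illustration, and the list is 0-based, so `best_end[k - 1]` plays the role of BestEnd[k].

```python
from bisect import bisect_left


def lis_length(a):
    """Return (LIS length, final BestEnd list) in O(n log n) time."""
    best_end = []
    for x in a:
        # Index of the smallest entry that is >= x (larger than x when
        # all numbers are distinct).
        k = bisect_left(best_end, x)
        if k == len(best_end):
            best_end.append(x)   # x extends the longest subsequence so far
        else:
            best_end[k] = x      # x is a smaller ending number for length k + 1
    return len(best_end), best_end


print(lis_length([9, 2, 5, 3, 7, 11, 8, 10, 13, 6]))
```

Note that the final BestEnd list gives the length of a longest increasing subsequence but is not necessarily such a subsequence itself; recovering one requires extra bookkeeping.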
Binary search
• Given an ordered sequence x1x2 ... xn, where x1 < x2 < ... < xn, and a number y, a binary search finds the largest xi such that xi < y in O(log n) time.
[Figure: the search range shrinks from n to n/2 to n/4, ...]
Binary search
• How many steps does a binary search need to reduce the problem size to 1?
n → n/2 → n/4 → n/8 → n/16 → ... → 1
• O(log n) steps.
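A minimal Python sketch of a binary search that returns the largest element smaller than y. The function name `largest_below` and the choice to return `None` when no such element exists are assumptions made for illustration. Each iteration halves the range [lo, hi), which gives the O(log n) bound.

```python
def largest_below(xs, y):
    """Return the largest element of the sorted list xs that is < y, or None."""
    lo, hi = 0, len(xs)
    # Invariant: every element of xs[:lo] is < y, every element of xs[hi:] is >= y.
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < y:
            lo = mid + 1
        else:
            hi = mid
    return xs[lo - 1] if lo > 0 else None


print(largest_below([2, 3, 7, 8, 10, 13], 9))
```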
All longest increasing subsequences
• The number of all longest increasing subsequences could be exponential.
• e.g. 1 5 4 3 2 6 10 9 8 7 11 15 14 13 12 16 20 19 18 17 21
• There are 4 × 4 × 4 × 4 = 256 longest increasing subsequences: each contains 1, 6, 11, 16 and 21, plus exactly one number from each of the four decreasing blocks (5 4 3 2), (10 9 8 7), (15 14 13 12) and (20 19 18 17).
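The count can be checked with a small extension of the O(n^2) recurrence that also tracks how many longest increasing subsequences end at each position. This is a minimal Python sketch; the function name `count_lis` is hypothetical and the counting technique is a standard one, not something stated in the slides.

```python
def count_lis(a):
    """Return (LIS length, number of longest increasing subsequences)."""
    n = len(a)
    if n == 0:
        return 0, 0
    length = [1] * n   # longest increasing subsequence ending at i
    count = [1] * n    # number of such subsequences ending at i
    for i in range(n):
        for j in range(i):
            if a[j] < a[i]:
                if length[j] + 1 > length[i]:
                    length[i] = length[j] + 1
                    count[i] = count[j]
                elif length[j] + 1 == length[i]:
                    count[i] += count[j]
    best = max(length)
    return best, sum(c for l, c in zip(length, count) if l == best)


example = [1, 5, 4, 3, 2, 6, 10, 9, 8, 7, 11, 15, 14, 13, 12, 16, 20, 19, 18, 17, 21]
print(count_lis(example))
```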
Longest Common Subsequence (LCS)
• A subsequence of a sequence S is obtained by deleting zero or more symbols from S. For example, the following are all subsequences of “president”: pred, sdn, predent.
• The longest common subsequence problem is to find a maximum-length common subsequence between two sequences.
LCS
For instance,
Sequence 1: president
Sequence 2: providence
Their LCS is priden.
LCS
Another example:
Sequence 1: algorithm
Sequence 2: alignment
One of their LCSs is algm.
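How to compute LCS?
• Let A = a1a2…am and B = b1b2…bn.
• len(i, j): the length of an LCS between a1a2…ai and b1b2…bj
• With proper initializations (len(i, 0) = len(0, j) = 0), len(i, j) can be computed as follows:
$len(i, j) = len(i-1, j-1) + 1$ if $a_i = b_j$; otherwise $len(i, j) = \max(len(i-1, j), len(i, j-1))$.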
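A minimal Python sketch of the table computation for len(i, j), followed by a traceback that delivers one LCS. It assumes the standard LCS recurrence: on a match, take the diagonal entry plus one; otherwise take the larger of the entries above and to the left. The function name `lcs` and the tie-breaking rule in the traceback are choices made here for illustration.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # length[i][j]: LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    # Traceback from the bottom-right corner of the table.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("president", "providence"))
print(len(lcs("algorithm", "alignment")))
```

The table has (m + 1)(n + 1) entries and each is filled in constant time, so the computation takes O(mn) time and space.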
Longest Common Increasing Subsequence
• Proposed by Yang, Huang and Chao (IPL 2005)
• 2 5 3 7 11 8 10 13 6 6 5 2 8 3 7 4 10 1 13