Xiaoming Sun Tsinghua University David Woodruff MIT

The Communication and Streaming Complexity of Computing the Longest Common and Increasing Subsequences Xiaoming Sun Tsinghua University David Woodruff MIT

The Problem • Stream of elements a_1, …, a_n ∈ Σ • Algorithm is given one pass over the stream • Problem: compute the longest increasing subsequence (LIS) • Example: for the stream 4 3 7 3 1 1 0, the answer is (3, 7)

Previous Work • Let k be the length of the LIS of the stream • There is an algorithm that computes the LIS in O(k^2 log |Σ|) space [LNVZ05] • Trivial Ω(k) lower bound • Our first result: improve both bounds to a tight Θ(k^2 log(|Σ|/k))

Our Lower Bound • Reduction from the indexing function: Alice holds x ∈ {0,1}^n, Bob holds i ∈ [n] = {1, 2, …, n}, and Bob must output x_i • Randomized 1-way communication complexity is Ω(n)

Alice holds x ∈ {0,1}^n and constructs a stream A; Bob holds i ∈ [n] = {1, 2, …, n} and constructs a stream B • We want two properties: 1. From LIS(A, B), Bob can get x_i 2. |LIS(A, B)| = k, where k is an input parameter

Alice • Alice uses x ∈ {0,1}^n to create k-1 increasing sequences A_1, …, A_{k-1} • For each j, A_j has length j • Each bit of x is encoded in some sequence A_j • Every element in A_{k-1} is larger than every element in A_{k-2}, every element in A_{k-2} is larger than every element in A_{k-3}, etc. • Set A = A_{k-1}, …, A_2, A_1 • [Figure: value against position in stream; the blocks A_{k-1}, …, A_2, A_1 appear in that order with decreasing values]

Bob • Bob holds i ∈ [n] and uses i to identify A_j, the sequence encoding x_i • Bob creates an increasing sequence B of length k-j • Every element in B is greater than every element of A_r if r ≤ j, and less than every element of A_r if r > j • [Figure: value against position in stream; B lies between A_j and A_{j+1} in value]

Alice holds x ∈ {0,1}^n and A = A_{k-1}, …, A_2, A_1; Bob holds i ∈ [n] and B • LIS(A, B) = A_j, B, and |LIS(A, B)| = j + (k-j) = k • But x_i is encoded in A_j, so Bob recovers x_i

Thus, any streaming algorithm must use Ω(n) space • But what is n? We need to construct k-1 increasing sequences that are different for different x in {0,1}^n • Assume |Σ| is large. Divide Σ into k-1 blocks of size |Σ|/(k-1) • Let A_j be a random increasing sequence of length j in block j • The space to represent A_j is Ω(k log(|Σ|/k)) for j > k/2 • Set n = Θ(k^2 log(|Σ|/k))
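The reduction above can be sketched as a small simulation. This is only an illustration under simplifying assumptions: instead of random increasing sequences, each A_j stores its j bits in the parity of its elements (so n = k(k-1)/2 rather than the larger n on the slide), blocks have a hypothetical width of 3k values, bit positions are 0-based, and Alice's and Bob's streams are simply concatenated rather than passed through a memory-limited algorithm. The helper names are made up for this sketch.

```python
from bisect import bisect_left

def lis(stream):
    """Return a longest strictly increasing subsequence of `stream`."""
    lists, lasts = [], []
    for x in stream:
        i = bisect_left(lasts, x)
        cand = (lists[i - 1] if i > 0 else []) + [x]
        if i == len(lists):
            lists.append(cand)
            lasts.append(x)
        else:
            lists[i] = cand
            lasts[i] = x
    return lists[-1] if lists else []

def locate(i):
    """Bit i (0-based) is the t-th bit of A_j; return (j, t)."""
    j = 1
    while j * (j + 1) // 2 <= i:
        j += 1
    return j, i - j * (j - 1) // 2

def alice_stream(x_bits, k):
    """A = A_{k-1}, ..., A_2, A_1; A_j has length j and lives in block j."""
    assert len(x_bits) == k * (k - 1) // 2
    width = 3 * k  # toy block width (assumption)
    blocks = []
    for j in range(1, k):
        start = j * (j - 1) // 2
        bits = x_bits[start:start + j]
        # Element t of A_j is base + 2t + bit, so A_j is strictly increasing.
        blocks.append([j * width + 2 * t + b for t, b in enumerate(bits)])
    return [v for block in reversed(blocks) for v in block]

def bob_stream(i, k):
    """B has length k-j and sits above block j's A-values, below block j+1."""
    j, _ = locate(i)
    return [j * 3 * k + 2 * k + s for s in range(k - j)]

def bob_recover(i, k, lis_result):
    """Read x_i off the A_j prefix of LIS(A, B)."""
    j, t = locate(i)
    return lis_result[t] - (j * 3 * k + 2 * t)

k = 4
x = [1, 0, 1, 1, 0, 0]
result = lis(alice_stream(x, k) + bob_stream(2, k))
recovered = bob_recover(2, k, result)
```

In this toy setting the LIS of the combined stream has length k and begins with A_j, so Bob can read off x_i from it.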

Our Upper Bound • When processing the stream, keep lists A[1], A[2], …, A[k] • A[j] is an increasing subsequence of length j in the stream with minimal last element • Let L[1], L[2], …, L[k] be the last elements of A[1], A[2], …, A[k] • To process item x, find i for which L[i] < x < L[i+1], and replace A[i+1] with A[i], x
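The update rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it stores every list A[j] uncompressed, uses 0-based Python lists (so `lists[j-1]` plays the role of A[j]), and finds the position i by binary search over the last elements.

```python
from bisect import bisect_left

def streaming_lis(stream):
    """One pass over `stream`; returns a longest strictly increasing subsequence."""
    lists = []  # lists[j-1] is A[j]: increasing, length j, minimal last element
    lasts = []  # lasts[j-1] is L[j]; this list is always strictly increasing
    for x in stream:
        # i counts the L[j] that are strictly smaller than x.
        i = bisect_left(lasts, x)
        candidate = (lists[i - 1] if i > 0 else []) + [x]
        if i == len(lists):
            # x exceeds every L[j]: a longer increasing subsequence exists.
            lists.append(candidate)
            lasts.append(x)
        else:
            # Replace A[i+1] with A[i] followed by x.
            lists[i] = candidate
            lasts[i] = x
    return lists[-1] if lists else []

answer = streaming_lis([4, 3, 7, 3, 1, 1, 0])
```

On the example stream 4 3 7 3 1 1 0 from the opening slide, this sketch returns [3, 7].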

So we have k lists A[1], …, A[k], each of length at most k • Naively, this takes O(k^2 log |Σ|) space • But the A[i] are increasing, so we can compress each list by storing differences • Total space is O(k^2 log(|Σ|/k))
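The compression step can be illustrated with a simple difference encoding. This is a rough sketch under assumptions not stated on the slide: elements are positive integers, the first gap is measured from 0, and `bit_cost` just sums binary lengths, ignoring the overhead of a self-delimiting code. The intuition is that the gaps of one list sum to at most |Σ|, so by concavity of the logarithm a list of length j costs about j log(|Σ|/j) bits instead of j log |Σ|.

```python
def delta_encode(increasing):
    """Replace each element by its gap from the previous one (first gap is from 0)."""
    deltas, prev = [], 0
    for v in increasing:
        deltas.append(v - prev)
        prev = v
    return deltas

def delta_decode(deltas):
    """Invert delta_encode by taking running sums."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def bit_cost(values):
    """Sum of binary lengths, counting at least one bit per value."""
    return sum(max(1, v.bit_length()) for v in values)

seq = [1000, 1003, 1004, 1010]
deltas = delta_encode(seq)
naive_bits = bit_cost(seq)
compressed_bits = bit_cost(deltas)
```

Here `delta_decode(deltas)` reproduces `seq`, and `compressed_bits` is smaller than `naive_bits` because most gaps are small.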

This talk • First result: a tight space bound for the LIS problem • Second result: tight bounds for longest common subsequence (LCS)

LCS Bounds • Problem: Alice has a permutation π of [N], Bob has a permutation σ of [N]. Decide if |LCS(π, σ)| ≥ k • Previous space bound: Ω(k) [LNVZ05] • Our space bound: Ω(N) for 3 ≤ k ≤ N/2 (holds for randomized O(1)-pass algorithms)

LCS Bounds • Why can we only prove Ω(N) for 3 ≤ k ≤ N/2? • If k = 2, the problem reduces to an equality test • If k is large, there are at most O(N^{2(N-k)}) permutations σ with |LCS(π, σ)| > k, so just use an equality test with error O(1/N^{2(N-k)})

Our Lower Bound • Padding lemma: if for k = 3 the randomized communication complexity is Ω(N), then it is Ω(N) for all k ≤ N/2 • Proof: just pad each of the inputs by some common subsequence of length k-3

It remains to show high complexity for k = 3 • We reduce from disjointness: Alice holds x ∈ {0,1}^n, Bob holds y ∈ {0,1}^n; is there an i such that x_i = y_i = 1? • Randomized multi-round communication complexity is Ω(n)

Is there an i such that x_i = y_i = 1? • Alice holds x ∈ {0,1}^{N/3} and constructs π; Bob holds y ∈ {0,1}^{N/3} and constructs σ • Want |LCS(π, σ)| ≥ 3 iff there is an i with x_i = y_i = 1, that is, iff x and y are not disjoint

Alice  = 1, 2, …, N/3 x 2 {0,1}N/3 Divide 1, …, N into N/3 groups G1 = (1, 2, 3), G2 = (4, 5, 6), …, GN/3 = (N-2, N-1, N). Use x to choose 1, …, N/3 iacts on Gi If xi = 0, i (m+1, m+2, m+3) = (m+1, m+2, m+3). If xi = 1, i (m+1, m+2, m+3) = (m+1, m+3, m+2).

Bob • y ∈ {0,1}^{N/3}, σ = σ_{N/3}, …, σ_1 • Divide 1, …, N into N/3 groups G_1 = (1, 2, 3), G_2 = (4, 5, 6), …, G_{N/3} = (N-2, N-1, N) • Use y to choose σ_1, …, σ_{N/3}, where σ_i acts on G_i = (m+1, m+2, m+3) with m = 3(i-1) • If y_i = 0, σ_i(m+1, m+2, m+3) = (m+3, m+2, m+1) • If y_i = 1, σ_i(m+1, m+2, m+3) = (m+1, m+3, m+2)

π = π_1(G_1), π_2(G_2), π_3(G_3), …, π_{N/3}(G_{N/3}) and σ = σ_{N/3}(G_{N/3}), …, σ_3(G_3), σ_2(G_2), σ_1(G_1) • Claim: |LCS(π, σ)| ≤ 3. Proof: use the fact that LCS(π, σ) intersects at most one G_i • Claim: |LCS(π, σ)| = 3 iff there is some i with x_i = y_i = 1. Proof: use the way we defined π_i and σ_i • Thus, an LCS protocol can decide disjointness, so Ω(N) communication is needed
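The two claims can be checked on small inputs with a short simulation. This is only a sanity-check sketch: the function names are made up for illustration, indices are 0-based (so group i covers 3i+1, 3i+2, 3i+3), and the LCS length is computed with the textbook quadratic dynamic program rather than any communication protocol.

```python
def alice_perm(x):
    """pi = pi_1(G_1), ..., pi_{N/3}(G_{N/3}): groups in increasing order."""
    out = []
    for i, bit in enumerate(x):
        m = 3 * i
        out += [m + 1, m + 3, m + 2] if bit else [m + 1, m + 2, m + 3]
    return out

def bob_perm(y):
    """sigma = sigma_{N/3}(G_{N/3}), ..., sigma_1(G_1): groups in decreasing order."""
    out = []
    for i in reversed(range(len(y))):
        m = 3 * i
        out += [m + 1, m + 3, m + 2] if y[i] else [m + 3, m + 2, m + 1]
    return out

def lcs_length(p, q):
    """Length of a longest common subsequence via the standard DP."""
    prev = [0] * (len(q) + 1)
    for a in p:
        cur = [0]
        for j, b in enumerate(q, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# x and y intersect at index 1 (0-based), so the LCS should have length 3.
intersecting = lcs_length(alice_perm([0, 1, 0]), bob_perm([1, 1, 0]))
# x and y are disjoint, so the LCS should be shorter than 3.
disjoint = lcs_length(alice_perm([1, 0, 1]), bob_perm([0, 1, 0]))
```

Because Bob lists the groups in the opposite order from Alice, a common subsequence cannot use elements from two different groups, which is why the length never exceeds 3 in this sketch.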

Other results • Tight space bounds for computing the LIS length • Generalization to approximate LIS and LCS; still many gaps here • Example: for approximating the LIS length, we have Ω(1/ε) and O(k log |Σ|). Recent work [GJKK07] has shown O(sqrt(N/ε) log |Σ|), but there is still a large gap

Conclusion • First result: a tight bound for the LIS • Second result: an Ω(N) space bound for the LCS k-decision problem for 3 ≤ k ≤ N/2 • Other results for approximation problems • Another open question: extend our lower bound for LIS to randomized multi-round protocols
