The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis • Zoë Abrams zoea@stanford.edu • Ho-Lin Chen holin@stanford.edu
Restriction Site Analysis • An enzyme cuts a target DNA strand into fragments, and these fragments are used to reconstruct the locations of the enzyme's restriction sites. • Two common approaches: • Double Digest Problem (NP-complete) [Goldstein, Waterman ’87] • Partial Digest Problem
Partial Digest Problem • Reconstruct the site locations from the lengths of all fragments that can possibly be produced. • The hardness of the problem is unknown. [Skiena, Sundaram ’93][Lemke, Skiena, Smith ’02] • If the primary fragments are added to the input, a unique reconstruction can be found in polynomial time. [Pandurangan, Ramesh ’01] • The input is susceptible to experimental error caused by missing fragments.
Simplified Partial Digest Problem • Proposed by Blazewicz et al. ’01 • Uses primary fragments and base fragments to reconstruct restriction sites • Primary fragments: one of the endpoints is an endpoint of the original DNA strand • Base fragments: the two endpoints are consecutive sites on the DNA strand
Problem Definition • Given • $X_0 = 0$, $X_{n+1} = D$ • A set of base fragments $\{X_i - X_{i-1}\}_{1 \le i \le n+1}$ • A set of primary fragments $\{X_{n+1} - X_i\} \cup \{X_i - X_0\}$, $1 \le i \le n$ • Reconstruct the original series $X_1, \ldots, X_n$
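To make the definition concrete, here is a minimal Python sketch that computes both fragment collections from a known list of sites. The function name `spdp_fragments` and the use of `Counter` as a multiset are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

def spdp_fragments(sites, D):
    """Return (base, primary) fragment multisets for interior sites 0 < x < D.

    Hypothetical helper: `sites` plays the role of X_1..X_n, and D is X_{n+1}.
    """
    xs = [0] + sorted(sites) + [D]
    # Base fragments: lengths between consecutive sites (endpoints included).
    base = Counter(b - a for a, b in zip(xs, xs[1:]))
    # Primary fragments: each interior site yields one fragment per strand end.
    primary = Counter()
    for x in xs[1:-1]:
        primary[x] += 1        # X_i - X_0
        primary[D - x] += 1    # X_{n+1} - X_i
    return base, primary

base, primary = spdp_fragments([2, 3, 9], D=10)
print(sorted(base.elements()))     # [1, 1, 2, 6]
print(sorted(primary.elements()))  # [1, 2, 3, 7, 8, 9]
```

The reconstruction problem is the inverse: only `base` and `primary` are observed, and the site list must be recovered.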
Theoretical and Algorithmic Issues • The algorithm that finds the exact solution may take $2^n$ time in the worst case. [Blazewicz, Jaroszewski ’03] • The Simplified Partial Digest Problem may have an exponential number of solutions. • The problem is APX-hard. • Simple algorithms can give the correct solution with high probability.
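The exponential behaviour can be illustrated with a brute-force sketch. This is not the Blazewicz and Jaroszewski algorithm; it is a naive enumeration written for illustration, and it assumes that all sites are distinct and that the fragment collections are given as `Counter` multisets. The primary fragments reveal each site only up to reflection (a site at $x$ and a site at $D - x$ produce the same pair of primary fragments), so the sketch tries both sides for every site that has no mirror partner. The names `base_fragments` and `solve_spdp_bruteforce` are hypothetical.

```python
from collections import Counter
from itertools import product

def base_fragments(sites, D):
    """Multiset of gaps between consecutive points of {0} + sites + {D}."""
    xs = [0] + sorted(sites) + [D]
    return Counter(b - a for a, b in zip(xs, xs[1:]))

def solve_spdp_bruteforce(D, base, primary):
    """Enumerate all site sets consistent with the fragments (distinct sites assumed)."""
    # Depth of a site = distance to the nearer strand end; every site
    # contributes two primary fragments that share the same depth.
    depth_counts = Counter(min(p, D - p) for p in primary.elements())
    fixed, free = [], []
    for d, count in depth_counts.items():
        if 2 * d == D:
            fixed.append(d)            # midpoint site is its own mirror image
        elif count == 4:
            fixed += [d, D - d]        # symmetric pair: both positions occupied
        else:
            free.append(d)             # asymmetric site: side is unknown
    solutions = []
    for flips in product((False, True), repeat=len(free)):   # 2^k candidates
        sites = fixed + [D - d if flip else d for d, flip in zip(free, flips)]
        if base_fragments(sites, D) == base:
            solutions.append(sorted(sites))
    return sorted(solutions)

base = Counter([1, 1, 2, 6])
primary = Counter([1, 2, 3, 7, 8, 9])
print(solve_spdp_bruteforce(10, base, primary))  # [[1, 7, 8], [2, 3, 9]]
```

The two solutions in this example are mirror images of each other. Reflecting every site preserves both fragment multisets, so any instance that is not itself symmetric has at least two solutions.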
Proof of APX-Hardness • We proved that the Simplified Partial Digest Problem is APX-hard by reducing the Tripartite-Matching problem to it. • Tripartite-Matching Problem: Given a set $S$ of triples in $\{1, 2, \ldots, n\}^3$ with $|S| = T$, decide whether there exists a subset $M$ of $S$ such that $|M| = n$ and no two triples in $M$ agree in any coordinate.
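For readers unfamiliar with Tripartite Matching, the following brute-force checker shows what a valid matching looks like. It is only an illustration of the problem statement, assuming triples are given as Python tuples; it plays no role in the reduction itself, and the function name is hypothetical.

```python
from itertools import combinations

def has_tripartite_matching(triples, n):
    """Return True if some n triples use n distinct values in every coordinate."""
    for M in combinations(triples, n):
        # No two triples may agree in any coordinate, so each coordinate
        # must take n different values across the chosen triples.
        if all(len({t[k] for t in M}) == n for k in range(3)):
            return True
    return False

print(has_tripartite_matching([(1, 1, 1), (2, 2, 2), (1, 2, 2)], n=2))  # True
print(has_tripartite_matching([(1, 1, 1), (1, 2, 2)], n=2))             # False
```

In the first call the triples (1, 1, 1) and (2, 2, 2) form a matching; in the second, both triples share the value 1 in the first coordinate.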
Proof of APX-Hardness • Use symmetric restriction sites to cut the segment into 2T equal-length segments. • In each pair of equal-length segments, there are seven restriction sites that can be put on either side. [Figure: the strand divided by pairs of symmetric restriction sites into segments numbered 1, 2, ..., 2T; sites marked “x” can be on either side]
Proof of APX-Hardness • Those seven restriction sites can be divided into two groups, denoted by “o” and “x” respectively. • In each segment, restriction sites in the same group must be put on the same side. • Each placement of restriction sites corresponds to a set of triples chosen in the Tripartite Matching Problem. [Figure: example placements labeled “not chosen” and “chosen”] • The current placement of restriction sites is a solution iff the corresponding set of triples is a solution to the Tripartite Matching Problem.
A Simple Algorithm • Put all symmetric points at their correct locations. • Put all asymmetric points on the left side. • For each site, working from the endpoints toward the middle: • If the base segment is matched, fix the site's location. • If the base segment isn’t matched, move the site and all points closer to the middle to the other side.
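The steps above leave several details open, so the following Python sketch is only one possible reading of them and should not be taken as the authors' implementation. It assumes that the input has already been reduced to a sorted list of `(depth, symmetric)` pairs, where depth is a site's distance to the nearer strand end (recoverable from the primary fragments), and that "matched" means the gap to the innermost site already placed on the same side is still available in the base-fragment multiset. The name `greedy_spdp` and this input format are hypothetical.

```python
from collections import Counter

def greedy_spdp(D, depths, base):
    """One reading of the greedy pass; depths = [(depth, is_symmetric), ...] sorted by depth."""
    remaining = Counter(base)          # base fragments not yet accounted for
    frontier = {"L": 0, "R": 0}        # depth of innermost placed site per side
    side = "L"                         # side where unfixed points currently sit
    sites = []

    def place(s, d):
        gap = d - frontier[s]
        if remaining[gap] > 0:         # consume the base fragment if available
            remaining[gap] -= 1
        frontier[s] = d
        sites.append(d if s == "L" else D - d)

    for d, symmetric in depths:        # from the endpoints toward the middle
        if symmetric:
            place("L", d)              # symmetric pair occupies both positions
            place("R", d)
        else:
            if remaining[d - frontier[side]] == 0:
                # Unmatched: this point and all inner points switch sides.
                side = "R" if side == "L" else "L"
            place(side, d)
    return sorted(sites)

# True sites {1, 6} on a strand of length 10: base fragments 1, 5, 4.
print(greedy_spdp(10, [(1, False), (4, False)], Counter([1, 5, 4])))  # [1, 6]
```

Every site stays at depth $d$ on one side or the other, so the primary fragments are always reproduced. The base fragments are not guaranteed to be: under this reading, the instance with true sites {2, 3, 9} and $D = 10$ yields [1, 2, 7], because an early gap is matched by coincidence. This is consistent with the algorithm being correct only with some probability.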
Analysis of the Algorithm • Assuming a uniform distribution for restriction sites, for many practical parameters* the algorithm outputs the correct locations with probability at least 0.4. • All the primary fragments are matched, and at least ¼ of all base fragments are matched in the worst case. • Runs in time linear in the number of sites. *Example: DNA strand length around 20,000, with 10-20 restriction sites.
Future Work • Construct better heuristics to solve SPDP • Analyze the hardness of the Partial Digest Problem • Find other characterizations of restriction sites that are both easy to measure and usable for reconstructing the sites