CS 262 Discussion Section 1

Purpose of discussion sections • To clarify difficulties/ambiguities in the problem set questions and lecture material. • To supplement class material by going somewhat into the biological concepts and motivations underlying this field. • To discuss more algorithms from a topic, wherever needed.

Antiparallel vs Parallel strands

The DNA strand has a chemical polarity

The members of each base pair can fit together within the double helix only if the two strands of the helix are antiparallel

Prokaryotes do not have a nucleus, eukaryotes do

Eukaryotic DNA is packaged into chromosomes • A chromosome is a single, enormously long, linear DNA molecule associated with proteins that fold and pack the fine thread of DNA into a more compact structure. • Human genome: 3.2 × 10^9 base pairs distributed over 46 chromosomes.

A display of the full set of 46 chromosomes

Sequence similarity

Biological motivation • Sequence similarity is useful in hypothesizing the function of a new sequence, assuming that sequence similarity implies structural and functional similarity. • [Figure: a new sequence is sent as a query to a sequence database; the response is a list of similar matches.]

Case Study: Multiple Sclerosis • Multiple sclerosis is an autoimmune dysfunction in which the T-cells of the immune system start attacking the body’s own nerve cells. • The T-cells recognize the myelin sheath protein of neurons as foreign. • Show movie

Why does this happen? • A hypothesis: possibly, the myelin sheath proteins identified by the T-cells were similar to bacterial/viral sheath proteins from an earlier infection. • How to test this hypothesis? Use sequence alignment. • [Figure: myelin sheath proteins are sent as a query to a sequence database; the response is a list of similar bacterial/viral sequences, which feed into lab tests to identify the cause of the immune dysfunction.]

Dynamic Programming • It is a way of solving problems defined by recurrence relations by storing partial results so that each subproblem is solved only once. • Consider the Fibonacci series: • F(n) = F(n-1) + F(n-2) • F(0) = 0, F(1) = 1 • A naive recursive algorithm takes exponential time to find F(n), because it recomputes the same values many times. • A dynamic programming solution takes only n steps (linear time).
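
As an illustration of the point above, here is a minimal Python sketch that contrasts the two approaches. The function names `fib_recursive` and `fib_dp` are chosen here for the example and do not come from the slides.

```python
def fib_recursive(n):
    """Direct translation of the recurrence.

    Each call spawns two more calls, so the same F(k) is recomputed
    many times and the running time grows exponentially in n.
    """
    if n < 2:
        return n  # F(0) = 0, F(1) = 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_dp(n):
    """Bottom-up dynamic programming: n steps, two stored values."""
    prev, curr = 0, 1  # F(0), F(1)
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev  # after n steps, prev holds F(n)


print(fib_dp(10))  # 55
```

Both functions return the same values; only the amount of repeated work differs.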

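Needleman-Wunsch algorithm • F(i,j) = maximum of: • F(i-1, j-1) + s(x[i], y[j]) • F(i-1, j) - d • F(i, j-1) - d • [Figure: the diagonal arrow into cell (i,j) is labeled +s(x[i], y[j]); the vertical and horizontal arrows are labeled -d.] • Assume that match = 1, mismatch = 0, indel = 0.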
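
The recurrence can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the function name `nw_matrix` and its parameter names are assumptions, the defaults mirror the scores assumed on the slide (match = 1, mismatch = 0, indel d = 0), and the first row and column are initialized to -i*d and -j*d, the usual boundary condition for global alignment.

```python
def nw_matrix(x, y, match=1, mismatch=0, d=0):
    """Fill the Needleman-Wunsch matrix F for sequences x and y.

    F[i][j] is the best score for aligning x[1..i] with y[1..j]
    (1-based, as on the slide; Python strings are 0-based, hence
    the i - 1 and j - 1 when reading characters).
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d  # x[1..i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = -j * d  # y[1..j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # x[i] aligned with y[j]
                F[i - 1][j] - d,      # x[i] aligned with a gap
                F[i][j - 1] - d,      # y[j] aligned with a gap
            )
    return F


F = nw_matrix("GAATTCAGTTA", "GGATCGA")
print(F[-1][-1])  # bottom-right cell: score of the optimal global alignment
```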

Needleman-Wunsch example • Align GAATTCAGTTA with GGATCGA. • [Figure: the score matrix for the two sequences.]

Traceback • [Figure: the traceback path through the filled matrix for GAATTCAGTTA and GGATCGA.]

The solution • Optimal alignment has a score of 6.
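
A minimal Python sketch of the fill followed by the traceback, under the same assumed scores (match = 1, mismatch = 0, indel d = 0). The name `needleman_wunsch` and the tie-breaking order (diagonal first, then vertical, then horizontal) are choices made for this example. With so many ties there are several optimal alignments, and the one returned here need not be the one drawn on the slide.

```python
def needleman_wunsch(x, y, match=1, mismatch=0, d=0):
    """Return (score, aligned_x, aligned_y) for a global alignment."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d
    for j in range(1, m + 1):
        F[0][j] = -j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] - d, F[i][j - 1] - d)

    # Traceback: walk from (n, m) to (0, 0), at each cell choosing a
    # predecessor that actually produced F[i][j].
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                ax.append(x[i - 1])
                ay.append(y[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])  # x[i] against a gap
            ay.append("-")
            i -= 1
        else:
            ax.append("-")       # y[j] against a gap
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = needleman_wunsch("GAATTCAGTTA", "GGATCGA")
print(score)  # 6
print(ax)
print(ay)
```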

Linear Space Alignment • Serafim talked about the Myers-Miller algorithm in class. • There is another variant of the Hirschberg algorithm, given in Durbin (Pg 35).

Suppose we know that characters X[i] and Y[j] are aligned to each other in the optimal alignment of X[1..n] and Y[1..m]. • How can we compute the alignment using this information? • We can partition the alignment into two parts: align X[1..i-1] with Y[1..j-1], and align X[i+1..n] with Y[j+1..m], separately.
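
The partition can be checked numerically with a short Python sketch. The helper names `nw_score` and `score_through_pair` are hypothetical, and the scores are the ones assumed earlier (match = 1, mismatch = 0, indel d = 0). The pair used below, X[6] = C and Y[5] = C in GAATTCAGTTA and GGATCGA, is picked by hand for illustration.

```python
def nw_score(x, y, match=1, mismatch=0, d=0):
    """Global alignment score, keeping only two rows of the matrix."""
    prev = [-j * d for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [-i * d]
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s, prev[j] - d, curr[j - 1] - d))
        prev = curr
    return prev[-1]


def score_through_pair(x, y, i, j, match=1, mismatch=0, d=0):
    """Best score among alignments that align x[i] with y[j] (1-based).

    The two sub-alignments are independent, so their optimal scores
    simply add up around the fixed pair.
    """
    s = match if x[i - 1] == y[j - 1] else mismatch
    left = nw_score(x[: i - 1], y[: j - 1], match, mismatch, d)   # X[1..i-1] vs Y[1..j-1]
    right = nw_score(x[i:], y[j:], match, mismatch, d)            # X[i+1..n] vs Y[j+1..m]
    return left + s + right


X, Y = "GAATTCAGTTA", "GGATCGA"
print(score_through_pair(X, Y, 6, 5), nw_score(X, Y))  # 6 6
```

When the pair really lies on an optimal alignment, the two numbers agree; for any other pair the first number can only be smaller or equal.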

Middle column • [Figure: the traceback path crosses the middle column; the marked cell is the cell in the middle column from which the traceback leaves the column.] • Maintain the coordinates of that cell along with the value of F(i,j). Call it c(i,j).

For every cell in the right half of the matrix, • Maintain the F(i,j) value. • Maintain the coordinates of the cell in the middle column from where its traceback path leaves the middle column. Call it c(i,j). • Maintain the direction of that jump as given by the pointer (either diagonal or horizontal, since these are the only moves that lead into the next column). Call it P(i,j).

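If (i’,j’) is the cell preceding (i,j), from which F(i,j) is derived, then • c(i,j) = c(i’,j’) and P(i,j) = P(i’,j’). • If (i’,j’) itself lies in the middle column and (i,j) lies in the next column, then c(i,j) = (i’,j’) and P(i,j) is the direction of the move from (i’,j’) to (i,j). • We need only linear space to compute the F, c and P values as we proceed across the matrix.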

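Middle column • [Figure: cell (i,j), its preceding cell (i’,j’), and the marked cell in the middle column.] • We know the traceback from (i’,j’) leaves the middle column at the marked cell. • Hence the traceback from (i,j) leaves the middle column at the same cell, so it has the same c value. • We are interested in the value of c(n,m), which tells us where the optimal alignment of the full sequences crosses the middle column.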
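
The bookkeeping described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the exact procedure from the slides or from Durbin: X indexes the rows, Y indexes the columns, the middle column is taken to be m // 2, the matrix is filled column by column, and ties are broken in the order diagonal, horizontal, vertical. The name `middle_crossing` is hypothetical. Only two columns of F, c and P are kept at any time.

```python
def middle_crossing(x, y, match=1, mismatch=0, d=0):
    """Return (F(n,m), c(n,m), P(n,m)) using memory linear in len(x).

    c is a pair (i, mid) of matrix coordinates: the cell of the middle
    column from which the optimal traceback path leaves that column.
    P is "diagonal" or "horizontal", the move that leaves it.
    """
    n, m = len(x), len(y)
    if m == 0:
        raise ValueError("y must be non-empty")
    mid = m // 2
    F_prev = [-i * d for i in range(n + 1)]  # column 0
    c_prev = [None] * (n + 1)
    P_prev = [None] * (n + 1)
    for j in range(1, m + 1):
        F_cur = [-j * d] + [0] * n
        c_cur = [None] * (n + 1)
        P_cur = [None] * (n + 1)
        if j > mid:
            # Cell (0, j) can only be reached horizontally from (0, j-1).
            if j == mid + 1:
                c_cur[0], P_cur[0] = (0, mid), "horizontal"
            else:
                c_cur[0], P_cur[0] = c_prev[0], P_prev[0]
        for i in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            diag = F_prev[i - 1] + s   # from (i-1, j-1)
            horiz = F_prev[i] - d      # from (i, j-1)
            vert = F_cur[i - 1] - d    # from (i-1, j)
            F_cur[i] = max(diag, horiz, vert)
            if j <= mid:
                continue               # c and P are defined only right of mid
            if F_cur[i] == diag:
                pi, move = i - 1, "diagonal"
            elif F_cur[i] == horiz:
                pi, move = i, "horizontal"
            else:
                # Preceding cell is in the same column: inherit its values.
                c_cur[i], P_cur[i] = c_cur[i - 1], P_cur[i - 1]
                continue
            if j == mid + 1:
                # Preceding cell lies in the middle column itself.
                c_cur[i], P_cur[i] = (pi, mid), move
            else:
                c_cur[i], P_cur[i] = c_prev[pi], P_prev[pi]
        F_prev, c_prev, P_prev = F_cur, c_cur, P_cur
    return F_prev[n], c_prev[n], P_prev[n]


score, cell, move = middle_crossing("GAATTCAGTTA", "GGATCGA")
print(score, cell, move)
```

Once c(n,m) and P(n,m) are known, the problem splits into two smaller alignments, one on each side of the crossing, and the same procedure can be applied to each half recursively.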
