
CS 262 Discussion Section 1



Presentation Transcript


  1. CS 262 Discussion Section 1

  2. Purpose of discussion sections • To clarify difficulties/ambiguities in the problem set questions and lecture material. • To supplement class material by going somewhat into the biological concepts and motivations underlying this field. • To discuss more algorithms from a topic, wherever needed.

  3. Antiparallel vs Parallel strands

  4. The DNA strand has a chemical polarity

  5. The members of each base pair can fit together within the double helix only if the two strands of the helix are antiparallel

  6. Prokaryotes do not have a nucleus, eukaryotes do

  7. Eukaryotic DNA is packaged into chromosomes • A chromosome is a single, enormously long, linear DNA molecule associated with proteins that fold and pack the fine thread of DNA into a more compact structure. • Human genome: about 3.2 × 10^9 base pairs per haploid set; a diploid human cell carries 46 chromosomes (23 pairs).

  8. A display of the full set of 46 chromosomes

  9. Sequence similarity

  10. Biological motivation • Sequence similarity is useful in hypothesizing the function of a new sequence… • … assuming that sequence similarity implies structural and functional similarity. [Figure: a new sequence is sent as a query to a sequence database; the response is a list of similar matches.]

  11. Case Study: Multiple Sclerosis • Multiple sclerosis is an autoimmune dysfunction in which the T-cells of the immune system start attacking the body’s own nerve cells. • The T-cells recognize the myelin sheath protein of neurons as foreign. • Show movie

  12. Why does this happen? • A hypothesis: possibly, the myelin sheath proteins identified by the T-cells were similar to bacterial/viral sheath proteins from an earlier infection. • How to test this hypothesis? Use sequence alignment. [Figure: myelin sheath proteins → query → sequence database → response → list of similar bacterial/viral sequences → lab tests → identification of cause of immune dysfunction.]

  13. Dynamic Programming • A way of solving problems defined by recurrence relations by storing partial results, so that each subproblem is computed only once. • Consider the Fibonacci series: • F(n) = F(n-1) + F(n-2) • F(0) = 0, F(1) = 1 • A naive recursive algorithm takes exponential time to find F(n), because it recomputes the same subproblems many times. • A dynamic programming solution takes only n steps (linear time).
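
The contrast on this slide can be sketched in Python. The function names `fib_naive` and `fib_dp` are illustrative and not from the slides; this is a minimal sketch, not the course's reference code.

```python
def fib_naive(n):
    """Direct recursion on F(n) = F(n-1) + F(n-2); exponential time."""
    if n < 2:
        return n  # F(0) = 0, F(1) = 1
    return fib_naive(n - 1) + fib_naive(n - 2)


def fib_dp(n):
    """Bottom-up dynamic programming: n steps, keeping two partial results."""
    prev, curr = 0, 1  # F(0), F(1)
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


if __name__ == "__main__":
    print(fib_dp(10), fib_naive(10))
```

Both functions return the same values; only the amount of repeated work differs.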

  14. Needleman-Wunsch algorithm • F(i,j) = maximum of • F(i-1, j-1) + s(X[i], Y[j]) • F(i-1, j) - d • F(i, j-1) - d • For the example, assume that match = 1, mismatch = 0, indel = 0, so d = 0. [Figure: the three arrows into cell (i,j), labeled +s(X[i],Y[j]), -d, and -d.]
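
A minimal Python sketch of this recurrence follows. The name `nw_fill` and its keyword parameters are illustrative. The boundary values F(i,0) = -i·d and F(0,j) = -j·d are the usual global-alignment initialization and are an assumption here, since the slide does not state them.

```python
def nw_fill(x, y, match=1, mismatch=0, d=0):
    """Fill the Needleman-Wunsch score matrix F for sequences x and y.

    F[i][j] is the best score for aligning x[:i] with y[:j].
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary conditions: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        F[i][0] = -d * i
    for j in range(1, m + 1):
        F[0][j] = -d * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # align X[i] with Y[j]
                F[i - 1][j] - d,      # X[i] against a gap
                F[i][j - 1] - d,      # Y[j] against a gap
            )
    return F


if __name__ == "__main__":
    F = nw_fill("GAATTCAGTTA", "GGATCGA")
    print(F[-1][-1])  # score of the best global alignment
```

The full matrix takes O(nm) space, which is what the linear-space variants later in the section avoid.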

  15-18. Needleman-Wunsch example • Sequences: G A A T T C A G T T A and G G A T C G A. [Figure: the score matrix for the two sequences, filled in over four slides.]

  19-31. Traceback • [Figure: the traceback path through the score matrix for G A A T T C A G T T A and G G A T C G A, shown over thirteen slides.]

  32. The solution • Optimal alignment has a score of 6.
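
The fill and traceback for this example can be reproduced with the short Python sketch below. The name `nw_align`, the boundary initialization, and the tie-breaking order (diagonal first, then a gap in the second sequence, then a gap in the first) are assumptions for illustration. Several alignments reach the optimal score, so the slides' traceback may show a different one.

```python
def nw_align(x, y, match=1, mismatch=0, d=0):
    """Return (score, aligned_x, aligned_y) for a global alignment of x and y."""
    n, m = len(x), len(y)

    def s(a, b):
        return match if a == b else mismatch

    # Fill the score matrix (assumed boundaries: F[i][0] = -d*i, F[0][j] = -d*j).
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -d * i
    for j in range(1, m + 1):
        F[0][j] = -d * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)

    # Traceback from (n, m): at each cell, follow a move that explains F[i][j].
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    score, ax, ay = nw_align("GAATTCAGTTA", "GGATCGA")
    print(score)
    print(ax)
    print(ay)
```

With match = 1, mismatch = 0, and indel = 0, the score is simply the number of columns in which the two aligned sequences carry the same letter.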

  33. Linear Space Alignment • In class, Serafim discussed the Myers-Miller algorithm. • Durbin (p. 35) gives another variant of the Hirschberg algorithm.

  34. Suppose we know that characters X[i] and Y[j] are aligned to each other in the optimal alignment of X[1..n] and Y[1..m]. • How can we compute the alignment using this information? • We can partition the alignment into two parts, and align X[1..i-1] with Y[1..j-1] and X[i+1..n] with Y[j+1..m] separately.
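
A small Python check of this partitioning idea follows, using the earlier example sequences and scoring (match = 1, mismatch = 0, indel = 0). The helper names `nw_score` and `split_score` are illustrative, and the choice of the pair X[4], Y[4] (both T) is an assumption made for the demonstration. The score is computed with two rows only, so this sketch already uses linear space.

```python
def nw_score(x, y, match=1, mismatch=0, d=0):
    """Global alignment score of x and y, keeping only two rows of F."""
    prev = [-d * j for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [-d * i]
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s, prev[j] - d, curr[j - 1] - d))
        prev = curr
    return prev[-1]


def split_score(x, y, i, j, match=1, mismatch=0, d=0):
    """Best score among alignments that pair X[i] with Y[j] (1-based indices)."""
    left = nw_score(x[:i - 1], y[:j - 1], match, mismatch, d)   # X[1..i-1] vs Y[1..j-1]
    pair = match if x[i - 1] == y[j - 1] else mismatch          # s(X[i], Y[j])
    right = nw_score(x[i:], y[j:], match, mismatch, d)          # X[i+1..n] vs Y[j+1..m]
    return left + pair + right


if __name__ == "__main__":
    x, y = "GAATTCAGTTA", "GGATCGA"
    print(nw_score(x, y))           # score of the whole problem
    print(split_score(x, y, 4, 4))  # same score when the pair lies on an optimal alignment
    print(split_score(x, y, 1, 7))  # lower score for a pair that does not
```

When the two halves plus the aligned pair add up to the full score, the pair lies on some optimal alignment, and the halves can be solved independently.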

  35-40. Middle column • [Figure: the dynamic programming matrix and its middle column, shown over six slides.]

  41. Middle column • This is the cell in the middle column from which the traceback leaves the column. • Maintain the coordinates of that cell along with the value of F(i,j). Call it c(i,j).

  42. For every cell in the right half of the matrix: • Maintain the F(i,j) value. • Maintain the coordinates of the cell in the middle column from which its traceback path leaves the middle column. Call it c(i,j). • Maintain the direction of that jump as given by the pointer (either diagonal or horizontal). Call it P(i,j).

  43. If (i',j') is the cell preceding (i,j), from which F(i,j) is derived, then • c(i,j) = c(i',j') and P(i,j) = P(i',j') • We need only linear space to compute the F, c, and P values as we proceed across the matrix.

  44. Middle column • We know the traceback from (i',j') leaves the middle column at this cell. • Hence, the traceback from (i,j) leaves the middle column at the same cell, so c(i,j) has the same value. • We are interested in the value of c(n,m).
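
The bookkeeping for F, c, and P can be sketched in Python as below. This is a hedged illustration, not the slides' or Durbin's exact formulation. It assumes that columns are indexed by the position i in X and rows by the position j in Y, that the middle column is i = n // 2, and that ties are broken in a fixed order. All function names are hypothetical. Only the previous column is kept, so the extra space is linear in the length of Y.

```python
def nw_score(x, y, match=1, mismatch=0, d=0):
    """Global alignment score in linear space (used here only for checking)."""
    prev = [-d * j for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [-d * i]
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s, prev[j] - d, curr[j - 1] - d))
        prev = curr
    return prev[-1]


def middle_crossing(x, y, match=1, mismatch=0, d=0):
    """Return (F(n,m), c(n,m), P(n,m)) while storing one column at a time."""
    n, m = len(x), len(y)
    if n < 1:
        raise ValueError("x must be non-empty so that a right half exists")
    mid = n // 2
    F_prev = [-d * j for j in range(m + 1)]  # column i = 0
    c_prev = P_prev = None
    for i in range(1, n + 1):
        F_curr = [0] * (m + 1)
        c_curr = [None] * (m + 1)
        P_curr = [None] * (m + 1)
        for j in range(m + 1):
            cands = [(F_prev[j] - d, "horizontal")]           # from (i-1, j)
            if j > 0:
                s = match if x[i - 1] == y[j - 1] else mismatch
                cands.append((F_prev[j - 1] + s, "diagonal"))  # from (i-1, j-1)
                cands.append((F_curr[j - 1] - d, "vertical"))  # from (i, j-1)
            F_curr[j], move = max(cands, key=lambda t: t[0])
            if i <= mid:
                continue  # c and P are only kept in the right half
            if move == "vertical":
                # Predecessor is in the same column: copy its c and P.
                c_curr[j], P_curr[j] = c_curr[j - 1], P_curr[j - 1]
            else:
                pj = j if move == "horizontal" else j - 1
                if i - 1 == mid:
                    # The path leaves the middle column right here.
                    c_curr[j], P_curr[j] = (mid, pj), move
                else:
                    c_curr[j], P_curr[j] = c_prev[pj], P_prev[pj]
        F_prev, c_prev, P_prev = F_curr, c_curr, P_curr
    return F_prev[m], c_prev[m], P_prev[m]


def crossing_is_consistent(x, y, match=1, mismatch=0, d=0):
    """Check that splitting at c(n,m) with move P(n,m) preserves the score."""
    total, (ci, cj), move = middle_crossing(x, y, match, mismatch, d)
    left = nw_score(x[:ci], y[:cj], match, mismatch, d)
    if move == "diagonal":
        step = match if x[ci] == y[cj] else mismatch
        right = nw_score(x[ci + 1:], y[cj + 1:], match, mismatch, d)
    else:
        step = -d
        right = nw_score(x[ci + 1:], y[cj:], match, mismatch, d)
    return total == nw_score(x, y, match, mismatch, d) and left + step + right == total


if __name__ == "__main__":
    print(middle_crossing("GAATTCAGTTA", "GGATCGA"))
    print(crossing_is_consistent("GAATTCAGTTA", "GGATCGA"))
```

Once c(n,m) and P(n,m) are known, the problem splits into a part to the left of the middle column and a part to the right, and the same procedure can be applied to each part recursively.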
