Regular Meeting December 22, 2008

Mark Borodovsky, Ivan Antonov

Presentation Transcript


  1. Regular Meeting, December 22, 2008. Mark Borodovsky, Ivan Antonov

  2. Topics
     • What has been done
     • FSMark HMM implementation
     • Answers to the previous meeting's questions
     • Future work
     GATech

  3. What has been done
     • The HMM implementation in FSMark has been changed
     • Some questions from the previous meeting have been answered

  4. FSMark HMM implementation

  5. Current HMM implementation
     • Currently, for a given position i, we look backward 2 nucleotides instead of looking forward
     • FSMark starts examining the sequence from the 3rd position only (i=2), so the emission string is always complete (starting from the 1st position gives strange results)
     • Since FSMark starts at i=2, a gene without a frame shift will have state 2
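The backward-looking context on slide 5 can be illustrated with a minimal sketch. It assumes the emission string at position i is the nucleotide at i together with the 2 nucleotides before it; the function name `backward_emissions` and the example sequence are hypothetical and are not taken from FSMark.

```python
def backward_emissions(seq, context=2):
    """Yield (i, emission) pairs, where emission is seq[i-context .. i].

    Iteration starts at i = context (0-based), i.e. the 3rd position
    when context is 2, so every emission string is complete.
    """
    for i in range(context, len(seq)):
        yield i, seq[i - context:i + 1]


# Hypothetical 5-nucleotide sequence, used only for illustration.
emissions = list(backward_emissions("ATGCA"))
for i, emission in emissions:
    print(i, emission)
```

Starting at i=0 or i=1 would produce emission strings shorter than 3 nucleotides, which is one plausible reason for the odd results noted on the slide.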

  6. FSMark prediction depends on the FS letter
     • A test was run on a sample gene, inserting different letters in the middle of the gene. The FSMark-GM hmm_def file was used.
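A test of the kind described on slide 6 needs one variant of the gene per inserted letter. The sketch below only builds those variants; the helper `insert_in_middle` and the toy gene are hypothetical, and running FSMark on the results is left out because its command line is not given in the slides.

```python
def insert_in_middle(gene, letter):
    """Return gene with one letter inserted at its midpoint.

    A single-nucleotide insertion shifts the reading frame of
    everything downstream of the insertion point.
    """
    mid = len(gene) // 2
    return gene[:mid] + letter + gene[mid:]


# Hypothetical toy gene; a real test would use an annotated gene sequence.
gene = "ATGAAACCCGGGTAA"
variants = {letter: insert_in_middle(gene, letter) for letter in "ACGT"}
for letter, variant in variants.items():
    print(letter, variant)
```

Comparing the predictions for the four variants would show whether the result depends on the inserted letter itself rather than only on the position of the shift.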

  7. Answers to the previous meeting questions

  8. Control: genome without frame shifts
     • GeneMark: 417 overlaps
     • FSMark-GM: 118 frame shifts

  9. Experiment: genome with frame shifts in 400 genes
     • GeneMark: 599 overlaps (171 overlaps caused by frame shifts)
     • FSMark-GM: 325 frame shifts

  10. Questions to answer
     • Look at the distribution of overlap lengths in the GeneMark output
     • Understand why GeneMark predicts a gene overlap for fewer than 50% of the genes with frame shifts. There are two possible reasons:
       • The short part is missing, i.e. GeneMark predicts only one gene
       • GeneMark predicts two genes, but they don't overlap
     • Try to understand why we got more false positives in the experiment than in the control
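The overlap-length distribution asked for on slide 10 could be computed along these lines. This is a sketch under stated assumptions: genes are given as hypothetical (start, end) pairs with inclusive coordinates, strand is ignored, and only neighbouring genes are compared. Parsing real GeneMark output is not shown.

```python
def overlap_length(a, b):
    """Number of positions shared by two inclusive (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def adjacent_overlaps(genes):
    """Overlap length for each pair of neighbouring genes, sorted by start."""
    ordered = sorted(genes)
    return [overlap_length(g, h) for g, h in zip(ordered, ordered[1:])]


# Hypothetical coordinates, not taken from any real prediction.
genes = [(100, 400), (390, 900), (950, 1200)]
lengths = adjacent_overlaps(genes)
print(lengths)
```

A pair with length 0 corresponds to the second case on the slide: two genes are predicted, but they do not overlap.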

  11. All overlap lengths (genome without FS)

  12. Overlaps caused by frame shifts

  13. GeneMark analysis
     • Why does GeneMark barely predict overlaps for genes with frame shifts?
     • In my GeneMark output there are 357 typical genes (out of 400).
     • Perhaps I am using the wrong GeneMark option?

  14. GeneMark output statistics (genome with frame shifts in 400 genes)
     • 4,388 genes
     • 599 gene overlaps
     • 171 overlaps caused by frame shifts
     • 22 genes with frame shifts are missing
     • 335 genes with frame shifts
     • Frame shifts in 164 genes didn't cause an overlap
     • 163 decreased their lengths
     • 4 frame shifts caused a new gene downstream of the initial gene
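The counts on slides 13 and 14 can be cross-checked with simple arithmetic. The grouping below is an assumption about how the numbers relate (the slide layout was lost in the transcript): 335 predicted genes with frame shifts plus 22 missing ones give the 357 typical genes, and the 335 split into 171 with an overlap and 164 without. The variable names are hypothetical.

```python
# Counts copied from slides 13 and 14.
injected = 400           # genes that received a frame shift
predicted_with_fs = 335  # genes with a frame shift found in the output
missing = 22             # genes with a frame shift that are missing
overlap_caused = 171     # overlaps caused by a frame shift
no_overlap = 164         # frame shifts that did not cause an overlap

# Assumed relation: found + missing = typical genes reported on slide 13.
typical_found = predicted_with_fs + missing

# Assumed relation: every predicted gene either caused an overlap or did not.
accounted_for = overlap_caused + no_overlap

# Share of frame-shifted genes for which an overlap was predicted.
overlap_fraction = overlap_caused / injected

print(typical_found, accounted_for, f"{overlap_fraction:.1%}")
# The 163 and 4 on the slide do not sum to 164, so they are not reconciled here.
```

Under this reading the overlap fraction is below one half, which matches the "less than 50%" statement on slide 10.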

  15. Conclusions
     • I need to check how to run GeneMark so that it reports all 400 typical genes
     • It seems that the small chunk in the shifted frame is not enough for GeneMark to predict a new gene

  16. Time Table
