
FISH Fast Identification of Segmental Homology

FISH: Fast Identification of Segmental Homology. University of North Carolina at Chapel Hill. Shian-Gro Wu, Department of Computer Science and Information Engineering, National Taiwan University. Outline: Introduction, Input data, How it works, From markers to features, From features to grid

Presentation Transcript


  1. FISH: Fast Identification of Segmental Homology
     • University of North Carolina at Chapel Hill
     • Shian-Gro Wu, Department of Computer Science and Information Engineering, National Taiwan University

  2. Outline
     • Introduction
     • Input data
     • How it works
     • From markers to features
     • From features to grid
     • From grid to blocks

  3. Introduction
     • FISH is software for the fast identification and statistical evaluation of segmental homologs.
     (Figure labels: genome, gene (marker), contig.)

  4. Introduction
     (Figure: the processing stages shown for contigA and contigB: markers, features, points, blocks.)

  5. Input data
     • Each map file lists the names and transcriptional orientation (if known) of all the markers on one contig.
     • Example <map1> (columns: gene name, transcriptional orientation; each row is one marker):
         At1g01010    1
         At1g01020   -1
         At1g01030   -1
         At1g01040    1
         At1g01050   -1
         ...
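
The map-file layout above can be read with a few lines of Python. This is a minimal sketch, not the FISH parser: it assumes whitespace-separated columns, and because the slide does not say how an unknown orientation is written, it treats anything other than `1` or `-1` as unknown. The names `Marker` and `parse_map` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Marker(NamedTuple):
    name: str
    orientation: Optional[int]  # 1, -1, or None when unknown (assumed encoding)


def parse_map(text: str) -> List[Marker]:
    """Parse map-file text into markers, keeping the file order.

    The order matters: markers must be listed by physical position on the contig.
    """
    markers = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        orientation = None
        if len(fields) > 1 and fields[1] in ("1", "-1"):
            orientation = int(fields[1])
        markers.append(Marker(fields[0], orientation))
    return markers


example_map = """\
At1g01010 1
At1g01020 -1
At1g01030 -1
At1g01040 1
At1g01050 -1
"""
markers = parse_map(example_map)
print(markers[0])
```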

  6. Input data
     • Each match file lists all the homologies between markers in a pair of contigs.
     • Example <match1v1> (columns: gene name, gene name, match score):
         At1g01010   At1g02240   94
         At1g01010   At1g02250   91
         At1g01010   At1g32870   66
         At1g01010   At1g33060   43
         At1g01010   At1g52880   42
         ...
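
A match file can be read in the same way. The sketch below assumes three whitespace-separated columns per line and skips lines that do not fit; `parse_matches` is a hypothetical helper, not part of FISH.

```python
from typing import List, Tuple

Match = Tuple[str, str, float]  # (marker on first contig, marker on second contig, score)


def parse_matches(text: str) -> List[Match]:
    """Parse match-file text into (name, name, score) triples."""
    matches = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines (an assumption of this sketch)
        first, second, score = fields
        matches.append((first, second, float(score)))
    return matches


example_matches = """\
At1g01010 At1g02240 94
At1g01010 At1g02250 91
At1g01010 At1g32870 66
At1g01010 At1g33060 43
At1g01010 At1g52880 42
"""
matches = parse_matches(example_matches)
print(len(matches), matches[0])
```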

  7. From markers to features
     (Figure: the processing stages shown for contigA and contigB: markers, features, points, blocks.)

  8. From markers to features
     • Step 1
     • The positions and transcriptional orientations (when known) of the markers are read from a set of map files, one map file per contig. Markers within each map file must be ordered according to their physical positions on the contig.
     • Individual homologies between markers are read from a set of match files. There is at least one, and no more than two, such files for each pair of contigs. For example, contigs A, B and C give the pairs A&A, A&B, A&C, B&A, B&B, ...

  9. From markers to features
     • Step 2
     • FISH performs detandemization, in which multiple markers may be collapsed into single features.
     • Parameters: MIN Score and MAX Dist.
     (Figure: markers a b c d e f g h become features A B (B) C D (C) E F.)

  10. From markers to features
     • Condition 1: ScoreAB > MIN Score, with markA and markB within the MAX Dist range.
     • Condition 2: ScoreAC > MIN Score and ScoreBC > MIN Score.
     (Figure: markA and markB scored against each other (ScoreAB) and against a third marker, markC (ScoreAC, ScoreBC).)
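
The two conditions can be sketched as a single pass over the ordered markers of one contig. This is an illustration under stated assumptions, not the FISH implementation: MAX Dist is taken to be a number of marker positions, a marker joins the feature of the first earlier marker within that range that satisfies either condition, and the scores in the example are invented. The function name `detandemize` is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def detandemize(markers: List[str],
                scores: Dict[FrozenSet[str], float],
                min_score: float,
                max_dist: int) -> List[int]:
    """Return one feature index per marker; collapsed markers share an index."""
    # partners[m] = markers whose match score with m exceeds min_score
    partners: Dict[str, Set[str]] = {}
    for pair, score in scores.items():
        if score > min_score and len(pair) == 2:
            a, b = tuple(pair)
            partners.setdefault(a, set()).add(b)
            partners.setdefault(b, set()).add(a)

    def tandem(a: str, b: str) -> bool:
        pa, pb = partners.get(a, set()), partners.get(b, set())
        # Condition 1: a and b match each other directly.
        # Condition 2: a and b both match some third marker.
        return b in pa or bool(pa & pb)

    feature_of: List[int] = []
    next_feature = 0
    for i, marker in enumerate(markers):
        assigned = None
        # Only look back at markers within max_dist positions (assumed unit).
        for j in range(max(0, i - max_dist), i):
            if tandem(markers[j], marker):
                assigned = feature_of[j]
                break
        if assigned is None:
            assigned = next_feature
            next_feature += 1
        feature_of.append(assigned)
    return feature_of


# Invented scores chosen so that c collapses into b's feature and f into d's.
example_scores = {frozenset(("b", "c")): 90.0, frozenset(("d", "f")): 80.0}
features = detandemize(list("abcdefgh"), example_scores, min_score=50.0, max_dist=2)
print(features)
```

With these invented scores the eight markers collapse into six features, the same pattern as the a to h example on slide 9.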

  11. From features to grid
     (Figure: the processing stages shown for contigA and contigB: markers, features, points, blocks.)

  12. From features to grid
     • In order to identify segmental homologies, FISH computes a grid for each pair of contigs.
     • Points in the grid represent matches between pairs of features.
     (Figure: a grid with contigA features fA1 to fA4 on one axis and contigB features fB1 to fB4 on the other; example points are labeled PointA1B2 and PointB2A4.)
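
A grid for one pair of contigs can be sketched as a set of (x, y) feature-index pairs. The helper below is hypothetical and assumes each marker has already been mapped to a feature index on its contig; matches that involve markers outside either contig are ignored.

```python
from typing import Dict, Iterable, Set, Tuple


def grid_points(feature_of_a: Dict[str, int],
                feature_of_b: Dict[str, int],
                matches: Iterable[Tuple[str, str, float]]) -> Set[Tuple[int, int]]:
    """Return the set of grid points for a contig pair.

    A point (x, y) means some marker in feature x of contig A matches
    some marker in feature y of contig B.
    """
    points = set()
    for marker_a, marker_b, _score in matches:
        if marker_a in feature_of_a and marker_b in feature_of_b:
            points.add((feature_of_a[marker_a], feature_of_b[marker_b]))
    return points


# Invented example: a2 and a3 were collapsed into the same feature (index 1).
feature_of_a = {"a1": 0, "a2": 1, "a3": 1, "a4": 2}
feature_of_b = {"b1": 0, "b2": 1}
example = [("a1", "b2", 94.0), ("a2", "b1", 66.0), ("a3", "b1", 43.0), ("a9", "b1", 50.0)]
points = grid_points(feature_of_a, feature_of_b, example)
print(sorted(points))
```

Using a set means two matches that land in the same cell produce a single point, which is one plausible reading of "matches between pairs of features".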

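  13. From features to grid
     • Each position in the grid, whether or not a point is present, is called a cell.
     • For two different contigs: cell(contigA, contigB) = feature(contigA) × feature(contigB)
     • For a contig compared with itself: cell(contigC, contigC) = feature(contigC) × [feature(contigC) − 1] / 2
     • Here feature(X) is the number of features on contig X.
     (Figure labels: A, B, C, C.)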

  14. From features to grid
     • Markers and features per contig:
         contig   markers   features
         1        6494      5913
         2        4038      3711
         3        5221      4777
     • Points and cells per contig pair:
         contig1  contig2  points  cells
         1        1        2143    17478828
         1        2        2018    21943144
         1        3        2088    28246400
         2        2        751     6883905
         ...
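
The cell counts in the table can be checked against the two cell formulas from slide 13. In the sketch below, `cell_count` is a hypothetical helper. The within-contig values (1 vs 1 and 2 vs 2) reproduce the table exactly, while the exact integer products for the between-contig pairs differ from the listed values by one (21943143 against 21943144, and 28246401 against 28246400), which may be a rounding artifact in the tabulated output.

```python
def cell_count(features_a: int, features_b: int, same_contig: bool) -> int:
    """Number of grid cells for a contig pair."""
    if same_contig:
        # Self-comparison: unordered pairs of distinct features.
        return features_a * (features_a - 1) // 2
    return features_a * features_b


features = {1: 5913, 2: 3711, 3: 4777}   # features per contig, from the table
points = {(1, 1): 2143, (1, 2): 2018, (1, 3): 2088, (2, 2): 751}

for (c1, c2), m in points.items():
    n = cell_count(features[c1], features[c2], same_contig=(c1 == c2))
    # m / n is the point density used later when sizing neighborhoods.
    print(f"contigs {c1},{c2}: points={m} cells={n} density={m / n:.3e}")
```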

  15. From features to grid
     (Figure: the processing stages shown for contigA and contigB: markers, features, points, blocks.)

  16. From grid to blocks
     • Defining the neighborhood size
     • FISH measures the distance between two points (Xi, Yi) and (Xj, Yj) using the Manhattan distance.
     • In order to be considered neighbors, two points must be closer than a threshold distance [formula not preserved in this transcript], where m is the number of points and n is the number of cells.
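
The Manhattan distance between (Xi, Yi) and (Xj, Yj) is |Xi − Xj| + |Yi − Yj|. The sketch below applies it to find the neighbors of a point. Because the threshold formula is not preserved in the transcript, the threshold is passed in as a plain parameter `d_t` rather than computed from m and n; both function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int]


def manhattan(p: Point, q: Point) -> int:
    """Manhattan distance between two grid points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def neighbors(points: Iterable[Point], p: Point, d_t: float) -> List[Point]:
    """Points other than p that are strictly closer to p than d_t."""
    return [q for q in points if q != p and manhattan(p, q) < d_t]


grid = [(1, 2), (2, 3), (4, 6), (10, 10)]
print(manhattan((1, 2), (4, 6)))
print(neighbors(grid, (1, 2), d_t=3))
```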

  17. From grid to blocks
     (Plot: dT against m/n for T = 0.05. Axis values shown: m/n from 0.0001 to 0.05, dT from 0.75 to 23. m: number of points; n: number of cells.)

  18. Result

  19. From grid to blocks
     • Choosing among multiple neighbors
     • A point may lie in the neighborhood of more than one other point.
     • FISH ranks the cells within each neighborhood and chooses the neighbor with the highest rank [ranking formula not preserved in this transcript], where n is the number of cells in the point’s neighborhood, dc is the distance of the cell from the point under consideration, and w is the weight.

  20. References
     • User’s Manual for Fast Identification of Segmental Homology. http://www.bio.unc.edu/faculty/vision/lab/FISH/
     • Fast identification and statistical evaluation of segmental homologies in comparative maps. http://bioinformatics.oxfordjournals.org/cgi/content/abstract/19/suppl_1/i74

  21. Thank You
