1 / 19

Estimation du likelihood pour un arbre phylogénétique fixé

Estimation du likelihood pour un arbre phylogénétique fixé. Étudiants: Nicolo.DeCoi@unil.ch Larisa.Espinoza@unil.ch Josip.Mikulic@unil.ch. Professeur : Salamin Nicolas Assistante : Maryam Zaheri. Buts principaux. Likelihood (L) estimation for a simple tree of nucleotides sequences

tyme
Télécharger la présentation

Estimation du likelihood pour un arbre phylogénétique fixé

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Estimation du likelihood pour un arbre phylogénétique fixé Étudiants: Nicolo.DeCoi@unil.ch Larisa.Espinoza@unil.ch Josip.Mikulic@unil.ch Professeur:Salamin Nicolas Assistante:Maryam Zaheri

  2. Buts principaux • Likelihood (L) estimation for a simple tree of nucleotides sequences • Understand the mathematicals processes under this estimation • Programming a general fonction in R to calculate L for a specific tree

  3. Intérêt et pertinence du projet • Utilisation et interprétation des données phylogénétiques • Etude et modélisation de l’évolution des espèces à partir des séquences trouvées • La méthode du ML permet d’optimiser les paramètres pour trouver le meilleur arbre phylogénétique • Détecter la pression sélective pour étudier le rôle d’un gène, les gènes importants sont très conservés

  4. Méthodologie • la distance entre 2 séquences s’estime par le nombre de nucléotides différents à chaque site • Problème: sous estimation des substitutions (substitutions multiples, hidden substitutions) • Markov chain (fig. 1) • Continuous-time Markov process: est un modèle probabilistique qui décrit tous les possibles substitutions qui peuvent se passer à un site on se basant sur la situation actuelle (markov property) et on assumant que chaque site évolue indépendamment Figure 1: représentation des substitutions multiples à un site (Yang, Computational molecular evolution, Oxford universtiy press, 2006)

  5. Markov chain model JC69: • Modèle simple pour nucléotides (T,C,A,G), bon modèle pour des petites distances • Q={qij} est la matrice des taux de substitution: • qij est le taux instantané de substitution du nucléotide i au nucléotide j • la somme de chaque colonne vaut 0 •  a la même valeur pour chaque nucléotide (ex =1) • qij∆t donne la probabilité de substitution dans un petit intervalle de temps • P(t)={pij(t)} est la matrice des probabilités de transition: • pij(t) est la probabilité qu’un nucléotide i devient j après un temps t • cette matrice considère et calcule tous les possible voies par lesquelles est passé le processus évolutif • même dimensions de la matrice Q, la somme de chaque colonne vaut 1 • se calcule ainsi:

  6. Likelihood method: • méthode pour calculer la distance entre 2 séquences • la topologie de l’arbre est fixé • on assume que chaque site évolue indépendamment des autres • the nesting rule : on somme les états du nœud ancestrale après avoir fait la somme pour chaque nœud descendant • utilise une matrice data de dimension s x n Pour un nœud: Pour un site de l’arbre: Pour tous les sites de l’arbre:

  7. Résultats Topologie de l’arbre phylogénétique • The Newick tree format est une écriture utilisé pour représenter des arbres tr <- read.tree() (((S1:0.2,S2:0.2)node1:0.1,S3:0.2)node3:0.1,(S4:0.2,S5:0.2)node2:0.1)node4:0.1; plot.phylo(tr,root.edge=TRUE,underscore=TRUE,no.margin=TRUE,font=2)nodelabels(c("node4","node3","node1","node2"),frame="c",bg="white")

  8. site <- function(seq,Q){ dna<-c(-3,1,1,1,1,-3,1,1,1,1,-3,1,1,1,1,-3) Q<-matrix(dna,nrow=4,ncol=4,dimnames=list("bases"=c("T","C","A","G"), "bases"=c("T","C","A","G"))) seq <- read.dna("/Users/Deketek/Scuola/Mathematique/mc.paml", format="sequential",as.character=TRUE) L <- numeric(length=length(seq[1,])) t1 <- 0.2 t2 <- 0.2 t3 <- 0.2 t4 <- 0.2 t5 <- 0.2 t6 <- 0.1 t7 <- 0.1 t8 <- 0.1 p <- c(0.25,0.25,0.25,0.25) x <- matrix(data=NA,nrow=5,ncol=4) for(hin1:length(seq[1,])){ for(kin1:length(seq[,1])){ if(seq[k,h]=="t"){ x[k,] <- c(1,0,0,0)} else{if(seq[k,h]=="c"){ x[k,] <- c(0,1,0,0)} else{if(seq[k,h]=="a"){ x[k,] <- c(0,0,1,0)} else{if(seq[k,h]=="g"){ x[k,] <- c(0,0,0,1) }}}}} P1<-expm(Q*t1) l <- length(P1[1,]) c <- length(P1[,1]) sum1 <- c(0,0,0,0) C1 <- matrix(data=0,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C1[i,j] <- P1[i,j]*x[1,j] sum1[i] <- C1[i,j]+sum1[i]}} P2<-expm(Q*t2) l <- length(P2[1,]) c <- length(P2[,1]) sum2 <- c(0,0,0,0) C2 <- matrix(data=0,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C2[i,j] <- P2[i,j]*x[2,j] sum2[i] <- C2[i,j]+sum2[i]}} node1 <- sum1*sum2 P3<-expm(Q*t4) l <- length(P1[1,]) c <- length(P1[,1]) sum3 <- c(0,0,0,0) C3 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C3[i,j] <- P3[i,j]*x[4,j] sum3[i] <- C3[i,j]+sum3[i]}} P4<-expm(Q*t5) l <- length(P2[1,]) c <- length(P2[,1]) um4 <- c(0,0,0,0) C4 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C4[i,j] <- P4[i,j]*x[5,j] sum4[i] <- C4[i,j]+sum4[i]}} node2 <- sum3*sum4 P5<-expm(Q*t7) l <- length(P1[1,]) c <- length(P1[,1]) sum5 <- c(0,0,0,0) C5 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C5[i,j] <- P5[i,j]*node1[j] sum5[i] <- C5[i,j]+sum5[i]}} P6<-expm(Q*t2) l <- length(P2[1,]) c <- length(P2[,1]) sum6 <- c(0,0,0,0) C6 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C6[i,j] <- P6[i,j]*x[3,j] sum6[i] <- C6[i,j]+sum6[i]}} node3 <- sum5*sum6 P7<-expm(Q*t6) l <- length(P1[1,]) c <- length(P1[,1]) sum7 <- c(0,0,0,0) C7 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C7[i,j] <- P7[i,j]*node3[j] sum7[i] <- C7[i,j]+sum7[i]}} P8<-expm(Q*t8) l <- length(P2[1,]) c <- length(P2[,1]) sum8 <- c(0,0,0,0) C8 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C8[i,j] <- P8[i,j]*node2[j] sum8[i] <- C8[i,j]+sum8[i]}} node4 <- sum7*sum8 L[h]<- log(sum(node4*p)) } LT <- sum(L) return(LT) }

  9. site <- function(seq,Q){ dna<-c(-3,1,1,1,1,-3,1,1,1,1,-3,1,1,1,1,-3) Q<-matrix(dna,nrow=4,ncol=4,dimnames=list("bases"=c("T","C","A","G"), "bases"=c("T","C","A","G"))) seq <- read.dna("/Users/Deketek/Scuola/Mathematique/mc.paml", format="sequential",as.character=TRUE) L <- numeric(length=length(seq[1,])) t1 <- 0.2 t2 <- 0.2 t3 <- 0.2 t4 <- 0.2 t5 <- 0.2 t6 <- 0.1 t7 <- 0.1 t8 <- 0.1 p <- c(0.25,0.25,0.25,0.25) x <- matrix(data=NA,nrow=5,ncol=4) for(hin1:length(seq[1,])){ for(kin1:length(seq[,1])){ if(seq[k,h]=="t"){ x[k,] <- c(1,0,0,0)} else{if(seq[k,h]=="c"){ x[k,] <- c(0,1,0,0)} else{if(seq[k,h]=="a"){ x[k,] <- c(0,0,1,0)} else{if(seq[k,h]=="g"){ x[k,] <- c(0,0,0,1) }}}}} P1<-expm(Q*t1) l <- length(P1[1,]) c <- length(P1[,1]) sum1 <- c(0,0,0,0) C1 <- matrix(data=0,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C1[i,j] <- P1[i,j]*x[1,j] sum1[i] <- C1[i,j]+sum1[i]}} P2<-expm(Q*t2) l <- length(P2[1,]) c <- length(P2[,1]) sum2 <- c(0,0,0,0) C2 <- matrix(data=0,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C2[i,j] <- P2[i,j]*x[2,j] sum2[i] <- C2[i,j]+sum2[i]}} node1 <- sum1*sum2 P3<-expm(Q*t4) l <- length(P1[1,]) c <- length(P1[,1]) sum3 <- c(0,0,0,0) C3 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C3[i,j] <- P3[i,j]*x[4,j] sum3[i] <- C3[i,j]+sum3[i]}} P4<-expm(Q*t5) l <- length(P2[1,]) c <- length(P2[,1]) um4 <- c(0,0,0,0) C4 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C4[i,j] <- P4[i,j]*x[5,j] sum4[i] <- C4[i,j]+sum4[i]}} node2 <- sum3*sum4 P5<-expm(Q*t7) l <- length(P1[1,]) c <- length(P1[,1]) sum5 <- c(0,0,0,0) C5 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C5[i,j] <- P5[i,j]*node1[j] sum5[i] <- C5[i,j]+sum5[i]}} P6<-expm(Q*t2) l <- length(P2[1,]) c <- length(P2[,1]) sum6 <- c(0,0,0,0) C6 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C6[i,j] <- P6[i,j]*x[3,j] sum6[i] <- C6[i,j]+sum6[i]}} node3 <- sum5*sum6 P7<-expm(Q*t6) l <- length(P1[1,]) c <- length(P1[,1]) sum7 <- c(0,0,0,0) C7 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C7[i,j] <- P7[i,j]*node3[j] sum7[i] <- C7[i,j]+sum7[i]}} P8<-expm(Q*t8) l <- length(P2[1,]) c <- length(P2[,1]) sum8 <- c(0,0,0,0) C8 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C8[i,j] <- P8[i,j]*node2[j] sum8[i] <- C8[i,j]+sum8[i]}} node4 <- sum7*sum8 L[h]<- log(sum(node4*p)) } LT <- sum(L) return(LT) }

  10. La matrice Q Les séquences à analyser > Q bases bases T C A G T -3 1 1 1 C 1 -3 1 1 A 1 1 -3 1 G 1 1 1 -3 > seq[,1:5] [,1] [,2] [,3] [,4] [,5] S1 "t" "g" "t" "a" "g" S2 "t" "t" "g" "g" "c" S3 "g" "t" "g" "t" "t" S4 "t" "g" "t" "a" "g" S5 "a" "g" "g" "c" "a" Résultats Les inputs de notre fonction: site <- function(seq,Q){

  11. site <- function(seq,Q){ dna<-c(-3,1,1,1,1,-3,1,1,1,1,-3,1,1,1,1,-3) Q<matrix(dna,nrow=4,ncol=4,dimnames=list("bases"=c("T","C","A","G"), "bases"=c("T","C","A","G"))) seq <- read.dna("/Users/Deketek/Scuola/Mathematique/mc.paml", format="sequential",as.character=TRUE) L <- numeric(length=length(seq[1,])) t1 <- 0.2 t2 <- 0.2 t3 <- 0.2 t4 <- 0.2 t5 <- 0.2 t6 <- 0.1 t7 <- 0.1 t8 <- 0.1 p <- c(0.25,0.25,0.25,0.25) x <- matrix(data=NA,nrow=5,ncol=4) for(hin1:length(seq[1,])){ for(kin1:length(seq[,1])){ if(seq[k,h]=="t"){ x[k,] <- c(1,0,0,0)} else{if(seq[k,h]=="c"){ x[k,] <- c(0,1,0,0)} else{if(seq[k,h]=="a"){ x[k,] <- c(0,0,1,0)} else{if(seq[k,h]=="g"){ x[k,] <- c(0,0,0,1) }}}}} P1<-expm(Q*t1) l <- length(P1[1,]) c <- length(P1[,1]) sum1 <- c(0,0,0,0) C1 <- matrix(data=0,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C1[i,j] <- P1[i,j]*x[1,j] sum1[i] <- C1[i,j]+sum1[i]}} P2<-expm(Q*t2) l <- length(P2[1,]) c <- length(P2[,1]) sum2 <- c(0,0,0,0) C2 <- matrix(data=0,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C2[i,j] <- P2[i,j]*x[2,j] sum2[i] <- C2[i,j]+sum2[i]}} node1 <- sum1*sum2 P3<-expm(Q*t4) l <- length(P1[1,]) c <- length(P1[,1]) sum3 <- c(0,0,0,0) C3 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C3[i,j] <- P3[i,j]*x[4,j] sum3[i] <- C3[i,j]+sum3[i]}} P4<-expm(Q*t5) l <- length(P2[1,]) c <- length(P2[,1]) um4 <- c(0,0,0,0) C4 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C4[i,j] <- P4[i,j]*x[5,j] sum4[i] <- C4[i,j]+sum4[i]}} node2 <- sum3*sum4 P5<-expm(Q*t7) l <- length(P1[1,]) c <- length(P1[,1]) sum5 <- c(0,0,0,0) C5 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C5[i,j] <- P5[i,j]*node1[j] sum5[i] <- C5[i,j]+sum5[i]}} P6<-expm(Q*t2) l <- length(P2[1,]) c <- length(P2[,1]) sum6 <- c(0,0,0,0) C6 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C6[i,j] <- P6[i,j]*x[3,j] sum6[i] <- C6[i,j]+sum6[i]}} node3 <- sum5*sum6 P7<-expm(Q*t6) l <- length(P1[1,]) c <- length(P1[,1]) sum7 <- c(0,0,0,0) C7 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C7[i,j] <- P7[i,j]*node3[j] sum7[i] <- C7[i,j]+sum7[i]}} P8<-expm(Q*t8) l <- length(P2[1,]) c <- length(P2[,1]) sum8 <- c(0,0,0,0) C8 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C8[i,j] <- P8[i,j]*node2[j] sum8[i] <- C8[i,j]+sum8[i]}} node4 <- sum7*sum8 L[h]<- log(sum(node4*p)) } LT <- sum(L) return(LT) }

  12. Résultats • x <- matrix(data=NA,nrow=5,ncol=4) • for(h in 1:length(seq[1,])){ • for(k in 1:length(seq[,1])){ • if(seq[k,h]=="t"){ • x[k,] <- c(1,0,0,0) • } • else{ • if(seq[k,h]=="c"){ • x[k,] <- c(0,1,0,0) • } • else{ • if(seq[k,h]=="a"){ • x[k,] <- c(0,0,1,0) • } • else{ • if(seq[k,h]=="g"){ • x[k,] <- c(0,0,0,1) • }}}}} > xsite1 bases seq T C A G S1 1 0 0 0 S2 0 0 1 0 S3 0 0 1 0 S4 0 0 0 1 S5 1 0 0 0 

  13. site <- function(seq,Q){ dna<-c(-3,1,1,1,1,-3,1,1,1,1,-3,1,1,1,1,-3) Q<matrix(dna,nrow=4,ncol=4,dimnames=list("bases"=c("T","C","A","G"), "bases"=c("T","C","A","G"))) seq <- read.dna("/Users/Deketek/Scuola/Mathematique/mc.paml", format="sequential",as.character=TRUE) L <- numeric(length=length(seq[1,])) t1 <- 0.2 t2 <- 0.2 t3 <- 0.2 t4 <- 0.2 t5 <- 0.2 t6 <- 0.1 t7 <- 0.1 t8 <- 0.1 p <- c(0.25,0.25,0.25,0.25) x <- matrix(data=NA,nrow=5,ncol=4) for(hin1:length(seq[1,])){ for(kin1:length(seq[,1])){ if(seq[k,h]=="t"){ x[k,] <- c(1,0,0,0)} else{if(seq[k,h]=="c"){ x[k,] <- c(0,1,0,0)} else{if(seq[k,h]=="a"){ x[k,] <- c(0,0,1,0)} else{if(seq[k,h]=="g"){ x[k,] <- c(0,0,0,1) }}}}} P1<-expm(Q*t1) l <- length(P1[1,]) c <- length(P1[,1]) sum1 <- c(0,0,0,0) C1 <- matrix(data=0,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C1[i,j] <- P1[i,j]*x[1,j] sum1[i] <- C1[i,j]+sum1[i]}} P2<-expm(Q*t2) l <- length(P2[1,]) c <- length(P2[,1]) sum2 <- c(0,0,0,0) C2 <- matrix(data=0,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C2[i,j] <- P2[i,j]*x[2,j] sum2[i] <- C2[i,j]+sum2[i]}} node1 <- sum1*sum2 P3<-expm(Q*t4) l <- length(P1[1,]) c <- length(P1[,1]) sum3 <- c(0,0,0,0) C3 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C3[i,j] <- P3[i,j]*x[4,j] sum3[i] <- C3[i,j]+sum3[i]}} P4<-expm(Q*t5) l <- length(P2[1,]) c <- length(P2[,1]) um4 <- c(0,0,0,0) C4 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C4[i,j] <- P4[i,j]*x[5,j] sum4[i] <- C4[i,j]+sum4[i]}} node2 <- sum3*sum4 P5<-expm(Q*t7) l <- length(P1[1,]) c <- length(P1[,1]) sum5 <- c(0,0,0,0) C5 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C5[i,j] <- P5[i,j]*node1[j] sum5[i] <- C5[i,j]+sum5[i]}} P6<-expm(Q*t2) l <- length(P2[1,]) c <- length(P2[,1]) sum6 <- c(0,0,0,0) C6 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C6[i,j] <- P6[i,j]*x[3,j] sum6[i] <- C6[i,j]+sum6[i]}} node3 <- sum5*sum6 P7<-expm(Q*t6) l <- length(P1[1,]) c <- length(P1[,1]) sum7 <- c(0,0,0,0) C7 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C7[i,j] <- P7[i,j]*node3[j] sum7[i] <- C7[i,j]+sum7[i]}} P8<-expm(Q*t8) l <- length(P2[1,]) c <- length(P2[,1]) sum8 <- c(0,0,0,0) C8 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C8[i,j] <- P8[i,j]*node2[j] sum8[i] <- C8[i,j]+sum8[i]}} node4 <- sum7*sum8 L[h]<- log(sum(node4*p)) } LT <- sum(L) return(LT) }

  14. Node 1 sum2 sum1 X t2 t1 P1<-expm(Q*t1) sum1 <- c(0,0,0,0) l <- length(P1[1,]) c <- length(P1[,1]) C1 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C1[i,j] <- P1[i,j]*x[1,j] sum1[i] <- C1[i,j]+sum1[i] }} P2<-expm(Q*t2) sum2 <- c(0,0,0,0) l <- length(P2[1,]) c <- length(P2[,1]) C2 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C2[i,j] <- P2[i,j]*x[2,j] sum2[i] <- C2[i,j]+sum2[i] }} node1 <- sum1*sum2 X[1,j] X[2,j] Exemple avec t = 0.2 > P 4 x 4 Matrix of class "dsyMatrix" bases bases T C A G T 0.5869967 0.1376678 0.1376678 0.1376678 C 0.1376678 0.5869967 0.1376678 0.1376678 A 0.1376678 0.1376678 0.5869967 0.1376678 G 0.1376678 0.1376678 0.1376678 0.5869967 Résultats Exemple pour le premier noeud

  15. site <- function(seq,Q){ dna<-c(-3,1,1,1,1,-3,1,1,1,1,-3,1,1,1,1,-3) Q<matrix(dna,nrow=4,ncol=4,dimnames=list("bases"=c("T","C","A","G"), "bases"=c("T","C","A","G"))) seq <- read.dna("/Users/Deketek/Scuola/Mathematique/mc.paml", format="sequential",as.character=TRUE) L <- numeric(length=length(seq[1,])) t1 <- 0.2 t2 <- 0.2 t3 <- 0.2 t4 <- 0.2 t5 <- 0.2 t6 <- 0.1 t7 <- 0.1 t8 <- 0.1 p <- c(0.25,0.25,0.25,0.25) x <- matrix(data=NA,nrow=5,ncol=4) for(hin1:length(seq[1,])){ for(kin1:length(seq[,1])){ if(seq[k,h]=="t"){ x[k,] <- c(1,0,0,0)} else{if(seq[k,h]=="c"){ x[k,] <- c(0,1,0,0)} else{if(seq[k,h]=="a"){ x[k,] <- c(0,0,1,0)} else{if(seq[k,h]=="g"){ x[k,] <- c(0,0,0,1) }}}}} P1<-expm(Q*t1) l <- length(P1[1,]) c <- length(P1[,1]) sum1 <- c(0,0,0,0) C1 <- matrix(data=0,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C1[i,j] <- P1[i,j]*x[1,j] sum1[i] <- C1[i,j]+sum1[i]}} P2<-expm(Q*t2) l <- length(P2[1,]) c <- length(P2[,1]) sum2 <- c(0,0,0,0) C2 <- matrix(data=0,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C2[i,j] <- P2[i,j]*x[2,j] sum2[i] <- C2[i,j]+sum2[i]}} node1 <- sum1*sum2 P3<-expm(Q*t4) l <- length(P1[1,]) c <- length(P1[,1]) sum3 <- c(0,0,0,0) C3 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C3[i,j] <- P3[i,j]*x[4,j] sum3[i] <- C3[i,j]+sum3[i]}} P4<-expm(Q*t5) l <- length(P2[1,]) c <- length(P2[,1]) um4 <- c(0,0,0,0) C4 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C4[i,j] <- P4[i,j]*x[5,j] sum4[i] <- C4[i,j]+sum4[i]}} node2 <- sum3*sum4 P5<-expm(Q*t7) l <- length(P1[1,]) c <- length(P1[,1]) sum5 <- c(0,0,0,0) C5 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C5[i,j] <- P5[i,j]*node1[j] sum5[i] <- C5[i,j]+sum5[i]}} P6<-expm(Q*t2) l <- length(P2[1,]) c <- length(P2[,1]) sum6 <- c(0,0,0,0) C6 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C6[i,j] <- P6[i,j]*x[3,j] sum6[i] <- C6[i,j]+sum6[i]}} node3 <- sum5*sum6 P7<-expm(Q*t6) l <- length(P1[1,]) c <- length(P1[,1]) sum7 <- c(0,0,0,0) C7 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C7[i,j] <- P7[i,j]*node3[j] sum7[i] <- C7[i,j]+sum7[i]}} P8<-expm(Q*t8) l <- length(P2[1,]) c <- length(P2[,1]) sum8 <- c(0,0,0,0) C8 <- matrix(data=NA,nrow=l,ncol=c) for(iin1:l){ for(jin1:c){ C8[i,j] <- P8[i,j]*node2[j] sum8[i] <- C8[i,j]+sum8[i]}} node4 <- sum7*sum8 L[h]<- log(sum(node4*p)) } LT <- sum(L) return(LT) }

  16. Résultats Vecteur vide à remplir avec le likelihood pour chaque site L <- numeric(length=length(seq[1,])) Loop pour calculer le likelihood de chaque site for(hin1:length(seq[1,])){ Définir la valeur qui entre dans le vecteur L et l’output de la fonction L[h]<- log(sum(node4*p)) } LT <- sum(L) return(LT) }

  17. Perspectives • Optimisation des paramètres de l’arbre • Passage du modèle JC69 à des modèles pour nucléotides plus complexes • Passage au Codon-based model: • La chaîne de Markov décrit les substitutions d’un codon i en j • Ne considère pas les délétions/insertions • Les mutations se passent indépendamment au 3 sites des codons • Les paramètres de Q: {qij} = • 0 si i et j diffèrent à 2 ou 3 positions du codon • jsi i et j diffèrent d’une transversion synonyme • jsi i et j diffèrent d’une transition synonyme • jsi i et j diffèrent d’une transversion non-synonyme • jsi i et j diffèrent d’une transition non-synonyme • Estimation du taux de substitutions nonsynonyme/ synonyme (ω= dn/ds) est un important indicateur de la pression sélective: • ω = 1 -> neutral mutations • ω < 1 -> purifyng selection • ω > 1 -> diversifyng positive selection

  18. Bibliographie Livres: Yang Z., Computational Molecular Evolution, Oxford university press, 2006 Paradis E., Analysis of Phylogenetics and Evolution with R, Springer, 2006 Masatoshi N./Sudhir K., Molecular Evolution and Phylogenetics, Oxford university press, 2000 Articles: • Goldman N./Yang Z., "A Codon-based Model of Nucleotide Substitution for Protein-coding DNA Sequences", The University of Chicago, 1994 • Yang & all,  "Codon-Substitution Models for Heterogeneous Selection Pressure at Amino Acid Sites", Genetics nb 155, p. 431-449, 2000

  19. Merci pour l’attention!! Questions?

More Related