Practical applications of HMMs: ChromHMM Sushmita Roy Nov 5th
Chromatin organization and gene expression http://www.youtube.com/watch?v=eYrQ0EhVCYA
ChIP-seq to measure histone data Adapted from Dewey lecture and Peter Park Nature Genetics Review
ChIP-seq data for multiple marks
Chromatin state: a specific combination of mark values. Chromatin states are important because they can be used to segment the genome into biologically meaningful units.
Problem definition
• Given: a collection of genome-wide measurements of chromatin marks
• Do: segment the genome into N chromatin states
An HMM for segmenting genomes using chromatin marks
• HMM
  • State: chromatin state
  • Emission: multiple chromatin marks
• Need a multivariate HMM
Binarizing the chromatin data
• Each mark m at genomic position t is represented by a binary variable v_{t,m}:
  • 1: mark is present
  • 0: mark is absent
[Figure: observed marks aligned along the genomic sequence at positions t, t+1, t+2, t+3]
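A minimal sketch of the binarization step, assuming each mark is summarized as a read count per genomic bin and that a fixed count threshold decides presence. The threshold rule, the function name `binarize_marks`, the mark names, and the toy counts are illustrative assumptions; the slides do not say how the present/absent call is made.

```python
from typing import Dict, List


def binarize_marks(counts: Dict[str, List[int]], threshold: int) -> List[List[int]]:
    """Return v[t][m]: 1 if mark m is called present in bin t, else 0.

    Assumption: a mark is 'present' when its read count in the bin
    is greater than or equal to `threshold`.
    """
    marks = sorted(counts)  # fix a mark order so column m is stable
    n_bins = len(counts[marks[0]])
    if any(len(counts[m]) != n_bins for m in marks):
        raise ValueError("all marks must cover the same number of bins")
    return [
        [1 if counts[m][t] >= threshold else 0 for m in marks]
        for t in range(n_bins)
    ]


# Hypothetical read counts for three marks over four bins (t, t+1, t+2, t+3).
toy_counts = {
    "H3K4me3": [12, 0, 1, 9],
    "H3K27ac": [7, 1, 0, 8],
    "H3K27me3": [0, 6, 5, 0],
}
v = binarize_marks(toy_counts, threshold=5)
```

Each row of `v` is the binary observation vector for one genomic bin, which is the form of input the multivariate HMM emits.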
ChromHMM with 3 states
[Figure: state diagram with a Begin state and states 1, 2, and 3]
ChromHMM notation
• p_{k,m} denotes the probability of mark m being ON in state k
• The emission probability of the M marks in a state is a product of M Bernoulli probabilities, one per mark
• b_{k,l} denotes the probability of transitioning from state k to state l
• a_k: initial probability of state k
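Written out, the product-of-Bernoullis emission for the binary mark vector v_t in state k is P(v_t | k) = prod over m of p_{k,m}^{v_{t,m}} (1 - p_{k,m})^{1 - v_{t,m}}. The sketch below evaluates that product; the function name `emission_prob` and the example probabilities are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def emission_prob(v_t: Sequence[int], p_k: Sequence[float]) -> float:
    """Probability of the binary mark vector v_t under state k.

    p_k[m] is the probability that mark m is ON in state k; marks are
    treated as independent Bernoulli variables given the state.
    """
    if len(v_t) != len(p_k):
        raise ValueError("v_t and p_k must have one entry per mark")
    # Use p when the mark is ON and (1 - p) when it is OFF.
    return math.prod(p if v == 1 else 1.0 - p for v, p in zip(v_t, p_k))


# Hypothetical state with three marks: mark 0 usually ON, marks 1 and 2 usually OFF.
p_k = [0.9, 0.2, 0.1]
prob = emission_prob([1, 0, 0], p_k)  # 0.9 * (1 - 0.2) * (1 - 0.1)
```

For long mark vectors, summing log-probabilities instead of multiplying would avoid underflow; the plain product is kept here to mirror the slide.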
Learning the ChromHMM
• Need to figure out the number of states
• Learn HMMs for K = 2 to 80 states, with a penalty factor to penalize the number of parameters
• State transitions: start with the fully connected HMM, and set transition parameters to zero if they fall below 10^-10
• Final model had 51 states
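The two learning steps above can be sketched as follows. The slide does not give the form of the penalty, so a BIC-style score is assumed here purely for illustration; the function names and the toy transition matrix are also hypothetical.

```python
import math
from typing import List


def num_free_parameters(n_states: int, n_marks: int) -> int:
    """Free parameters of a fully connected HMM with Bernoulli emissions."""
    emissions = n_states * n_marks           # one p_{k,m} per state and mark
    transitions = n_states * (n_states - 1)  # each transition row sums to 1
    initial = n_states - 1                   # initial distribution sums to 1
    return emissions + transitions + initial


def bic_score(log_likelihood: float, n_states: int, n_marks: int, n_obs: int) -> float:
    """BIC-style penalized score (lower is better). Assumed penalty form."""
    penalty = num_free_parameters(n_states, n_marks) * math.log(n_obs)
    return -2.0 * log_likelihood + penalty


def prune_transitions(b: List[List[float]], eps: float = 1e-10) -> List[List[float]]:
    """Set transition probabilities below eps to zero; leave the rest unchanged."""
    return [[0.0 if x < eps else x for x in row] for row in b]


# Toy 3-state transition matrix with two near-zero entries.
b = [
    [0.7, 0.3, 1e-12],
    [0.5, 0.5, 0.0],
    [1e-11, 0.2, 0.8],
]
pruned = prune_transitions(b)
n_params = num_free_parameters(n_states=3, n_marks=4)
```

Under this assumed score, one would fit a model for each K, compute `bic_score` from its log-likelihood, and keep the K with the lowest value.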
Learned emission parameters
[Figure: emission parameters by state; callout: emission parameters for state 5; axis label: States]
Example output from ChromHMM around the CAPZA2 gene
[Figure panels: input chromatin marks; inferred state sequences]
Posterior probability distributions of all 51 states around the CAPZA2 gene
[Figure panels: max posterior state; posterior probability values of each state]
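Per-position posteriors of this kind are what the standard forward-backward algorithm produces. The sketch below is a generic scaled forward-backward pass with product-of-Bernoulli emissions, not ChromHMM's own code; the two-state parameters and the observations are toy values assumed for illustration, and the sketch assumes every position has nonzero probability under at least one state.

```python
import math
from typing import List, Sequence


def emission(v_t: Sequence[int], p_k: Sequence[float]) -> float:
    """Product-of-Bernoullis emission probability for one state."""
    return math.prod(p if v == 1 else 1.0 - p for v, p in zip(v_t, p_k))


def posteriors(v, a, b, p) -> List[List[float]]:
    """Return gamma[t][k] = P(state at t is k | all observations).

    v: binary observations v[t][m]; a: initial probabilities a[k];
    b: transition probabilities b[k][l]; p: emission parameters p[k][m].
    """
    T, K = len(v), len(a)
    e = [[emission(v[t], p[k]) for k in range(K)] for t in range(T)]

    # Forward pass, rescaling each step so the values sum to 1.
    alpha, scale = [], []
    cur = [a[k] * e[0][k] for k in range(K)]
    for t in range(T):
        if t > 0:
            cur = [
                e[t][k] * sum(alpha[t - 1][j] * b[j][k] for j in range(K))
                for k in range(K)
            ]
        c = sum(cur)
        scale.append(c)
        alpha.append([x / c for x in cur])

    # Backward pass, reusing the forward scaling factors.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for k in range(K):
            beta[t][k] = sum(
                b[k][j] * e[t + 1][j] * beta[t + 1][j] for j in range(K)
            ) / scale[t + 1]

    return [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]


# Toy model: state 0 favors mark 0, state 1 favors mark 1.
a = [0.5, 0.5]
b = [[0.9, 0.1], [0.1, 0.9]]
p = [[0.9, 0.1], [0.1, 0.9]]
v = [[1, 0], [1, 0], [0, 1], [0, 1]]

gamma = posteriors(v, a, b, p)
max_state = [max(range(len(a)), key=lambda k: row[k]) for row in gamma]
```

Here `gamma` corresponds to the posterior probability values of each state, and `max_state` to the max posterior state at each position.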
Summary
• HMMs are powerful models for capturing sequential data
• Very popular in computational biology
  • Gene annotation
  • Representation of a profile: protein domain finding
  • Genome segmentation