STOCHASTIC CONTEXT FREE GRAMMAR

Presentation Transcript


  1. STOCHASTIC CONTEXT FREE GRAMMAR: PARSING & USE

  2. OUTLINE
     • Introduction to Stochastic Context Free Grammar (SCFG)
     • Parsing of SCFG
     • Application to RNA secondary structure prediction

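  3. SCFG
     Chomsky hierarchy: CONTEXT FREE GRAMMAR
     • A context free grammar is a triple $G = (\Sigma, V, R)$ where:
       • $\Sigma$ = set of terminal symbols (alphabet)
       • $V$ = set of nonterminal symbols
       • $R$ = set of production rules of the form $A \to \gamma$, with $A \in V$ and $\gamma \in (\Sigma \cup V)^*$
     • $S \in V$ is the special start symbol, and $\Sigma \cap V = \emptyset$.
     A string can be derived from another string ($uAv \Rightarrow u\gamma v$) if $u, v \in (\Sigma \cup V)^*$ and the production $A \to \gamma$ is a production of the grammar.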

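  4. SCFG
     A Stochastic Context Free Grammar is a quadruple G=(∑,V,R,P), where P is a probability function on the production rules: $P : R \to [0, 1]$.
     Constraint: for every nonterminal $A \in V$, $\sum_{(A \to \gamma) \in R} P(A \to \gamma) = 1$.
     Def.: Let G=(∑,V,R,P) be a SCFG and $d = (S \Rightarrow \alpha_1 \Rightarrow \dots \Rightarrow \alpha_n)$ a derivation sequence, where each $\alpha_i$ is a string of terminal and nonterminal symbols obtained by applying a production $r_i \in R$. The probability of the derivation d is: $P(d) = \prod_{i=1}^{n} P(r_i)$.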

  5. SCFG
     • A grammar can be ambiguous: the same string may have more than one derivation.
     • Def.: The probability that the SCFG G produces the string s is $P(s) = \sum_{i=1}^{k} P(d_i)$, where $d_1, \dots, d_k$ are the derivation sequences that produce s.
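The two definitions above (probability of a derivation, probability of a string) can be illustrated with a small Python sketch. The toy grammar `S -> S S | a` and its probabilities 0.3 and 0.7 are assumptions chosen for illustration; they do not come from the slides.

```python
from math import prod

# Hypothetical toy SCFG with a single nonterminal S.
# The probabilities of all rules with the same left-hand side sum to 1.
RULE_PROB = {
    ("S", ("S", "S")): 0.3,
    ("S", ("a",)): 0.7,
}

def derivation_probability(rules_used):
    """Probability of a derivation: product of the probabilities of its rules."""
    return prod(RULE_PROB[rule] for rule in rules_used)

def string_probability(derivations):
    """Probability of a string: sum over the derivations that produce it."""
    return sum(derivation_probability(d) for d in derivations)

SS = ("S", ("S", "S"))
A = ("S", ("a",))

# The string "aaa" is ambiguous under this grammar: it has a left-branching
# and a right-branching parse tree, listed here as leftmost derivations.
left_tree = [SS, SS, A, A, A]
right_tree = [SS, A, SS, A, A]

p_tree = derivation_probability(left_tree)           # 0.3^2 * 0.7^3
p_aaa = string_probability([left_tree, right_tree])  # sum over both trees
```

Here each derivation is listed as the sequence of rules applied, so the two parse trees of "aaa" contribute the same probability and the string probability is twice that of a single tree.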

  6. SCFG
     • Chomsky Normal Form (CNF)
     • Def.: A CFG (or SCFG) is in CNF if all the rules have one of these forms: $A \to BC$ or $A \to \alpha$, where B and C are nonterminal symbols and $\alpha$ is a single terminal symbol.
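As a minimal sketch of the CNF definition, the check below accepts a grammar only if every rule is of the form A -> B C or A -> a. The representation of rules as `(lhs, rhs_tuple)` pairs and the function name `is_cnf` are assumptions made for this example.

```python
def is_cnf(rules, nonterminals, terminals):
    """Return True if every rule is A -> B C or A -> a (Chomsky Normal Form)."""
    for lhs, rhs in rules:
        if lhs not in nonterminals:
            return False
        # Binary rule: exactly two nonterminals on the right-hand side.
        binary = len(rhs) == 2 and all(sym in nonterminals for sym in rhs)
        # Lexical rule: exactly one terminal on the right-hand side.
        lexical = len(rhs) == 1 and rhs[0] in terminals
        if not (binary or lexical):
            return False
    return True

# Hypothetical example grammars.
cnf_rules = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]
non_cnf_rules = [("S", ("a", "S", "b")), ("S", ("a",))]

cnf_ok = is_cnf(cnf_rules, {"S", "A", "B"}, {"a", "b"})
non_cnf_ok = is_cnf(non_cnf_rules, {"S"}, {"a", "b"})
```

The second grammar is rejected because `S -> a S b` mixes terminals and nonterminals and has three symbols on the right-hand side.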

  7. SCFG - Parsing
     • Parsing process: sequence → parser (syntactic analyzer) → parse tree
     Given a sequence and a grammar, which is the best parse tree that generates the sequence, that is, the parse tree with the highest probability?
     Answer: the CYK algorithm.

  8. SCFG - Parsing
     • CYK algorithm (Cocke-Younger-Kasami)
     • Widely used in NLP (Natural Language Processing)
     • Based on dynamic programming
     • Works with an SCFG in CNF

  9. SCFG - Parsing
     • Input: an SCFG G in CNF and a word s.
     • Data structure: a dynamic programming 3-D array that holds the maximum probability for a constituent with nonterminal a spanning words i…j, plus back-pointers to construct the parse tree.
     • Output: the maximum probability parse.

  10. SCFG - Parsing
     • Initialization:
       n = length of s, R = number of nonterminals in G.
       Table P[n,n,R] = 0   // set all values in table to 0
       Triples G[n,n,R] = triples of (position, nonterminal1, nonterminal2)   // traceback pointers
       for j = 1 to n do
         for all unit productions of type V → a do
           if s[j] == a then
             set P[j,1,V] = pv(V → a)   // the probability of the production
             set G[j,1,V] = new Triple(0,0,0)   // indicates no further traceback, i.e. a leaf node
           end if
         end for
       end for

  11. SCFG - Parsing
     • Main loop:
       // i is the length of the span, j the start, and k where to split into two subspans
       for i = 2 to n do
         for j = 1 to n-i+1 do
           for k = 1 to i-1 do
             for all productions of type V → XY do
               set newprob = P[j, k, X] * P[j + k, i - k, Y] * pv(V → XY)
               if newprob > P[j, i, V] then
                 set P[j, i, V] = newprob
                 set G[j, i, V] = new Triple(k, X, Y)   // new traceback point
               end if
             end for
           end for
         end for
       end for
       P[1, n, S], where S is the start symbol of G, holds the probability of the most likely parse.

  12. SCFG - Parsing
     • Memory cost: O(n^2 * M)
     • Time cost: O(n^3 * T)
     where n = length of the input string, M = number of nonterminal symbols, T = number of production rules of type V → XY.
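The initialization and main loop of slides 10 and 11 can be sketched in Python as follows. This is an illustrative sketch rather than the presenter's implementation: the function name `cyk_viterbi`, the dictionary encoding of the rules, the 0-based indexing, and the small example grammar are all assumptions.

```python
from collections import defaultdict

def cyk_viterbi(s, lexical, binary, start="S"):
    """Most probable parse of s under an SCFG in CNF.

    lexical: dict (V, a) -> probability of the rule V -> a
    binary:  dict (V, X, Y) -> probability of the rule V -> X Y
    Returns (probability, tree); tree is None if s cannot be derived.
    """
    n = len(s)
    # P[(j, i)][V] = best probability that V derives the span starting at
    # 0-based position j with length i. G stores the traceback pointers.
    P = defaultdict(dict)
    G = {}

    # Initialization: spans of length 1, rules V -> a.
    for j, ch in enumerate(s):
        for (V, a), p in lexical.items():
            if a == ch and p > P[(j, 1)].get(V, 0.0):
                P[(j, 1)][V] = p
                G[(j, 1, V)] = None  # leaf: no further traceback

    # Main loop: i = span length, j = start, k = length of the left subspan.
    for i in range(2, n + 1):
        for j in range(0, n - i + 1):
            for k in range(1, i):
                left, right = P[(j, k)], P[(j + k, i - k)]
                for (V, X, Y), p in binary.items():
                    if X in left and Y in right:
                        newprob = left[X] * right[Y] * p
                        if newprob > P[(j, i)].get(V, 0.0):
                            P[(j, i)][V] = newprob
                            G[(j, i, V)] = (k, X, Y)

    def build(j, i, V):
        """Follow the traceback pointers to rebuild the parse tree."""
        pointer = G[(j, i, V)]
        if pointer is None:
            return (V, s[j])
        k, X, Y = pointer
        return (V, build(j, k, X), build(j + k, i - k, Y))

    best = P[(0, n)].get(start, 0.0)
    return best, (build(0, n, start) if best > 0.0 else None)

# Hypothetical CNF grammar: S -> A B (0.5) | A S (0.5), A -> a, B -> b.
lexical = {("A", "a"): 1.0, ("B", "b"): 1.0}
binary = {("S", "A", "B"): 0.5, ("S", "A", "S"): 0.5}
prob, tree = cyk_viterbi("aab", lexical, binary)
```

The three nested loops over i, j and k together with the loop over binary rules give the O(n^3 * T) time bound from slide 12. The sketch stores only the table cells that are actually reachable instead of a dense n × n × M array.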

  13. SCFG - Use
     • RNA primary structure: the nucleotide sequence that constitutes the molecule, represented as a single string over the alphabet {a,c,g,u}.
     • RNA secondary structure: the folding of the sequence (that is, the primary structure) back onto itself, due to the action of hydrogen bonds.

  14. SCFG - Use Stem & loop

  15. SCFG - Use
     • The secondary structure of RNA is important because:
       • RNA “preserves” this structure over time
       • It is common to find similar RNAs that have a similar secondary structure but different nucleotide sequences
       • The evolution of RNA “follows” its structure
     Sequence analysis of RNA is more difficult than that of DNA and proteins.

  16. SCFG - Use
     • Problem: how to predict the RNA secondary structure of a single sequence?
     • Analogy with SCFG: compute the most likely “parse tree” that derives a string.

  17. SCFG - Use
     • Simple grammar for RNA:
       • S -> gSc | cSg | aSu | uSa | ε (complementary base pairs)
       • S -> aS | cS | gS | uS (unpaired base on the left)
       • S -> Sa | Sc | Sg | Su (unpaired base on the right)
       • S -> a | c | g | u (single base)
       • S -> SS (fork)
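To make the analogy concrete, the sketch below runs a Viterbi-style dynamic program directly on this grammar (which is not in CNF) and prints the most likely parse in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. The rule probabilities, the function name `predict_structure`, and the dot-bracket output are assumptions made for illustration; the slides give no probabilities, and a real model would estimate them from known structures.

```python
# Hypothetical rule probabilities (not from the slides); the 18 rules sum to 1.
P_PAIR = 0.15     # each of S -> gSc | cSg | aSu | uSa
P_EPS = 0.02      # S -> ε
P_LEFT = 0.03     # each of S -> aS | cS | gS | uS
P_RIGHT = 0.03    # each of S -> Sa | Sc | Sg | Su
P_SINGLE = 0.025  # each of S -> a | c | g | u
P_FORK = 0.04     # S -> SS
PAIRS = {("g", "c"), ("c", "g"), ("a", "u"), ("u", "a")}

def predict_structure(seq):
    """Return (probability, dot-bracket string) of the most likely parse."""
    seq = seq.lower()
    n = len(seq)
    # best[i][j] = max probability that S derives seq[i:j]; back = traceback.
    best = [[0.0] * (n + 1) for _ in range(n + 1)]
    back = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i], back[i][i] = P_EPS, ("eps",)

    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cands = [
                (P_LEFT * best[i + 1][j], ("left",)),
                (P_RIGHT * best[i][j - 1], ("right",)),
            ]
            if length == 1:
                cands.append((P_SINGLE, ("single",)))
            if length >= 2 and (seq[i], seq[j - 1]) in PAIRS:
                cands.append((P_PAIR * best[i + 1][j - 1], ("pair",)))
            # A fork with an empty half only lowers the probability, so only
            # splits into two non-empty parts are considered.
            for k in range(i + 1, j):
                cands.append((P_FORK * best[i][k] * best[k][j], ("fork", k)))
            best[i][j], back[i][j] = max(cands, key=lambda c: c[0])

    def trace(i, j):
        """Turn the traceback pointers into a dot-bracket string."""
        kind = back[i][j]
        if kind[0] == "eps":
            return ""
        if kind[0] == "single":
            return "."
        if kind[0] == "left":
            return "." + trace(i + 1, j)
        if kind[0] == "right":
            return trace(i, j - 1) + "."
        if kind[0] == "pair":
            return "(" + trace(i + 1, j - 1) + ")"
        k = kind[1]
        return trace(i, k) + trace(k, j)

    return best[0][n], trace(0, n)

probability, structure = predict_structure("gggaaaccc")
```

With these assumed probabilities the sequence `gggaaaccc` is parsed as a stem of three g-c pairs closing a three-base loop, which corresponds to the stem and loop picture of slide 14. The sketch does not validate the input alphabet and imposes no minimum loop length.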

  18. SCFG - Use
     • Nucleotide sequence ↔ String
     • RNA secondary structure ↔ Parse tree

  19. THANK YOU!