
Efficient kernels for sentence pair classification

Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete, University of Rome “Tor Vergata”, Roma, Italy




Presentation Transcript


  1. Efficient kernels for sentence pair classification. Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete, University of Rome “Tor Vergata”, Roma, Italy

  2. Motivation • Classifying sentence pairs is an important activity in many NLP tasks, e.g.: • Textual Entailment Recognition • Machine Translation • Question-Answering • Classifiers need suitable feature spaces

  3. Motivation For example, in textual entailment… Training examples: P1: T1 → H1, P2: T2 → H2, P3: T3 → H3, with T1 “Farmers feed cows animal extracts” / H1 “Cows eat animal extracts”; T2 “They feed dolphins fishes” / H2 “Fishes eat dolphins”; T3 “Mothers feed babies milk” / H3 “Babies eat milk”. Relevant features: first-order rules such as “X feed Y → X eat Y”. Classification.

  4. In this talk… • First-order rule (FOR) feature spaces: a challenge • Tripartite Directed Acyclic Graphs (tDAGs) as a solution: • for modelling FOR feature spaces • for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces • An efficient algorithm for computing kernels in FOR spaces • Experimental and comparative assessment of the computational efficiency of the proposed algorithm

  5. First-order rule (FOR) feature spaces: challenges We want to exploit first-order rule (FOR) feature spaces by writing the implicit kernel function K(P1, P2) = |S(P1) ∩ S(P2)| that computes how many common first-order rules are activated by P1 and P2. Without loss of generality, we present the problem in syntactic first-order rule feature spaces.
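The fragment-set view on this slide lends itself to a tiny illustration: if the sets S(P1) and S(P2) could be enumerated explicitly, the kernel would be a plain intersection count. A minimal Python sketch (the fragment strings are invented for illustration; the talk's point is precisely that explicit enumeration is infeasible):

```python
# Conceptual illustration only: fragments of S(P) are represented as opaque
# strings, so the kernel reduces to counting the common elements.

def naive_for_kernel(s_p1, s_p2):
    """K(P1, P2) = |S(P1) ∩ S(P2)| over explicitly enumerated fragment sets."""
    return len(set(s_p1) & set(s_p2))

# Toy fragment sets for two sentence pairs (hypothetical rule names).
s_pa = {"X feed Y -> X eat Y", "X feed Y -> Y eat", "S -> NP VP"}
s_pb = {"X feed Y -> X eat Y", "S -> NP VP", "X drink Y -> Y drunk"}

print(naive_for_kernel(s_pa, s_pb))  # 2: two shared fragments
```

In the FOR feature space these sets are exponentially large, which is why the rest of the talk computes the same quantity implicitly.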

  6. Observations • … using the kernel trick: • define the distance K(P1, P2) • instead of defining the features, e.g. K(T1 → H1, T1 → H2)

  7. First-order rule (FOR) feature spaces: challenges Adding placeholders, propagating placeholders. Example pair Pa = T1 → H1, with T1 “Farmers feed cows animal extracts” and H1 “Cows eat animal extracts”. [Figure: the parse trees of T1 and H1 annotated with co-indexing placeholders 1, 2, 3 propagated up the trees, and the resulting set S(Pa) of first-order rule fragments.]

  8. First-order rule (FOR) feature spaces: challenges Example pair Pb = T3 → H3, with T3 “Mothers feed babies milk” and H3 “Babies eat milk”. [Figure: the parse trees of T3 and H3 annotated with placeholders 1, 2, and the resulting set S(Pb) of first-order rule fragments.]

  9. First-order rule (FOR) feature spaces: challenges K(Pa, Pb) = |S(Pa) ∩ S(Pb)| [Figure: the fragment sets S(Pa) and S(Pb); their common element is the first-order rule “X feed Y → X eat Y”.]

  10. A step back… • FOR feature spaces can be modelled with particular graphs • We call these graphs tripartite directed acyclic graphs (tDAGs) • Observations: • tDAGs are not trees • tDAGs can be used to model both rules and sentence pairs • unifying rules in sentences is a graph matching problem

  11. Tripartite Directed Acyclic Graphs (tDAGs) As for feature structures… [Figure: the rule “X feed Y → X eat Y” and the sentence pair “Farmers feed cows animal extracts” → “Cows eat animal extracts”, both drawn as placeholder-linked tree pairs.]

  12. Tripartite Directed Acyclic Graphs (tDAGs) As for feature structures… [Figure: same tDAGs as the previous slide (animation step).]

  13. Tripartite Directed Acyclic Graphs (tDAGs) A tripartite directed acyclic graph is a graph G = (N, E) where: • the set of nodes N is partitioned in three sets Nt, Ng, and A • the set of edges E is partitioned in four sets Et, Eg, EA(t), and EA(g), where t = (Nt, Et) and g = (Ng, Eg) are two trees, EA(t) = {(x, y) | x ∈ Nt and y ∈ A}, and EA(g) = {(x, y) | x ∈ Ng and y ∈ A}. [Figure: the rule “X feed Y → X eat Y” drawn as a tDAG.]
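The partition conditions in this definition can be rendered as a small data structure. This is not the authors' implementation, only a Python sketch of the definition; node names and field names are invented:

```python
from dataclasses import dataclass, field

@dataclass
class Tree:
    nodes: set   # Nt (or Ng)
    edges: set   # Et (or Eg), as (parent, child) pairs

@dataclass
class TDAG:
    t: Tree              # the "text" tree
    g: Tree              # the "hypothesis" tree
    anchors: set         # A, the shared placeholder nodes
    ea_t: set = field(default_factory=set)  # EA(t): (x, y) with x in Nt, y in A
    ea_g: set = field(default_factory=set)  # EA(g): (x, y) with x in Ng, y in A

    def check(self):
        """Verify the partition conditions of the definition."""
        assert self.t.nodes.isdisjoint(self.g.nodes)
        assert self.t.nodes.isdisjoint(self.anchors)
        assert self.g.nodes.isdisjoint(self.anchors)
        assert all(x in self.t.nodes and y in self.anchors for x, y in self.ea_t)
        assert all(x in self.g.nodes and y in self.anchors for x, y in self.ea_g)
        return True

# A toy encoding of the rule "X feed Y -> X eat Y" (node names hypothetical).
rule = TDAG(
    t=Tree(nodes={"S_t", "VP_t", "feed"}, edges={("S_t", "VP_t"), ("VP_t", "feed")}),
    g=Tree(nodes={"S_g", "VP_g", "eat"}, edges={("S_g", "VP_g"), ("VP_g", "eat")}),
    anchors={"X", "Y"},
    ea_t={("VP_t", "X"), ("VP_t", "Y")},
    ea_g={("VP_g", "X"), ("VP_g", "Y")},
)
print(rule.check())  # True
```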

  14. Tripartite Directed Acyclic Graphs (tDAGs) Alternative definition A tDAG is a pair of extended trees G = (t, g) where: t = (Nt ∪ A, Et ∪ EA(t)) and g = (Ng ∪ A, Eg ∪ EA(g)). [Figure: the rule “X feed Y → X eat Y” split into its two anchor-sharing extended trees.]

  15. Again challenges Computing the implicit kernel function K(P1, P2) = |S(P1) ∩ S(P2)| involves general graph matching. This is an exponential problem. Yet… tDAGs are particular graphs, and we can define an efficient algorithm. We will analyze the isomorphism among tDAGs and derive an algorithm for computing the kernel.

  16. Isomorphism between tDAGs Isomorphism between graphs: G1 = (N1, E1) and G2 = (N2, E2) are isomorphic if: • |N1| = |N2| and |E1| = |E2| • among all the bijective functions relating N1 and N2, there exists f : N1 → N2 such that: • for each n1 in N1, Label(n1) = Label(f(n1)) • for each (na, nb) in E1, (f(na), f(nb)) is in E2
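The definition on this slide can be made concrete with a brute-force check that tries every bijection. This mirrors the definition only; it is exponential, which is exactly what the efficient algorithm later in the talk avoids:

```python
from itertools import permutations

def isomorphic(n1, e1, n2, e2, label1, label2):
    """Return a witnessing bijection f: N1 -> N2 preserving labels and edges,
    or False if none exists. Brute force over all bijections."""
    if len(n1) != len(n2) or len(e1) != len(e2):
        return False
    n1, n2 = list(n1), list(n2)
    for perm in permutations(n2):
        f = dict(zip(n1, perm))
        if all(label1[a] == label2[f[a]] for a in n1) and \
           all((f[a], f[b]) in e2 for a, b in e1):
            return f
    return False

# Two tiny labelled graphs that are isomorphic (toy labels).
f = isomorphic({1, 2}, {(1, 2)},
               {"a", "b"}, {("a", "b")},
               {1: "S", 2: "VP"}, {"a": "S", "b": "VP"})
print(f)  # {1: 'a', 2: 'b'}
```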

  17. Isomorphism between tDAGs Isomorphism adapted to tDAGs: G1 = (t1, g1) and G2 = (t2, g2) are isomorphic if these two properties hold • Partial isomorphism: • g1 and g2 are isomorphic • t1 and t2 are isomorphic • this property generates two functions fg and ft • Constraint compatibility: • fg and ft are compatible on the sets of nodes A1 and A2 if, for each n ∈ A1, it happens that fg(n) = ft(n)
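Constraint compatibility reduces to checking that the two partial isomorphisms agree on the anchor set. A one-line sketch, with the mappings represented as hypothetical Python dicts over placeholder names:

```python
def compatible(f_g, f_t, anchors_a1):
    """fg and ft are compatible iff they agree on every anchor node in A1."""
    return all(f_g[n] == f_t[n] for n in anchors_a1)

# Mirrors the later Ct = Cg example: both mappings send 1 -> 1 and 3 -> 2.
print(compatible({1: 1, 3: 2}, {1: 1, 3: 2}, {1, 3}))  # True
```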

  18. Isomorphism between tDAGs • Partial isomorphism: Pa = (ta, ga) and Pb = (tb, gb). [Figure: the two extended-tree pairs, with placeholders 1, 3 in Pa and 1, 2 in Pb.] • Constraint compatibility: Ct = {(1, 1), (3, 2)} and Cg = {(1, 1), (3, 2)}, so Ct = Cg.

  19. Ideas for building the kernel We define K(P1, P2) = |S(P1) ∩ S(P2)| using the isomorphism between tDAGs. The idea: reverse the order of isomorphism detection. • First, constraint compatibility: • building a set C of all the relevant alternative constraints • finding subsets of S(P1) ∩ S(P2) meeting a constraint c ∈ C • Second, partial isomorphism detection. [Diagram: alternative constraints → subsets of S(P1) ∩ S(P2) → partial isomorphism.]
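A naive way to build the set C of alternative constraints is to enumerate every injective mapping between the two placeholder sets; the talk keeps only the relevant constraints, so this is an unfiltered sketch under that assumption:

```python
from itertools import permutations

def alternative_constraints(placeholders1, placeholders2):
    """Enumerate candidate constraints: every injective mapping from
    placeholders1 into placeholders2, as a set of (p1, p2) correspondences.
    The actual algorithm would keep only the relevant ones."""
    p1 = sorted(placeholders1)
    return [set(zip(p1, perm))
            for perm in permutations(sorted(placeholders2), len(p1))]

# With placeholders {1, 2} and {1, 2, 3}, both constraints of the next
# slide, {(1,1),(2,2)} and {(1,1),(2,3)}, appear among the candidates.
cs = alternative_constraints({1, 2}, {1, 2, 3})
print(len(cs))  # 6
```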

  20. Ideas for building the kernel K(Pa, Pb) = |S(Pa) ∩ S(Pb)| [Figure: abstract placeholder-annotated tree pairs Pa = (ta, ga) and Pb = (tb, gb) over node labels A, B, C, I, M, N.] The set of alternative constraints is C = {c1, c2} = { {(1, 1), (2, 2)}, {(1, 1), (2, 3)} }.

  21. Ideas for building the kernel K(Pa, Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))c1 ∪ (S(Pa) ∩ S(Pb))c2| With c1 = {(1, 1), (2, 2)}: [Figure: the subset (S(Pa) ∩ S(Pb))c1, i.e. the common fragments whose placeholders respect c1.]

  22. Ideas for building the kernel K(Pa, Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))c1 ∪ (S(Pa) ∩ S(Pb))c2| With c2 = {(1, 1), (2, 3)}: [Figure: the subset (S(Pa) ∩ S(Pb))c2, i.e. the common fragments whose placeholders respect c2.]

  23. Ideas for building the kernel K(Pa, Pb) = |∪c∈C (S(Pa) ∩ S(Pb))c| = |∪c∈C (S(ta) ∩ S(tb))c × (S(ga) ∩ S(gb))c| [Figure: (S(Pa) ∩ S(Pb))c1 factorised as the product (S(ta) ∩ S(tb))c1 × (S(ga) ∩ S(gb))c1.]

  24. Kernel on FOR feature spaces The general equation K(P1, P2) = |∪c∈C (S(t1) ∩ S(t2))c × (S(g1) ∩ S(g2))c| can be computed using: • KS (the kernel function for trees) introduced in (Collins & Duffy, 2001) and refined in (Moschitti & Zanzotto, 2007) • the inclusion-exclusion principle
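The role of inclusion-exclusion here can be illustrated on explicit sets: the cardinality of the union over constraints is assembled from intersection sizes. In the actual algorithm those sizes come from the tree kernel KS rather than from materialised sets; the sketch below materialises them only for illustration:

```python
from itertools import combinations

def kernel_by_inclusion_exclusion(sets_by_constraint):
    """|union of the per-constraint sets| via inclusion-exclusion.
    Each element of sets_by_constraint stands for (S(Pa) ∩ S(Pb))_c;
    in the real algorithm only the sizes of the intersections would be
    computed, via the tree kernel, never the sets themselves."""
    sets_list = list(sets_by_constraint)
    total = 0
    for k in range(1, len(sets_list) + 1):
        for group in combinations(sets_list, k):
            total += (-1) ** (k + 1) * len(set.intersection(*group))
    return total

# Toy per-constraint common-fragment sets (fragment names invented).
s_c1 = {("t1", "g1"), ("t2", "g2"), ("t3", "g3")}
s_c2 = {("t1", "g1"), ("t4", "g4")}
print(kernel_by_inclusion_exclusion([s_c1, s_c2]))  # 4 = |s_c1 ∪ s_c2|
```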

  25. Computational Efficiency Analysis • Comparison: the kernels of (Zanzotto & Moschitti, Coling-ACL 2006) and (Moschitti & Zanzotto, ICML 2007) • Test-bed corpus: Recognizing Textual Entailment challenge data

  26. Computational Efficiency Analysis Execution time in seconds (s) for all of RTE2 with respect to different numbers of allowed placeholders. [Plot omitted.]

  27. Accuracy Comparison • Training: RTE 1, 2, 3 • Testing: RTE 4 [Results table omitted.]

  28. Conclusions • We reduced kernels in first-order rule feature spaces to graph-matching problems • We defined a new class of graphs, tDAGs • We presented an efficient algorithm for computing kernels in FOR feature spaces

  29. [Backup slide: placeholder-annotated parse trees for “Farmers feed cows animal extracts” → “Cows eat animal extracts” and the fragment set S(Pa).]

  30. [Backup slide: placeholder-annotated parse trees for “Mothers feed babies milk” → “Babies eat milk” and the fragment sets S(Pb) and S(Pa).]
