
Semantic Parsing




  1. Semantic Parsing Via imitation learning

  2. Outline • Goal - Question Answering • Parsing via Lambda DCS and Semantic Functions • The Search Problem • Fixed Order Parsing • Imitation learning of agenda-based parsers

  3. Goal - Question Answering • Given a question in natural language, return the right answer. • Question: “What city was Abraham Lincoln born in?” • Answer: “Hodgenville” • We are given as inputs: • A) Grammar • B) Knowledge Base • C) Training Set

  4. Lambda DCS and Semantic Functions
  An example derivation for “What city abraham lincoln born in ?”:
  Rules:
  • ENTITY → [LEX]
  • BINARY → [LEX]
  • SET → ENTITY BINARY [JOIN]
  • SET → SET SET [INTERSECTION]
  Derivation (leaves to root):
  • “city” → SET: Type.City [LEX]
  • “abraham lincoln” → ENTITY: AbeLincoln [LEX]
  • “born” → BINARY: PlaceOfBirth [LEX]
  • ENTITY: AbeLincoln + BINARY: PlaceOfBirth → SET: PlaceOfBirth.AbeLincoln [JOIN]
  • SET: Type.City + SET: PlaceOfBirth.AbeLincoln → ROOT: Type.City ⊓ PlaceOfBirth.AbeLincoln [INTERSECTION]

  5. We will use the framework of simple lambda DCS to express our logical forms. • Unary predicates • e.g: the unary predicate Cities can be defined as: • Type.City = {New York, San Francisco, Tveria, …} • Binary predicates: • e.g: the binary predicate Couples can be defined as: • Couples = {<Dana, Yossi>, <Amnon, Shoshi>, …} We will mostly use two kinds of unary predicates: • Entity: a unary predicate which is a singleton. E.g: AbeLincoln = {Abraham Lincoln} • Set: a unary predicate. E.g: Type.City
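
  A rough mental model of these definitions: unary predicates are sets of entities, binary predicates are sets of pairs. A minimal Python sketch, with purely illustrative identifiers (not real Freebase names):

```python
# A toy in-memory knowledge base; every identifier here is
# illustrative, not an actual Freebase name.
unaries = {
    "Type.City": {"NewYork", "SanFrancisco", "Tveria", "Hodgenville"},
}
binaries = {
    # Pairs (x, y) such that PlaceOfBirth(x, y): x is the place and
    # y the person, so that PlaceOfBirth.AbeLincoln reads as
    # "places where Abraham Lincoln was born", as on the slides.
    "PlaceOfBirth": {("Hodgenville", "AbeLincoln")},
}
# An entity is just a singleton unary predicate.
AbeLincoln = {"AbeLincoln"}
```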

  6. We can also define a set of operators, such as Join and Intersection. • Join – given a binary predicate (e.g: PlaceOfBirth) and an entity (e.g: AbeLincoln) we can perform a join operation. • The result will be: PlaceOfBirth.AbeLincoln, the set of places where Abraham Lincoln was born. • The lambda calculus equivalent: λx.PlaceOfBirth(x, AbeLincoln)

  7. We can also define a set of operators, such as Join and Intersection. • Intersection – intersection of sets. • E.g: Type.City ⊓ PlaceOfBirth.AbeLincoln • The result will be the city of birth of Abraham Lincoln. • This is equivalent to: λx.Type(x, City) ∧ PlaceOfBirth(x, AbeLincoln)
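
  Both operators then reduce to simple set computations. A self-contained sketch over a toy knowledge base (all data is invented for illustration):

```python
# Toy data, repeated here so the snippet runs on its own.
binaries = {"PlaceOfBirth": {("Hodgenville", "AbeLincoln")}}
type_city = {"NewYork", "Hodgenville"}
abe_lincoln = {"AbeLincoln"}

def join(binary, unary):
    # b.u = {x : there exists y with b(x, y) and y in u}
    return {x for (x, y) in binaries[binary] if y in unary}

def intersection(u1, u2):
    # u1 ⊓ u2 is plain set intersection
    return u1 & u2

birthplaces = join("PlaceOfBirth", abe_lincoln)  # {'Hodgenville'}
print(intersection(type_city, birthplaces))      # {'Hodgenville'}
```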

  8. Inputs - Grammar • Σ – terminals (words). E.g: Σ = {“place”, “birth”, …} • V – categories. E.g: V = {BINARY, ENTITY, SET, ROOT} • R – a set of rules. E.g: • SET → ENTITY BINARY [JOIN] • ENTITY → [LEX]

  9. Inputs – cont. • Training Set: • e.g: {(“What city was Abraham Lincoln born in?”, “Hodgenville”), …} • Knowledge Base: • e.g: Freebase. We will use the resulting logical form to query it.

  10. Outline • Goal - Question Answering • Parsing via Lambda DCS and Semantic Functions • The Search Problem • Fixed Order Parsing • Paper approach – Imitation learning of agenda-based parsers

  11. The Search Problem For the question “What city abraham lincoln born in ?”, the lexicon offers many candidates per word: • LEX(city) = {SET: Type.City, SET: Type.Loc, …}, |LEX(city)| = 362 • LEX(abraham): |LEX(abraham)| = 20 • LEX(lincoln) = {ENTITY: AbeLincoln, ENTITY: USSLincoln, ENTITY: LincolnTown, …}, |LEX(lincoln)| = 20 • |LEX(born)| = 391 • |LEX(in)| = 508

  12. The Search Problem Even for a simple question like “What city Abraham Lincoln was born in”, the number of ROOT derivations, i.e. the number of successfully constructed parse trees, might be > 1M. Can we efficiently search the space of ROOT derivations for the best parse tree?
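
  To get a feel for the blow-up, multiplying the lexicon sizes from the previous slide already gives billions of raw lexical combinations, before any grammar rule multiplies the count further; a back-of-the-envelope sketch:

```python
# Rough upper bound on lexical ambiguity alone, using the |LEX|
# sizes from the previous slide; this ignores which combinations
# the grammar actually licenses, so it is only an illustration.
lex_sizes = {"city": 362, "abraham": 20, "lincoln": 20, "born": 391, "in": 508}

combos = 1
for n in lex_sizes.values():
    combos *= n
print(f"{combos:,}")  # ~2.9e10 raw lexical combinations
```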

  13. Outline • Goal - Question Answering • Parsing via Lambda DCS and Semantic Functions • The Search Problem • Fixed Order Parsing • Imitation learning of agenda-based parsers

  14. Tackling the Search Problem • We will present two approaches to tackle the search problem: • Fixed-Order parsing using beam search • Imitation Learning of an agenda-based parser

  15. Fixed Order Parsing – High level For each training sample: • Find ROOT derivation candidates using Beam Search • Update the weights of the model according to the label

  16. Parsing using Beam Search For a beam of size 2, given a score function S. Rules: • ENTITY → [LEX] • SET → ENTITY BINARY [JOIN] • SET → SET SET [INTERSECTION] For the span “city” we have three candidates: S(SET: Type.Loc) = 3, S(SET: Type.City) = 5, S(SET: Type.CitiesInCalifornia) = 1. Only the two highest-scoring derivations, SET: Type.City and SET: Type.Loc, are kept in the cell.

  17. Parsing using Beam Search The same pruning applies to every cell: among candidates such as SET: Type.City, SET: PlacesAbrahamVisited and SET: Type.Loc, only the two highest-scoring derivations survive.

  18. Parsing using Beam Search For the span “abraham lincoln”: S(ENTITY: AbeLincoln) = 3, S(ENTITY: LincolnTown) = 5, S(ENTITY: USSLincoln) = 1. With a beam of size 2 we keep ENTITY: LincolnTown and ENTITY: AbeLincoln, and prune ENTITY: USSLincoln.

  19. Parsing using Beam Search Note that pruning is greedy: a spurious derivation such as SET: PlacesAbrahamVisited can survive in the beam while derivations needed for the correct parse are pruned.

  20. Parsing using Beam Search Parsing proceeds bottom-up, cell by cell, until the cell spanning the whole sentence holds the top-scoring derivations of each category (ENTITY, SET, …).
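
  Putting slides 16–20 together, a hedged sketch of CKY-style chart parsing with beam pruning; the helpers lex, rules and score are assumptions standing in for the grammar and the model, not the paper's code:

```python
import itertools

def beam_parse(words, lex, rules, score, beam_size=2):
    """Fill a chart bottom-up, keeping only the top-scoring
    derivations per span (the beam)."""
    n = len(words)
    chart = {}  # (i, j) -> list of derivations for span words[i:j]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = list(lex(words[i:j]))  # lexical derivations for the span
            for k in range(i + 1, j):     # combine adjacent sub-spans
                for l, r in itertools.product(chart.get((i, k), []),
                                              chart.get((k, j), [])):
                    cell.extend(rules(l, r))  # e.g. JOIN / INTERSECTION
            # Beam pruning: keep only the beam_size best derivations.
            chart[(i, j)] = sorted(cell, key=score, reverse=True)[:beam_size]
    return chart[(0, n)]
```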

  21. Fixed Order Parsing – High level For each training sample: • Find ROOT derivation candidates using Beam Search • Update the weights of the model according to the label

  22. Fixed Order Parsing Model: denote by D(x) the set of ROOT derivations for an utterance x. We want to learn the linear scoring function S(d) = φ(d) · θ • φ(d) – a feature vector, extracted in some deterministic way • θ – the vector of weights we want to learn
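
  A minimal sketch of this scorer with sparse feature dictionaries (the feature names are made up):

```python
# Sparse dot product phi(d) . theta over feature dictionaries.
def score(features, theta):
    return sum(theta.get(f, 0.0) * v for f, v in features.items())

theta = {"rule:JOIN": 0.5}                             # illustrative weights
phi_d = {"lex:city->Type.City": 1.0, "rule:JOIN": 1.0} # illustrative features
print(score(phi_d, theta))  # 0.5
```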

  23. Fixed Order Parsing Training: we train the model online. For each training sample (xᵢ, yᵢ) denote the log-linear distribution over derivations: p_θ(d | xᵢ) = exp(φ(d) · θ) / Z, where Z = Σ_{d′ ∈ D(xᵢ)} exp(φ(d′) · θ).

  24. Fixed Order Parsing Training: we train the model online. For each training sample (xᵢ, yᵢ) denote by R(d) the reward of a derivation d, and maximize the following objective: log Σ_{d ∈ D(xᵢ)} R(d) · p_θ(d | xᵢ). Problem: we cannot enumerate all of D(xᵢ) – we only have our beam search result. Define: D̃(xᵢ) – the K ROOT derivations returned by beam search.

  25. Fixed Order Parsing Training: we train the model online. For each training sample (xᵢ, yᵢ) we want to maximize the following objective: log Σ_{d ∈ D̃(xᵢ)} R(d) · p_θ(d | xᵢ). We can set R(d) to be a continuous value in [0, 1], e.g.: • R(d) = 1 if executing d on the knowledge base returns yᵢ • R(d) = 0 otherwise
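
  One way this online update could look in code, taking the gradient of log Σ_d R(d)·p_θ(d | x) over the beam; all helper names are illustrative, not the paper's implementation:

```python
import math

def dot(feats, theta):
    return sum(theta.get(f, 0.0) * v for f, v in feats.items())

def update(beam, phi, reward, theta, lr=0.1):
    """One online gradient step on log sum_d R(d) * p_theta(d | x),
    where `beam` is the K-best list from beam search."""
    scores = [dot(phi(d), theta) for d in beam]
    m = max(scores)
    p = [math.exp(s - m) for s in scores]            # unnormalized p(d | x)
    z = sum(p)
    q = [pi * reward(d) for pi, d in zip(p, beam)]   # reweighted by R(d)
    zq = sum(q)
    if zq == 0:      # no rewarded derivation on the beam: skip the sample
        return
    for d, pi, qi in zip(beam, p, q):
        for f, v in phi(d).items():
            # gradient of the objective: E_q[phi] - E_p[phi]
            theta[f] = theta.get(f, 0.0) + lr * v * (qi / zq - pi / z)
```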

  26. Fixed Order Parsing – High level For each training sample: • Find ROOT derivation candidates using Beam Search • Update the weights of the model according to the label

  27. Outline • Goal - Question Answering • Parsing via Lambda DCS and Semantic Functions • The Search Problem • Fixed Order Parsing • Imitation learning of agenda-based parsers

  28. Agenda Based Parsing Assuming we have an agenda which prioritizes all of our possible actions in the derivation process, we will not need to calculate every cell in the table H.

  29. Agenda Based Parsing - Initialization Each word is matched against the lexicon, and every lexical derivation is pushed onto the agenda Q: • “city” → SET: Type.City, SET: Type.Loc, ENTITY: CityOfBoston, … [LEX] • “abraham” → ENTITY: AbeLincoln, ENTITY: Abraham, … [LEX] • “lincoln” → ENTITY: AbeLincoln, ENTITY: LincolnHigh, ENTITY: USSLincoln, … [LEX] • “born” → BINARY: PlaceOfBirth, ENTITY: YearOfBirth, ENTITY: BirthOfANation, … [LEX] Q = {SET: Type.City, …, ENTITY: AbeLincoln, …, ENTITY: LincolnHigh, …, BINARY: PlaceOfBirth, …}

  30. Agenda Based Parsing – Cont. The parser repeatedly pops the highest-priority item from the agenda Q into the chart and pushes any newly derivable items back onto Q: • Pop ENTITY: AbeLincoln and BINARY: PlaceOfBirth [LEX]; JOIN yields SET: PlaceOfBirth.AbeLincoln, which is pushed onto Q. • Pop SET: PlaceOfBirth.AbeLincoln and SET: Type.City; INTERSECTION yields SET: PlaceOfBirth.AbeLincoln ⊓ Type.City, which is pushed onto Q.
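
  A hedged sketch of this agenda loop using a priority queue; initial_items, combine and priority are assumed helpers, not the paper's implementation:

```python
import heapq

def agenda_parse(initial_items, combine, priority, max_pops=1000):
    """Pop the highest-priority item, add it to the chart, and push
    any new combinations (e.g. JOIN / INTERSECTION) back onto Q."""
    chart = []
    agenda = [(-priority(it), i, it) for i, it in enumerate(initial_items)]
    heapq.heapify(agenda)
    counter = len(agenda)  # tie-breaker so items are never compared
    for _ in range(max_pops):
        if not agenda:
            break
        _, _, item = heapq.heappop(agenda)  # highest-priority action
        chart.append(item)
        for new in combine(item, chart):
            heapq.heappush(agenda, (-priority(new), counter, new))
            counter += 1
    return chart
```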

  31. Algorithm Once finished, we can simply return the highest-scoring ROOT derivation as the parsing result. But how do we learn an agenda?

  32. Disadvantages of Fixed Order Parsing A disadvantage is that in order to get K ROOT derivations, we have to calculate derivations for all spans and categories! In addition, we learn on ROOT derivations and apply the model to partial derivations!

  33. Reinforcement Learning • State: s = (agenda, chart) – the current contents of the agenda and the chart • Action: popping an item from the agenda into the chart • History: h = (s₀, a₀, s₁, a₁, …, s_T) • Reward: R(h) – the reward of the best ROOT derivation produced along h

  34. Reinforcement Learning • Our policy is a distribution over the available actions: π_θ(a | s) ∝ exp(φ(s, a) · θ) • E.g., if in state s the three available actions have probabilities 1/4, 1/2 and 1/4, then we choose the second action according to our policy with a 50% chance.

  35. Reinforcement Learning • Some definitions: • A history is simply a sequence of the states and actions taken: h = (s₀, a₀, s₁, a₁, …, s_T) • We define a distribution over histories: p_θ(h) = Π_t π_θ(a_t | s_t)

  36. Reinforcement Learning • Some definitions: • A history is simply a sequence of the states and actions taken: h = (s₀, a₀, s₁, a₁, …, s_T) • We define a distribution over histories: p_θ(h) = Π_t π_θ(a_t | s_t) • And maximize our objective: E_{h ~ p_θ}[R(h)] = Σ_h p_θ(h) · R(h)
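
  A small sketch of the log-linear policy and the induced history probability; phi(state, action) is an assumed sparse feature function, as before:

```python
import math

def policy(theta, phi, state, actions):
    """pi(a | s) proportional to exp(phi(s, a) . theta)."""
    scores = [sum(theta.get(f, 0.0) * v for f, v in phi(state, a).items())
              for a in actions]
    m = max(scores)                        # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def history_prob(theta, phi, history):
    # history: list of (state, available_actions, index_of_chosen_action)
    p = 1.0
    for state, actions, idx in history:
        p *= policy(theta, phi, state, actions)[idx]
    return p
```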

  37. Reinforcement Learning • We define a distribution over histories: p_θ(h) = Π_t π_θ(a_t | s_t) • And maximize our objective: Σ_h p_θ(h) · R(h) • The number of histories is exponentially large, leading to slow convergence → use online learning.

  38. Imitation Learning • Rather than maximizing the expected reward over all histories directly, if we find a “good” target history h*, we can perform the following online update: θ ← θ + η · R(h*) · ∇_θ log p_θ(h*) • η – learning rate • R(h*) – the reward of h*
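
  A sketch of that update, reusing the policy helper from the previous snippet; the gradient of log p_θ(h*) decomposes into per-step terms φ(s, a_t) − E_π[φ(s, ·)]:

```python
def imitation_update(theta, phi, target_history, reward, lr=0.1):
    """Online step: theta <- theta + lr * R(h*) * grad log p_theta(h*).
    Assumes policy() and a sparse phi(state, action) as sketched above."""
    for state, actions, idx in target_history:
        probs = policy(theta, phi, state, actions)
        for j, a in enumerate(actions):
            chosen = 1.0 if j == idx else 0.0
            for f, v in phi(state, a).items():
                # per-step gradient: phi(s, a_t) - E_pi[phi(s, .)]
                theta[f] = theta.get(f, 0.0) + lr * reward * v * (chosen - probs[j])
```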

  39. Imitation Learning – finding h* • We will use our agenda-based parsing algorithm to generate K root derivations using the policy π_θ. • We denote: • d* – the root derivation with the highest reward out of the K. • We will use d* to generate h*.

  40. Imitation Learning – local reweighting • d* – the root derivation with the highest reward out of the K. • b(a) – indicates whether action a is a sub-derivation of d*. • The locally reweighted policy: π′(a | s) ∝ π_θ(a | s) · c^{b(a)}, for some constant c > 1. • And the probability of a history: p′(h) = Π_t π′(a_t | s_t)
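
  A tiny sketch of local reweighting over one state's action probabilities; the constant c = 10 is an arbitrary illustration:

```python
def reweighted_policy(probs, b, c=10.0):
    """Boost actions that are sub-derivations of d* (b[a] = 1)
    by a constant factor c, then renormalize."""
    boosted = [p * (c if b_a else 1.0) for p, b_a in zip(probs, b)]
    z = sum(boosted)
    return [p / z for p in boosted]

# e.g. the first action belongs to d*:
print(reweighted_policy([0.25, 0.5, 0.25], [1, 0, 0]))
# -> [0.769..., 0.153..., 0.076...]
```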

  41. Imitation Learning – history compression

  42. Imitation Learning – history compression • d* – the root derivation with the highest reward out of the K. • For every history h we define (i₁, …, i_m) as the sequence of indices such that: • b(a_{i_j}) = 1 for every j. • The compressed history is: c(h) = (a_{i₁}, …, a_{i_m})
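
  In code, compression is just filtering the popped actions by b:

```python
def compress(history, b):
    """Keep only the actions that are sub-derivations of d*
    (b(a) = 1), in their original order."""
    return [a for a in history if b(a)]

# e.g. compress(["popA", "popB", "popC"], lambda a: a != "popB")
# -> ["popA", "popC"]
```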

  43. Imitation Learning - combining • d* – the root derivation with the highest reward out of the K. • The reweighted distribution: p′(h) = Π_t π′(a_t | s_t) • We will use our agenda-based parsing algorithm to generate K root derivations using the reweighted policy π′. • And return as the target history the compressed history c(h′) that maximizes the reward.

  44. Imitation Learning - combining

  45.–48. Imitation Learning - Results (the results charts from these slides are not included in this transcript)
