Thinking about Evaluation and Corpora for Plan Recognition
This paper discusses evaluation methodologies for plan recognition, focusing on both extrinsic and intrinsic evaluations. It highlights the importance of online prediction, precision/recall metrics, and the capability to predict uncertain outcomes. Various types of plan corpora are explored, including unlabeled, goal-labeled, and plan-labeled. The Monroe Plan Corpus is presented as a case study, showcasing its role in emergency planning with stochastic goal generation. Future directions aim to develop problem-solving labeled corpora and employ stochastic agents to enhance plan monitoring and replanning.
Presentation Transcript
Thinking about Evaluation and Corpora for Plan Recognition
Nate Blaylock, Florida Institute for Human and Machine Cognition (IHMC), Ocala, Florida
blaylock@ihmc.us
Plan Recognition Evaluation
• Extrinsic (Tom Dietterich's comment)
• Online
  • prediction after each observation
  • precision/recall
    • ability to predict "don't know"
• Offline
  • predict the right answer for the session
  • convergence
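The online precision/recall idea above can be sketched as follows. This is a minimal illustration, not from the talk: precision is computed only over observations where the recognizer committed to a prediction, so abstaining ("don't know") lowers recall but not precision.

```python
from typing import Optional, Sequence, Tuple

def precision_recall(predictions: Sequence[Optional[str]],
                     truths: Sequence[str]) -> Tuple[float, float]:
    """Online plan-recognition scoring with abstention.

    predictions: one prediction per observation; None means "don't know".
    truths: the true goal at each observation.
    """
    # Precision counts only observations where the recognizer committed.
    committed = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    correct = sum(1 for p, t in committed if p == t)
    precision = correct / len(committed) if committed else 0.0
    # Recall is over all observations, so abstaining costs recall.
    recall = correct / len(truths) if truths else 0.0
    return precision, recall
```

For example, a recognizer that answers on 3 of 4 observations and gets 2 right scores precision 2/3 but recall 1/2.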
More Evaluation
• How early in the session do we get it?
  • convergence point (Lesh: "work saved")
• Partial results (often enough)
  • Lower subgoals in the HTN plan
  • More abstract (subsuming) goals
  • Schema only, or only some parameters
  • N-best prediction
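One way to make the convergence-point metric concrete (a sketch, not the talk's exact definition): the convergence point is the first observation after which the recognizer's prediction stays correct for the rest of the session, and "work saved" is the number of steps remaining from that point on.

```python
from typing import Optional, Sequence

def convergence_point(predictions: Sequence[Optional[str]],
                      goal: str) -> Optional[int]:
    """Return the 1-based index of the observation at which the recognizer
    converges on the true goal and never wavers afterward, or None if it
    never converges."""
    point = None
    for i, pred in enumerate(predictions, start=1):
        if pred == goal:
            if point is None:
                point = i  # candidate convergence point
        else:
            point = None   # wrong (or abstained) prediction resets it
    return point
```

With predictions `["b", "a", "a"]` for true goal `"a"`, the recognizer converges at observation 2, saving one of three steps.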
Example: Results on Monroe (chart comparing 1-best and 2-best prediction results)
Plan Corpora: Types
• Unlabeled: sequence of actions taken, e.g.,
  • Unix commands (Davidson and Hirsh '98)
  • also GPS data (e.g., Patterson et al. 2003)
• Goal-labeled: actions + top-level goal(s), e.g.,
  • MUD domain (Albrecht et al. '98)
  • Unix/Linux (Lesh '98; Blaylock and Allen 2004)
    • Linux Plan Corpus available online
Plan Corpora: Types (2)
• Plan-labeled: actions + hierarchical plan
  • Monroe Plan Corpus (Blaylock and Allen 2005), available online
• (future?) Problem-solving labeled
  • Action failure, replanning, goal abandonment, ...
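The three corpus types form a natural hierarchy of annotation: each adds a layer to the one before. A minimal sketch of what a corpus entry might look like (the field names are illustrative, not the actual Monroe corpus schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PlanNode:
    """One node in a hierarchical (HTN-style) plan: a goal or action schema
    and its decomposition into children. Leaves are primitive actions."""
    schema: str
    args: List[str] = field(default_factory=list)
    children: List["PlanNode"] = field(default_factory=list)

@dataclass
class PlanSession:
    """A single session; the optional fields mirror the corpus types above."""
    actions: List[str]                  # unlabeled: just the observed actions
    goals: Optional[List[str]] = None   # goal-labeled: add top-level goal(s)
    plan: Optional[PlanNode] = None     # plan-labeled: add the full hierarchy
```

A goal-labeled entry then simply leaves `plan` as `None`, while a plan-labeled entry carries the whole decomposition tree.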
Creating Plan Corpora (from Humans)
Human annotation of everything, OR:
• Action sequence: record observations directly
• Top-level goal(s):
  • idea 1: an environment where goal achievement is observable (e.g., MUD)
  • idea 2: a controlled environment where the goal is known a priori (e.g., Unix/Linux)
• Plan-labeled:
  • annotate with an existing plan recognizer (Bauer '96)
  • may not apply to all domains
Generating Artificial Corpora (Blaylock and Allen, 2005)
• Randomized AI planner (SHOP2)
• Model the domain for the planner (HTN)
• For each desired plan session:
  • stochastically generate goal(s)
  • stochastically generate a start state
  • find a plan using the planner
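The generation loop above can be sketched as follows. This is a hedged illustration: the planner and state generator are passed in as stand-in functions, since the real method drives SHOP2 against an HTN domain model; the goal-schema names are made up for the example.

```python
import random
from typing import Any, Callable, Dict, List

# Illustrative goal schemas, not the actual Monroe domain definition.
GOAL_SCHEMAS = ["clear-road-wreck", "provide-medical-attention", "fix-power-line"]

def generate_session(plan_fn: Callable[[str, Any], Any],
                     state_fn: Callable[[random.Random], Any],
                     rng: random.Random) -> Dict[str, Any]:
    """One plan session: stochastically pick a goal and a start state,
    then call the (randomized) planner to get a fully plan-labeled example."""
    goal = rng.choice(GOAL_SCHEMAS)     # stochastically generate goal(s)
    state = state_fn(rng)               # stochastically generate a start state
    plan = plan_fn(goal, state)         # randomized HTN planner, e.g. SHOP2
    return {"goal": goal, "state": state, "plan": plan}

def generate_corpus(n: int,
                    plan_fn: Callable[[str, Any], Any],
                    state_fn: Callable[[random.Random], Any],
                    seed: int = 0) -> List[Dict[str, Any]]:
    rng = random.Random(seed)  # seeded for reproducible corpora
    return [generate_session(plan_fn, state_fn, rng) for _ in range(n)]
```

Because the planner itself produces the hierarchical plan, every generated session is plan-labeled for free, which is what makes the method cheap compared with human annotation.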
Using the Method: The Monroe Corpus
• Emergency planning domain
• 10 top-level goal schemas
• 46 methods (recipes)
• 30 operators (subgoals/actions)
• Average depth to action: 3.8
• 5000 plan sessions generated in less than 10 minutes: a plan-labeled corpus
• Download at http://cs.rochester.edu/research/speech/monroe-plan/
Future Directions
• Problem-solving labeled corpus
  • Similar method to Monroe
  • Build a stochastic agent that does problem solving in the domain, with plan monitoring, replanning, goal abandonment, etc.
  • Label the steps where problem-solving behavior happened
  • cf. (Rosario, Oliver, and Pentland, 1999)