Thinking about Evaluation and Corpora for Plan Recognition
This paper discusses evaluation methodologies for plan recognition, focusing on both extrinsic and intrinsic evaluations. It highlights the importance of online prediction, precision/recall metrics, and the capability to predict uncertain outcomes. Various types of plan corpora are explored, including unlabeled, goal-labeled, and plan-labeled. The Monroe Plan Corpus is presented as a case study, showcasing its role in emergency planning with stochastic goal generation. Future directions aim to develop problem-solving labeled corpora and employ stochastic agents to enhance plan monitoring and replanning.
Presentation Transcript
Thinking about Evaluation and Corpora for Plan Recognition
Nate Blaylock, Florida Institute for Human and Machine Cognition (IHMC), Ocala, Florida
blaylock@ihmc.us
Plan Recognition Evaluation
• Extrinsic (Tom Dietterich's comment)
• Online
  • prediction after each observation
  • precision/recall
    • ability to predict "don't know"
• Offline
  • predict the right answer for the session
  • convergence
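The online precision/recall idea above can be sketched as follows. This is a minimal illustration, not from the talk: precision is computed only over observations where the recognizer committed to a prediction, so abstaining ("don't know") lowers recall but not precision.

```python
from typing import Optional, Sequence, Tuple

def precision_recall(predictions: Sequence[Optional[str]],
                     truths: Sequence[str]) -> Tuple[float, float]:
    """Online plan-recognition scoring with abstention.

    predictions: one prediction per observation; None means "don't know".
    truths: the true goal at each observation.
    """
    # Precision counts only observations where the recognizer committed.
    committed = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    correct = sum(1 for p, t in committed if p == t)
    precision = correct / len(committed) if committed else 0.0
    # Recall is over all observations, so abstaining costs recall.
    recall = correct / len(truths) if truths else 0.0
    return precision, recall
```

For example, a recognizer that answers on 3 of 4 observations and gets 2 right scores precision 2/3 but recall 1/2.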
More Evaluation
• How early in the session do we get it?
  • convergence point (Lesh: "work saved")
• Partial results (often enough)
  • Lower subgoals in the HTN plan
  • More abstract (subsuming) goals
  • Schema only, or only some parameters
  • N-best prediction
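One way to make the convergence-point metric concrete (a sketch, not the talk's exact definition): the convergence point is the first observation after which the recognizer's prediction stays correct for the rest of the session, and "work saved" is the number of steps remaining from that point on.

```python
from typing import Optional, Sequence

def convergence_point(predictions: Sequence[Optional[str]],
                      goal: str) -> Optional[int]:
    """Return the 1-based index of the observation at which the recognizer
    converges on the true goal and never wavers afterward, or None if it
    never converges."""
    point = None
    for i, pred in enumerate(predictions, start=1):
        if pred == goal:
            if point is None:
                point = i  # candidate convergence point
        else:
            point = None   # wrong (or abstained) prediction resets it
    return point
```

With predictions `["b", "a", "a"]` for true goal `"a"`, the recognizer converges at observation 2, saving one of three steps.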
Example: Results on Monroe (chart comparing 1-best and 2-best prediction results)
Plan Corpora: Types
• Unlabeled: sequence of actions taken, e.g.,
  • Unix commands (Davidson and Hirsh '98)
  • also GPS data (e.g., Patterson et al. 2003)
• Goal-labeled: actions + top-level goal(s), e.g.,
  • MUD domain (Albrecht et al. '98)
  • Unix/Linux (Lesh '98; Blaylock and Allen 2004)
    • Linux Plan Corpus available online
Plan Corpora: Types (2)
• Plan-labeled: actions + hierarchical plan
  • Monroe Plan Corpus (Blaylock and Allen 2005), available online
• (future?) Problem-solving labeled
  • Action failure, replanning, goal abandonment, ...
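The three corpus types form a natural hierarchy of annotation: each adds a layer to the one before. A minimal sketch of what a corpus entry might look like (the field names are illustrative, not the actual Monroe corpus schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PlanNode:
    """One node in a hierarchical (HTN-style) plan: a goal or action schema
    and its decomposition into children. Leaves are primitive actions."""
    schema: str
    args: List[str] = field(default_factory=list)
    children: List["PlanNode"] = field(default_factory=list)

@dataclass
class PlanSession:
    """A single session; the optional fields mirror the corpus types above."""
    actions: List[str]                  # unlabeled: just the observed actions
    goals: Optional[List[str]] = None   # goal-labeled: add top-level goal(s)
    plan: Optional[PlanNode] = None     # plan-labeled: add the full hierarchy
```

A goal-labeled entry then simply leaves `plan` as `None`, while a plan-labeled entry carries the whole decomposition tree.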
Creating Plan Corpora (from Humans)
Human annotation of everything, OR:
• Action sequence: record observations directly
• Top-level goal(s):
  • idea 1: an environment where goal achievement is observable (e.g., MUD)
  • idea 2: a controlled environment where the goal is known a priori (e.g., Unix/Linux)
• Plan-labeled:
  • annotate with an existing plan recognizer (Bauer '96)
  • may not apply to all domains
Generating Artificial Corpora (Blaylock and Allen, 2005)
• Randomized AI planner (SHOP2)
• Model the domain for the planner (HTN)
• For each desired plan session:
  • stochastically generate goal(s)
  • stochastically generate a start state
  • find a plan using the planner
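The generation loop above can be sketched as follows. This is a hedged illustration: the planner and state generator are passed in as stand-in functions, since the real method drives SHOP2 against an HTN domain model; the goal-schema names are made up for the example.

```python
import random
from typing import Any, Callable, Dict, List

# Illustrative goal schemas, not the actual Monroe domain definition.
GOAL_SCHEMAS = ["clear-road-wreck", "provide-medical-attention", "fix-power-line"]

def generate_session(plan_fn: Callable[[str, Any], Any],
                     state_fn: Callable[[random.Random], Any],
                     rng: random.Random) -> Dict[str, Any]:
    """One plan session: stochastically pick a goal and a start state,
    then call the (randomized) planner to get a fully plan-labeled example."""
    goal = rng.choice(GOAL_SCHEMAS)     # stochastically generate goal(s)
    state = state_fn(rng)               # stochastically generate a start state
    plan = plan_fn(goal, state)         # randomized HTN planner, e.g. SHOP2
    return {"goal": goal, "state": state, "plan": plan}

def generate_corpus(n: int,
                    plan_fn: Callable[[str, Any], Any],
                    state_fn: Callable[[random.Random], Any],
                    seed: int = 0) -> List[Dict[str, Any]]:
    rng = random.Random(seed)  # seeded for reproducible corpora
    return [generate_session(plan_fn, state_fn, rng) for _ in range(n)]
```

Because the planner itself produces the hierarchical plan, every generated session is plan-labeled for free, which is what makes the method cheap compared with human annotation.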
Using the Method: The Monroe Corpus
• Emergency planning domain
• 10 top-level goal schemas
• 46 methods (recipes)
• 30 operators (subgoals/actions)
• Average depth to action: 3.8
• 5000 plan sessions generated in less than 10 minutes: a plan-labeled corpus
• Download at http://cs.rochester.edu/research/speech/monroe-plan/
Future Directions
• Problem-solving labeled corpus
  • Similar method to Monroe
  • Build a stochastic agent that does problem solving in the domain, with plan monitoring, replanning, goal abandonment, etc.
  • Label the steps where problem-solving behavior happened
  • cf. (Rosario, Oliver, and Pentland, 1999)