Architecture and Evaluation Issues. Julie Fitzgerald Arlington, Virginia August 9, 2003. Areas of Research for Future (KBS) Evaluations. Evaluation Mechanisms Role of Participants Data collection Time management Meaningfulness of results. Open Questions for Future (KBS) Evaluations.
 
                
Architecture and Evaluation Issues
Julie Fitzgerald
Arlington, Virginia
August 9, 2003
Areas of Research for Future (KBS) Evaluations
• Evaluation Mechanisms
• Role of Participants
• Data collection
• Time management
• Meaningfulness of results
Open Questions for Future (KBS) Evaluations
• Evaluation Mechanisms
  • IET has used challenge problems (CPs) and specifications to present evaluation mechanisms, including the questions to be used and the grading format.
  • CPs are time consuming to develop.
  • Future research: Develop a methodology for evaluating large KB systems.
• Role of Participants
  • In HPKB, the systems were tested using KEs. RKF was SME focused.
  • The user affects the data we need to collect and the analysis performed on that data.
  • Future research:
    • User profiling
    • Interaction profiling (user to system; user to technology developer; user to outside resource)
Open Questions for Future (KBS) Evaluations
• Data collection
  • Labor intensive and invasive
  • Future research:
    • Automated data collection and processing
    • At a minimum, better specification of needed data
• Time management
  • Evaluations are time consuming
    • True for development, execution, and analysis
    • More formalized evaluation methodology should help
  • Future research:
    • More painless evaluations (on-going evaluations, evaluations in the background, automatic evaluations, other ideas?)
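
As one illustration of what "automated data collection" running in the background might look like, here is a minimal Python sketch. It is not from the talk: the class names, field names, and the three interaction targets (taken from the interaction-profiling bullet: system, technology developer, outside resource) are assumptions chosen for the example. It only shows the idea of recording timestamped user interactions without interrupting the participant, then summarizing them per user.

```python
import json
import time
from collections import defaultdict
from dataclasses import dataclass, asdict


@dataclass
class InteractionEvent:
    """One logged interaction (all field names are hypothetical)."""
    user_id: str      # participant identifier, e.g. an SME or KE
    target: str       # "system", "developer", or "outside_resource"
    action: str       # free-form label such as "query"
    timestamp: float  # seconds since the epoch


class InteractionLog:
    """Collects events in memory so the evaluation can run in the background."""

    def __init__(self, clock=time.time):
        # The clock is injectable so the log can be tested deterministically.
        self._clock = clock
        self.events = []

    def record(self, user_id, target, action):
        """Append one timestamped event and return it."""
        event = InteractionEvent(user_id, target, action, self._clock())
        self.events.append(event)
        return event

    def counts_by_target(self):
        """Return {user_id: {target: number_of_events}}."""
        counts = defaultdict(lambda: defaultdict(int))
        for event in self.events:
            counts[event.user_id][event.target] += 1
        return {user: dict(targets) for user, targets in counts.items()}

    def to_jsonl(self):
        """Serialize the log as one JSON object per line for later analysis."""
        return "\n".join(json.dumps(asdict(event)) for event in self.events)


if __name__ == "__main__":
    log = InteractionLog()
    log.record("sme-01", "system", "query")
    log.record("sme-01", "developer", "ask_for_help")
    print(log.counts_by_target())
    print(log.to_jsonl())
```

A per-user count of interactions by target is only one possible summary; the same log could feed the user and interaction profiling listed as future research.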
Open Questions for Future (KBS) Evaluations
• Meaningfulness of Results
  • Methods and metrics need to be better defined
    • This is context dependent: we won't always be testing the same things
    • Methodology development is required, especially variable isolation and controls
  • Characterization of users needs to improve
    • Test users on related tasks
    • Track user-system interaction more closely
    • Requires better task decompositions
    • Need to relate results back to both system and user performance
  • Scope of evaluations needs to widen
    • Need more data (more users, longer durations)