
Tbilisi, Georgia, September, 2007

Introduction to: Automated Essay Scoring (AES). Anat Ben-Simon, National Institute for Testing & Evaluation. Tbilisi, Georgia, September, 2007.


Presentation Transcript


  1. Introduction to: Automated Essay Scoring (AES). Anat Ben-Simon, National Institute for Testing & Evaluation. Tbilisi, Georgia, September, 2007

  2. Merits of AES • Psychometric • Objectivity & standardization • Logistic • Saves time & money • Allows for immediate reporting of scores • Didactic • Immediate diagnostic feedback

  3. AES - How does it work? • Humans rate sample of essays • Computer extracts relevant text features • Computer generates model to predict human scores • Computer applies prediction model to score new essays
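The four steps on this slide can be sketched end to end in a few lines. This is only an illustration under strong assumptions: a single hypothetical feature (essay length in words), a simple one-variable least-squares fit, and made-up essays and human scores. The function names `extract_features`, `fit_model`, and `score_essay` are invented for the example and do not come from any AES product.

```python
def extract_features(essay: str) -> float:
    """Step 2: extract a text feature (here, assumed: word count only)."""
    return float(len(essay.split()))


def fit_model(essays, human_scores):
    """Step 3: fit score = intercept + slope * feature by least squares."""
    xs = [extract_features(e) for e in essays]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(human_scores) / len(human_scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, human_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def score_essay(model, essay: str) -> float:
    """Step 4: apply the fitted model to a new, unrated essay."""
    slope, intercept = model
    return intercept + slope * extract_features(essay)


# Step 1: a sample of essays already rated by humans (toy data).
rated_essays = ["word " * 50, "word " * 100, "word " * 150, "word " * 200]
human_scores = [2.0, 3.0, 4.0, 5.0]

model = fit_model(rated_essays, human_scores)
new_score = score_essay(model, "word " * 120)
print(round(new_score, 2))
```

Real systems use many features and larger rated samples; the structure (rate, extract, fit, apply) is the part this sketch is meant to show.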

  4. AES – Model Determination. Feature determination: • Text driven: empirically based quantitative (computational) variables • Theoretically driven. Weight determination: • Empirically based • Theoretically based

  5. Scoring Dimensions

  6. AES - Examples of Text Features. Surface variables: • Essay length • Av. word / sentence length • Variability of sentence length • Av. word frequency • Word similarity to prototype essays • Style errors (e.g., repetitious words, very long sentences). NLP-based variables: • The number of “discourse” elements • Word complexity (e.g., ratio of different content words to total no. of words) • Style errors (e.g., passive sentences)
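A few of the surface variables listed on this slide are easy to compute directly. The sketch below is illustrative only: the tokenization rules (regular expressions for words and sentence boundaries) are assumptions, and the type-token ratio counts all words rather than only content words, which is a simplification of the "word complexity" variable named above. The function name `surface_features` is hypothetical.

```python
import re
import statistics

WORD_PATTERN = r"[A-Za-z']+"  # assumed definition of a "word"


def surface_features(essay: str) -> dict:
    """Compute a handful of surface variables for one essay."""
    words = re.findall(WORD_PATTERN, essay.lower())
    if not words:
        raise ValueError("essay contains no words")
    # Assumed sentence boundary: runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    sentence_lengths = [len(re.findall(WORD_PATTERN, s)) for s in sentences]
    return {
        "essay_length": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": statistics.mean(sentence_lengths),
        # Population standard deviation as the variability measure.
        "sentence_length_sd": statistics.pstdev(sentence_lengths),
        # Simplification: all words, not only content words.
        "type_token_ratio": len(set(words)) / len(words),
    }


features = surface_features("The cat sat. The cat ran away quickly.")
for name, value in features.items():
    print(name, value)
```

Variables such as word frequency, similarity to prototype essays, or discourse elements need external resources (frequency lists, reference essays, NLP tools) and are left out here.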

  7. AES: Commercially Available Systems • Project Essay Grade (PEG) • Intelligent Essay Assessor (IEA) • Intellimetric • e-rater

  8. PEG (Project Essay Grade) Scoring Method • Uses NLP tools (grammar checkers, part-of-speech taggers) as well as surface variables • Typical scoring model uses 30-40 features • Features are combined to produce a scoring model through multiple regression Score Dimensions • Content, Organization, Style, Mechanics, Creativity
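Combining features through multiple regression, as this slide describes, amounts to finding one weight per feature so that the weighted sum best predicts the human scores. The following is a minimal sketch, not PEG's implementation: it uses two made-up features (essay length in hundreds of words, number of style errors), toy scores, and a hypothetical helper `fit_least_squares` that solves the normal equations by Gaussian elimination.

```python
def fit_least_squares(feature_rows, scores):
    """Return [intercept, w1, ..., wk] minimizing the squared error."""
    rows = [[1.0] + [float(v) for v in r] for r in feature_rows]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * s for r, s in zip(rows, scores)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k):
                a[row][c] -= factor * a[col][c]
            b[row] -= factor * b[col]
    # Back substitution.
    weights = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(a[i][j] * weights[j] for j in range(i + 1, k))
        weights[i] = (b[i] - tail) / a[i][i]
    return weights


def predict(weights, features):
    """Apply the fitted weights to one essay's feature values."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


# Toy data: (length in hundreds of words, number of style errors).
feature_rows = [(1.0, 4), (2.0, 1), (3.0, 5), (4.0, 2), (5.0, 0)]
# Toy human scores, constructed as 1 + 1.0 * length - 0.25 * errors.
human_scores = [1.0, 2.75, 2.75, 4.5, 6.0]

weights = fit_least_squares(feature_rows, human_scores)
predicted = predict(weights, (3.0, 2))
print([round(w, 3) for w in weights], round(predicted, 3))
```

With 30-40 features, as the slide mentions, a numerical library would normally replace the hand-written solver; the principle is the same.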

  9. Intelligent Essay Assessor Scoring Method • Focuses primarily on the evaluation of content • Based on Latent Semantic Analysis (LSA) • Based on a well-articulated theory of knowledge acquisition and representation • Features combined through hierarchical multiple regression Score Dimensions • Content, Style, Mechanics
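The core idea of Latent Semantic Analysis can be sketched with a truncated singular value decomposition of a term-by-document count matrix, followed by cosine similarity in the reduced space. This is an assumption-laden toy, not the Intelligent Essay Assessor's method: the four short "essays", the whitespace tokenization, the choice of k = 2, and the helper names are all invented for illustration, and NumPy is assumed to be installed.

```python
import numpy as np


def build_term_doc_matrix(docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[index[word], j] += 1.0
    return matrix


def lsa_doc_vectors(matrix, k):
    """Project documents into a k-dimensional latent space via SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one row per document


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


docs = [
    "plants use sunlight to make food",
    "sunlight helps plants make food",
    "the war ended with a treaty",
    "a treaty ended the long war",
]
vectors = lsa_doc_vectors(build_term_doc_matrix(docs), k=2)
same_topic = cosine(vectors[0], vectors[1])
other_topic = cosine(vectors[0], vectors[2])
print(same_topic > other_topic)
```

In a scoring setting, a new essay's content could then be judged by its similarity to essays whose human scores are known; how a given product does this is not specified in the slides.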

  10. Intellimetric Scoring Method • “Brain-based” or “mind-based” model of information processing and understanding • Appears to draw more on artificial intelligence, neural net, and computational linguistic traditions than on theoretical models of writing • Uses close to 500 features Score Dimensions • Content, Creativity, Style, Mechanics, Organization

  11. E-rater v2 Scoring Method • Based on natural language processing and statistical methods • Uses a fixed set of 12 features that reflect good writing • Features are combined using hierarchical multiple regression Score Dimensions • Grammar, usage, mechanics, and style • Organization and development • Topical analysis (content) • Word complexity • Essay length

  12. Writing Dimensions and Features in e-rater v2 (2004)

  13. Reliability Studies Studies comparing inter-rater agreement to computer-rater agreement
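Comparisons of this kind typically rest on a few agreement statistics computed for each pair of raters. The sketch below shows three common ones (exact agreement, adjacent agreement within one score point, and Pearson correlation) on made-up scores. The numbers are purely illustrative and are not results from any study; the function names are hypothetical.

```python
import math


def exact_agreement(a, b):
    """Proportion of essays on which both raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a, b):
    """Proportion of essays on which scores differ by at most one point."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


def pearson(a, b):
    """Pearson correlation between two raters' scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy scores for six essays (illustrative only).
human_1 = [4, 3, 5, 2, 4, 3]
human_2 = [4, 3, 4, 2, 5, 3]
computer = [4, 3, 5, 3, 4, 3]

human_human = exact_agreement(human_1, human_2)
human_computer = exact_agreement(human_1, computer)
print(round(human_human, 3), round(human_computer, 3))
print(adjacent_agreement(human_1, computer))
print(round(pearson(human_1, computer), 3))
```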

  14. AES: Validity Issues • To what extent are the text features used by AES programs valid measures of writing skills? • To what extent is the AES inappropriately sensitive to irrelevant features and insensitive to relevant ones? • Are human grades an optimal criterion? • Which external criteria should be used for validation? • What are the wash-back effects (consequential validity)?

  15. Weighting Human & Computer Scores • Automated scoring used only as a quality control (QC) check • Automated scoring and human scoring • Human scoring used only as a QC check
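The three options on this slide can be read as different ways of combining two scores. The sketch below is one possible interpretation, not a procedure given in the presentation: the weight, the one-point discrepancy threshold, and the function names `combine_scores` and `needs_review` are all assumptions made for illustration.

```python
def combine_scores(human: float, computer: float, computer_weight: float = 0.5) -> float:
    """Weighted average of the human and computer scores."""
    return (1.0 - computer_weight) * human + computer_weight * computer


def needs_review(reported: float, check: float, threshold: float = 1.0) -> bool:
    """QC check: flag the essay when the two scores differ by more than threshold."""
    return abs(reported - check) > threshold


human, computer = 4.0, 5.0

# Option 1: report the human score; the computer score only triggers review.
flag_option_1 = needs_review(reported=human, check=computer)

# Option 2: report a weighted combination of both scores.
combined = combine_scores(human, computer, computer_weight=0.5)

# Option 3: report the computer score; the human score only triggers review.
flag_option_3 = needs_review(reported=computer, check=human)

print(flag_option_1, combined, flag_option_3)
```

In practice the weight and threshold would be policy decisions informed by reliability and validity evidence.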

  16. AES: To use or not to use? • Are the essays written by hand or composed on computer? • Is there enough volume to make AES cost-effective? • Will students, teachers, and other key constituencies accept automated scoring?

  17. Criticism and Reservations • Insensitive to some important features relevant to good writing • Fail to identify and appreciate unique writing styles and creativity • Susceptible to construct-irrelevant variance • May encourage writing for the computer as opposed to writing for people

  18. How to choose a program? • Does the system work in a way you can defend? • Is there a credible research base supporting the use of the system for your particular purpose? • What are the practical implications of using the system? • How will the use of the system affect students, teachers, and other key constituencies?

  19. Thank You
