Readability Assessment for Text Simplification

Presentation Transcript


  1. Readability Assessment for Text Simplification
     Sandra Aluisio (1), Lucia Specia (2), Caroline Gasperin (1), Carolina Scarton (1)
     (1) University of São Paulo, Brazil; (2) University of Wolverhampton, UK
     The 5th Workshop on Innovative Use of NLP for Building Educational Applications

  2. Motivation
     INAF levels 68%
     • Rudimentary: studied up to 4 years; can find explicit information in short and familiar texts
     • Basic: studied between 4 and 8 years; can read and understand texts of average length, and find information even when it is necessary to make some inference
     Develop technology to benefit low-literacy readers

  3. Readability Assessment
     [Diagram labels: RUDIMENTARY, BASIC]
     • To assess the readability level of a text
       • Three levels of readability (INAF levels): Rudimentary – Basic – Advanced
     • To supplement our text simplification technology
       • Two levels of simplification, differing in the degree to which simplification operations are applied
         • STRONG: operations are applied to all complex syntactic phenomena present
         • NATURAL: operations are applied selectively, only when the resulting text remains "natural"

  4. Text Simplification Scenario: SIMPLIFICA
     • Authoring tool for creating simplified texts
     • Author inputs text
     • Author receives suggestions of possible simplifications and may accept or reject them
       • Lexical substitutions
       • Syntactic simplification
     • Author does not know whether the text is simple enough for the intended audience
     • Feedback: readability assessment

  5. Readability Assessment System
     • Machine learning
       • Classes = 3 INAF levels
     • Trained on a corpus of manually simplified texts
       • Original text + natural and strong simplifications
     • Extensive set of features
       • Cognitively motivated: Coh-Metrix [Graesser et al., 2004]
       • Syntactic: occurrence of complex phenomena
       • Language model: up to trigrams
     • 3 paradigms: Classification, Ordinal Classification, Regression [Heilman et al., 2007]

  6. Corpora
     • Training and testing corpora
       • General news: Zero Hora (ZH) newspaper
       • Popular science news: CadernoCiencia (CC)
     • 3 versions of each text: original, natural, strong

  7. Features

  8. Feature Analysis
     • Pearson correlation between features and literacy levels
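
The feature analysis on this slide can be illustrated with a minimal sketch of the Pearson coefficient, assuming literacy levels are coded as integers (1 = rudimentary, 2 = basic, 3 = advanced). The `pearson` helper, the level coding, and the toy feature values are assumptions made for illustration; they are not taken from the presentation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, one value per text (not from the presentation).
levels = [1, 1, 2, 2, 3, 3]  # assumed integer coding of the INAF levels
words_per_sentence = [9.0, 11.0, 14.0, 16.0, 21.0, 23.0]

r = pearson(words_per_sentence, levels)
print(f"correlation with literacy level: {r:.3f}")
```

A coefficient close to +1 or -1 would mark a feature that moves consistently with the level; a value near 0 would suggest the feature carries little linear signal on its own.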

  9. Predicting Readability Levels
     • Classification: Weka SVM
     • Ordinal Classification: Weka Pairwise SVM
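
To make the difference between plain and ordinal classification concrete, here is a minimal sketch of one common way to exploit class order: split the three-level problem into two binary questions ("is the text above rudimentary?", "is the text above basic?") and recombine the answers. This is an assumption about how an ordinal setup can work in general; the slide names only the Weka learners, and the function names and probability values below are hypothetical.

```python
LEVELS = ["rudimentary", "basic", "advanced"]  # ordered, simplest first


def ordinal_targets(level):
    """Binary training targets for the K-1 questions 'is the level above LEVELS[k]?'."""
    rank = LEVELS.index(level)
    return [int(rank > k) for k in range(len(LEVELS) - 1)]


def combine(prob_above):
    """Turn K-1 estimates of P(level > LEVELS[k]) into a distribution over K levels."""
    probs = [1.0 - prob_above[0]]                    # P(lowest level)
    for k in range(1, len(prob_above)):
        probs.append(prob_above[k - 1] - prob_above[k])  # P(middle levels)
    probs.append(prob_above[-1])                     # P(highest level)
    return probs


def predict(prob_above):
    """Pick the level with the highest combined probability."""
    probs = combine(prob_above)
    return LEVELS[probs.index(max(probs))]


# Hypothetical outputs of two binary classifiers for a single text.
print(ordinal_targets("basic"))
print(predict([0.9, 0.2]))
```

A plain classifier would treat the three labels as unrelated categories, whereas this encoding lets each binary learner see that "advanced" lies above both other levels.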

  10. Predicting Readability Levels
     • Regression: Weka SVM-reg, RBF kernel
     • Best correlation: Regression
     • Lowest MAE (mean absolute error): Ordinal Classification
     • Combining all features consistently yields better results: more robust
     • Syntactic features achieve the best correlation scores
     • Language model features performed the poorest
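
The MAE figure used to compare the models can be sketched as follows, assuming levels are coded 1 to 3 and that a regression score is mapped to a level by rounding and clamping. The helper names, the coding, and the toy numbers are illustrative assumptions, not results from the presentation.

```python
import math


def mean_absolute_error(gold, predicted):
    """Average absolute difference between gold and predicted values."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


def to_level(score, lowest=1, highest=3):
    """Round a continuous regression score to the nearest valid level."""
    return min(highest, max(lowest, math.floor(score + 0.5)))


# Hypothetical gold levels and regression outputs for four texts.
gold = [1, 2, 3, 3]
scores = [1.2, 2.6, 2.4, 3.3]

mae_raw = mean_absolute_error(gold, scores)
mae_rounded = mean_absolute_error(gold, [to_level(s) for s in scores])
print(f"MAE on raw scores: {mae_raw:.3f}, on rounded levels: {mae_rounded:.3f}")
```

Because MAE weights a two-level miss twice as heavily as a one-level miss, it rewards models that respect the ordering of the levels, which accuracy alone does not capture.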

  11. Conclusions
     • It is possible to predict, with satisfactory performance, the readability level of texts according to our three classes of interest
     • Ordinal Classification seems to be the most appropriate model to use
       • High correlation, lowest error rate (MAE)
     • Combination of all features is best

  12. SIMPLIFICA Tool
     • Integration of classification model
       • Simplest model, highest F-measure, comparable correlation scores

  13. Future Work
     • Add deeper cognitive features, e.g. semantic, coreference, and latent semantics metrics
     • User evaluation: authors

  14. Thanks!
     PorSimples project: http://caravelas.icmc.usp.br/
