
Automatic Syllabus Classification JCDL – Vancouver – 22 June 2007

Edward A. Fox (presenting co-author), Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel Perez-Quinones, William Cameron, GuoFang Teng, and Lillian (“Boots”) Cassel.


Presentation Transcript


  1. Automatic Syllabus Classification
     JCDL – Vancouver – 22 June 2007
     Edward A. Fox (presenting co-author), Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel Perez-Quinones, William Cameron, GuoFang Teng, and Lillian (“Boots”) Cassel

  2. Why Study the Syllabus Genre?
     • Educational resource
     • Importance to the educational community
        • Educators
        • Students
        • Self-learners
     • Thanks to NSF DUE grant 5328255 (personalization support for NSDL)

  3. Where to look for a specific syllabus?
     • Non-standard publishing mechanisms:
        • Instructor’s website
        • CMSs (courseware management systems, e.g., Sakai)
        • Catalogs
     • Limited access outside the university
     • Search on the Web
        • Many non-relevant links in search results

  4. Syllabus Library
     • Bootstrapping
        • Identify true syllabi from search results
        • Store in a repository
        • Develop tools & applications
     • Scaling up
        • Encourage contributions from educational communities

  5. An Essential Step towards a Syllabus Library: Classification
     • Classification Objects:
        • Potential syllabi in Computer Science, found by searching the Web with syllabus keywords, restricted to educational domains
     • Class Definition
     • Feature Selection
     • Model Selection
     • Training and Testing

  6. Four Classes: Full Syllabus, Partial Syllabus, Entry Page, Noise

  7. Full Syllabus

  8. Partial Syllabus

  9. Entry Page

  10. Noise

  11. Syllabus Components: course code, title, class time & location, offering institution, teaching staff, course description, objectives, web site, prerequisite, textbook, grading policy, schedule, assignment, exam, and resources

  12. Features
     • 84 Genre-specific Features
        • the occurrences of keywords,
        • the positions of keywords, and
        • the co-occurrences of keywords and links
     • A series of keywords for each syllabus component
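The three feature families on this slide can be illustrated with a minimal sketch. The keyword lists, the feature names, and the `extract_features` helper below are hypothetical; the slides do not list the actual keywords or define the 84 features, so this only shows the general idea of counting keywords, recording where they first appear, and checking whether they occur inside link text.

```python
import re

# Hypothetical keyword lists; the talk's actual lists are not given in the slides.
COMPONENT_KEYWORDS = {
    "prerequisite": ["prerequisite"],
    "textbook": ["textbook"],
    "grading": ["grading", "grades"],
}

LINK_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def extract_features(html):
    """Return occurrence, position, and link co-occurrence features per component."""
    # Anchor text of all links, used for the keyword/link co-occurrence feature.
    link_text = " ".join(LINK_RE.findall(html)).lower()
    # Visible text with tags stripped, used for counts and positions.
    text = TAG_RE.sub(" ", html).lower()
    features = {}
    for component, keywords in COMPONENT_KEYWORDS.items():
        positions = [
            m.start()
            for kw in keywords
            for m in re.finditer(re.escape(kw), text)
        ]
        # Occurrence: number of keyword matches in the visible text.
        features[component + "_count"] = len(positions)
        # Position: relative offset of the first match (1.0 if absent).
        features[component + "_first_pos"] = (
            min(positions) / max(len(text), 1) if positions else 1.0
        )
        # Co-occurrence: does any keyword appear inside link anchor text?
        features[component + "_in_link"] = int(
            any(kw in link_text for kw in keywords)
        )
    return features
```

For example, calling `extract_features` on a page whose only link reads "Textbook list" would set `textbook_in_link` to 1, which is the kind of signal that might separate an entry page (links to syllabus parts) from a full syllabus.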

  13. Classification Models
     • Discriminative Models
        • Support Vector Machines (SVM)
           • SMO-L: Sequential Minimal Optimization, accelerating the training process of SVM
           • SMO-P: SMO with a polynomial kernel
     • Generative Models
        • Naïve Bayes (NB)
           • NB-K: Applying kernel methods to estimate the distribution of numeric attributes in NB modeling
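The difference between NB and NB-K lies in how the class-conditional density of a numeric attribute is estimated. The sketch below contrasts the two under stated assumptions: plain NB is taken to fit a single normal distribution per class, and the kernel variant is taken to average Gaussian kernels centred on the training values. The function names and the fixed `bandwidth` default are illustrative choices, not details from the talk.

```python
import math


def gaussian_density(x, values):
    """Plain NB assumption: one normal distribution fitted to the class's values."""
    n = len(values)
    mean = sum(values) / n
    # Guard against zero variance when all values are identical.
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-9)
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def kernel_density(x, values, bandwidth=1.0):
    """Kernel estimate: average of Gaussian kernels centred on each training value."""
    total = sum(math.exp(-(((x - v) / bandwidth) ** 2) / 2) for v in values)
    return total / (len(values) * bandwidth * math.sqrt(2 * math.pi))
```

With a bimodal attribute such as `[0, 0, 0, 10, 10, 10]`, the single Gaussian puts most of its mass near 5, where no training value lies, while the kernel estimate concentrates near 0 and 10. Whether that flexibility helps depends on the data; the results later in the talk report that it did not help for this task.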

  14. Evaluation
     • Training corpus: 1020 out of the 8000+ potential syllabi
        • All in HTML, PDF, PostScript, or Text
     • Manual tagging on the training corpus
        • Unanimous agreement by three co-authors
     • Evaluation strategy: ten-fold cross-validation
     • Metrics: F1 (an overall measure of classification performance)
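As a reminder of what the evaluation measures, F1 for a class is the harmonic mean of precision and recall, and ten-fold cross-validation holds out each tenth of the corpus once. The sketch below is a plain-Python illustration; the `f1_score` and `k_fold_splits` helpers are hypothetical names, and the simple every-k-th-item split is an assumption (the slides do not say how folds were drawn, and a real run would typically shuffle or stratify first).

```python
def f1_score(true_labels, predicted_labels, target):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices); fold f tests every k-th item from f."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test
```

On a corpus of 1020 documents, each of the ten folds tests on 102 documents and trains on the remaining 918.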

  15. Results with random set (results table not captured in this transcript)
     • Best items are in purple boxes.
     • Acc_tr: classification accuracy on the training set.

  16. Results (Cont’d)
     • On average, SVM outperforms NB on our syllabus classification task.
     • All classifiers fail to identify the partial syllabus class.
     • The kernel settings for NB do not help in the syllabus classification task.
     • Classification accuracy on the training data is not that good.

  17. Future Work
     • Feature selection
        • Add general feature selection methods from text classification
           • e.g., Document Frequency, Information Gain, and Mutual Information
        • Hybrid: combine our genre-specific features with the general features
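Two of the general feature selection measures named here can be sketched briefly. Document frequency counts the documents containing a term; information gain measures how much knowing whether a term is present reduces the entropy of the class labels. The representation below (each document as a `(set_of_tokens, label)` pair) and the helper names are assumptions made for illustration only.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def document_frequency(term, docs):
    """Number of documents whose token set contains the term."""
    return sum(1 for tokens, _ in docs if term in tokens)


def information_gain(term, docs):
    """Reduction in label entropy from splitting docs on presence of the term."""
    labels = sorted({label for _, label in docs})

    def class_counts(subset):
        return [sum(1 for _, l in subset if l == label) for label in labels]

    with_term = [d for d in docs if term in d[0]]
    without_term = [d for d in docs if term not in d[0]]
    n = len(docs)
    # Weighted entropy remaining after the split.
    remainder = sum(
        len(s) / n * entropy(class_counts(s))
        for s in (with_term, without_term)
        if s
    )
    return entropy(class_counts(docs)) - remainder
```

A term that appears in every syllabus and in no noise page gets the maximum gain for a balanced two-class set (1 bit), while a term spread evenly across both classes gets a gain of 0 and would be a candidate for removal.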

  18. Future Work (Cont’d)
     • Syllabus Library
        • Welcome to http://doc.cs.vt.edu
        • Share your favorite course resources – not limited to the syllabus genre.
     • Information Extraction
     • Semantic search
     • Personalization

  19. Summary
     • Towards a syllabus library
        • Starting from search results on the web
        • Classification of the search results for true syllabi
     • SVM is a better choice for our syllabus classification task.
     • Towards an educational on-line community around the syllabus library

  20. Q & A
