Preventing Sexual Unprotected Intercourse: Prenominal Adjective Ordering
Ben Newman, Chris Collette
Motivation
• Leather Old Green Chair
• Ninja Mutant Teenage Turtles
• Moral Irish High Standards
• Sleeping Green Bag
• Green Sleeping Bag
• …and Sexual Unprotected Intercourse
Outline
• Context
• Method
  • Considerations
  • Memory-based Learning
  • Features
• Results
Context
• Prenominal adjective ordering
• Statistics based on the establishment of semantic classes
• Builds on work by Robert Malouf (2000)
• Ordering at the bigram level
  • Sparsity
  • Simplicity
• Generally established approximation:
  • Size/length/shape < old/new/young < color < nationality < style < gerund < denominal
  • A < B means that an adjective of class A should precede an adjective of class B
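The class hierarchy can be read directly as an ordering rule for a pair of adjectives. The sketch below is only an illustration: it assumes a tiny hypothetical lexicon (`LEXICON`) mapping words to classes, and the function name `order_pair` is invented here. It also makes the sparsity problem concrete, since any adjective missing from the lexicon, or any pair within one class, gets no prediction.

```python
# Rank of each semantic class in the approximate hierarchy
# size/length/shape < age (old/new/young) < color < nationality < style < gerund < denominal.
CLASS_RANK = {
    "size": 0, "age": 1, "color": 2, "nationality": 3,
    "style": 4, "gerund": 5, "denominal": 6,
}

# Hypothetical toy lexicon; a real system would need a far larger one,
# and any adjective missing from it simply cannot be ordered.
LEXICON = {
    "big": "size", "long": "size", "round": "size",
    "old": "age", "new": "age", "young": "age",
    "green": "color", "red": "color",
    "irish": "nationality", "french": "nationality",
    "gothic": "style",
    "sleeping": "gerund",
    "wooden": "denominal",
}


def order_pair(a, b):
    """Return (a, b) in the order the class hierarchy predicts.

    Returns None when either word is not in the lexicon or both words share
    a class, because the hierarchy then makes no prediction.
    """
    class_a, class_b = LEXICON.get(a.lower()), LEXICON.get(b.lower())
    if class_a is None or class_b is None:
        return None
    rank_a, rank_b = CLASS_RANK[class_a], CLASS_RANK[class_b]
    if rank_a == rank_b:
        return None
    return (a, b) if rank_a < rank_b else (b, a)
```

For example, `order_pair("sleeping", "green")` yields `("green", "sleeping")`, matching "Green Sleeping Bag".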
Method
• Considerations
  • Capitalization
    • Turned into a feature
  • Non-alphabetic characters (&eacute; for é)
    • Left in as extra information
  • Artificially high frequency of rare sequences
    • e.g. <Nationality> <adjective> sequences recurring within particular articles
    • Removed matching adjacent adjective sequences
  • Multi-word adjectives
    • Used POS tags as delimiters
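A minimal sketch of the extraction and cleaning steps, under several assumptions not stated in the slides: the input is a list of `(token, tag)` tuples using BNC C5 tags, an adjective run counts as prenominal only if a common noun follows it, each tagged token is one adjective (so the tagger's token boundaries act as the delimiters), and "adjacent duplicate removal" means collapsing consecutive identical pairs in the extracted stream. The function names are hypothetical.

```python
from itertools import groupby

# BNC C5 adjective tags: general, comparative, superlative.
ADJ_TAGS = {"AJ0", "AJC", "AJS"}


def adj_bigrams(tagged):
    """Collect adjacent adjective pairs from runs that directly precede a noun.

    `tagged` is a list of (token, tag) tuples. Each tagged token is treated
    as one adjective, so a unit the tagger marks as a single token stays whole.
    """
    pairs, run = [], []
    for token, tag in tagged:
        if tag in ADJ_TAGS:
            run.append(token)
            continue
        # A common noun tag (NN0, NN1, NN2) closes a prenominal run;
        # any other tag just discards the run.
        if tag.startswith("NN") and len(run) >= 2:
            pairs.extend(zip(run, run[1:]))
        run = []
    return pairs


def drop_adjacent_duplicates(pairs):
    """Collapse consecutive identical pairs into a single occurrence."""
    return [pair for pair, _ in groupby(pairs)]
```

A three-adjective run such as "big old red bag" contributes two bigrams, ("big", "old") and ("old", "red").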
Method
• Corpus: British National Corpus
  • 100 million words
  • 415,731 Adj Adj sequences
  • 404,686 sequences after adjacent duplicate removal
• Memory-based learning
  • Tilburg Memory-Based Learner (TiMBL)
  • Orders adjective bigrams based on an array of features
  • Each pair is either ordered correctly or not
    • Results are plain accuracy; there is no precision versus recall distinction
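To illustrate the memory-based idea, here is a from-scratch sketch of a 1-nearest-neighbour classifier with an unweighted overlap metric. It is not TiMBL or its interface, and TiMBL's own defaults (for example feature weighting) differ. The framing is an assumption: each attested pair becomes a "keep" instance and its reversal a "swap" instance, and the feature set (both words plus their last two letters) is hypothetical.

```python
from collections import Counter


def features(a, b):
    # Hypothetical feature set: both words plus the last two letters of each.
    return (a, b, a[-2:], b[-2:])


def build_instances(pairs):
    """Turn observed (first, second) pairs into labelled training instances.

    Each attested order yields a "keep" instance and its reversal a "swap"
    instance, so the learner sees both classes.
    """
    instances = []
    for first, second in pairs:
        instances.append((features(first, second), "keep"))
        instances.append((features(second, first), "swap"))
    return instances


def overlap_distance(x, y):
    """Overlap metric: the number of feature positions whose values differ."""
    return sum(1 for fx, fy in zip(x, y) if fx != fy)


def classify(instances, query):
    """Majority label among all stored instances at the minimum distance."""
    scored = [(overlap_distance(feats, query), label) for feats, label in instances]
    best = min(d for d, _ in scored)
    votes = Counter(label for d, label in scored if d == best)
    return votes.most_common(1)[0][0]


def order_pair(instances, x, y):
    """Return x and y in the order the classifier predicts."""
    return (x, y) if classify(instances, features(x, y)) == "keep" else (y, x)
```

Trained only on ("old", "green"), ("big", "red") and ("green", "sleeping"), the unseen pair "red"/"new" is placed as ("new", "red") in either input order, because its closest stored instance shares "red" and the "ed" suffix with ("big", "red"). Each prediction is simply right or wrong, which is why accuracy is the only measure.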
Method
• Features
  • Morphological
    • Last 8 characters of each adjective, giving 16 individual features per pair
    • First-letter capitalization as well
    • Extra information on nationality and short words
      • Improved test set accuracy by 0.14%
  • Brute force
    • Lists of words for semantic classes
    • Lowered accuracy
  • Positional probabilities
    • Probability, estimated from the corpus, that a word appears first in any pair containing it
  • Combination
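The morphological and positional features can be combined into one vector per pair. The sketch below is an assumption-laden illustration: the padding character, the lowercasing, the 0.5 fallback for words never seen in a pair, and all function names are hypothetical. The nationality/short-word extras and the brute-force class lists are not reproduced, and how numeric probabilities would be handed to a memory-based learner is left open.

```python
from collections import Counter


def suffix_features(word, n=8, pad="_"):
    """Last n characters of the lowercased word, one feature per character.

    Shorter words are left-padded so that final letters always line up.
    """
    tail = word.lower()[-n:]
    return list(pad * (n - len(tail)) + tail)


def positional_probabilities(pairs):
    """P(word is first | word occurs in a pair), estimated from (first, second) pairs."""
    first = Counter(a.lower() for a, _ in pairs)
    total = Counter()
    for a, b in pairs:
        total[a.lower()] += 1
        total[b.lower()] += 1
    return {word: first[word] / count for word, count in total.items()}


def pair_features(a, b, pos_prob, unseen=0.5):
    """Combined vector: 16 suffix features, 2 capitalization flags, 2 probabilities."""
    feats = suffix_features(a) + suffix_features(b)
    feats += [a[:1].isupper(), b[:1].isupper()]
    feats += [pos_prob.get(a.lower(), unseen), pos_prob.get(b.lower(), unseen)]
    return feats
```

Aligning suffixes from the right means that endings such as "-ing" or "-ish" always occupy the same feature positions, regardless of word length.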
Results
• Accuracies:
  • Morphological: 89.47%
  • Positional probabilities: 89.02%
  • Combined: 90.17%
• Analysis
  • High accuracy overall
  • Exact effects of individual features and considerations are difficult to isolate
  • Lower than Malouf's 91.85%
    • Likely due to data cleaning (adjacent sequence removal)
  • Data sparsity is a continual problem