
Preventing Sexual Unprotected Intercourse



Presentation Transcript


  1. Preventing Sexual Unprotected Intercourse: Prenominal Adjective Ordering. Ben Newman, Chris Collette

  2. Motivation
     • Leather Old Green Chair
     • Ninja Mutant Teenage Turtles
     • Moral Irish High Standards
     • Sleeping Green Bag
     • Green Sleeping Bag
     • …and Sexual Unprotected Intercourse

  3. Outline
     • Context
     • Method
       • Considerations
       • Memory-based Learning
       • Features
     • Results

  4. Context
     • Prenominal Adjective Ordering
       • Statistics based on establishment of semantic classes
     • Building off of work by Robert Malouf (2000)
       • Ordering on a bigram level
         • Sparsity
         • Simplicity
     • Generally established approximation:
       • Size/length/shape < old/new/young < color < nationality < style < gerund < denominal
       • A < B means a class A adjective should precede a class B adjective
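The class ordering on this slide can be read as a sort key. The following is a minimal sketch of that idea, not the authors' system: the `LEXICON` mapping is a hypothetical toy word list (the slides do not give class members), and placing unknown words last is an assumption.

```python
# Class order from the slide: earlier classes precede later ones.
CLASS_ORDER = [
    "size/length/shape", "old/new/young", "color",
    "nationality", "style", "gerund", "denominal",
]
CLASS_RANK = {name: rank for rank, name in enumerate(CLASS_ORDER)}

# Hypothetical toy lexicon (assumption); the slides do not list class members.
LEXICON = {
    "big": "size/length/shape",
    "old": "old/new/young",
    "green": "color",
    "irish": "nationality",
    "sleeping": "gerund",
    "leather": "denominal",
}

def order_adjectives(adjectives):
    """Sort adjectives by class rank; unknown words go last in their original order."""
    def rank(word):
        cls = LEXICON.get(word.lower())
        return CLASS_RANK[cls] if cls is not None else len(CLASS_ORDER)
    return sorted(adjectives, key=rank)  # sorted() is stable

print(order_adjectives(["leather", "old", "green"]))
print(order_adjectives(["sleeping", "green"]))
```

A fixed lexicon like this only covers words someone has listed, which is one reason to learn the ordering from corpus statistics instead.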

  5. Method
     • Considerations
       • Capitalization
         • Turned into a feature
       • Non-Alphabetic Characters (&eacute; for é)
         • Left them in as extra information
       • Artificial Frequency of Rare Sequences
         • e.g. <Nationality> <adjective> in specific articles
         • Removed matching adjacent adjective sequences
       • Multi-word adjectives
         • Used POS tags as delimiters
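Two of these considerations lend themselves to a short sketch: extracting adjective pairs using POS tags as delimiters, and removing matching adjacent sequences. The code below is one possible reading, not the authors' pipeline. It assumes the corpus is available as `(word, tag)` pairs, that adjective tags start with `AJ` (as in the BNC's C5 tagset), and that "matching adjacent sequences" means identical bigrams occurring consecutively in the extracted stream. The function names are illustrative.

```python
from itertools import groupby

def adjective_bigrams(tagged_tokens, adj_prefix="AJ"):
    """Yield (adj1, adj2) for each pair of neighbouring adjective-tagged tokens.

    Each tagged token counts as one unit, so a multi-word adjective that
    carries a single tag stays together (assumption about the input format).
    """
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1.startswith(adj_prefix) and t2.startswith(adj_prefix):
            yield (w1, w2)

def drop_adjacent_duplicates(bigrams):
    """Collapse each run of identical consecutive bigrams into one occurrence."""
    return [key for key, _ in groupby(bigrams)]

tokens = [("the", "AT0"), ("old", "AJ0"), ("green", "AJ0"), ("chair", "NN1")]
pairs = list(adjective_bigrams(tokens))

stream = [("Irish", "high"), ("Irish", "high"), ("old", "green"), ("Irish", "high")]
cleaned = drop_adjacent_duplicates(stream)
```

Only back-to-back repeats are collapsed here; a later, non-adjacent occurrence of the same bigram is kept as genuine frequency evidence.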

  6. Method
     • Corpus: British National Corpus
       • 100 million words
       • 415,731 Adj Adj sequences
       • 404,686 sequences after adjacent duplicate removal
     • Memory-based Learning
       • Tilburg Memory-Based Learner
       • Order adjective bigrams based on array of features
       • Everything is either ordered correctly or not
         • No precision versus recall
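Memory-based learning stores the training instances and classifies a new instance by its nearest stored neighbours. The sketch below is a simplified nearest-neighbour classifier with an unweighted overlap distance, written from scratch for illustration. It is not the Tilburg Memory-Based Learner or its API, and the two-suffix feature vectors and the labels `"correct"` and `"swapped"` are assumptions.

```python
from collections import Counter

def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))

class MemoryBasedClassifier:
    """Toy k-nearest-neighbour classifier over symbolic feature tuples."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []

    def fit(self, instances, labels):
        # "Training" is just storing the examples in memory.
        self.memory = list(zip(instances, labels))
        return self

    def predict(self, instance):
        nearest = sorted(
            self.memory,
            key=lambda item: overlap_distance(item[0], instance),
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

# Toy data: (suffix of first adjective, suffix of second adjective).
instances = [("ld", "en"), ("en", "ld"), ("sh", "gh"), ("gh", "sh")]
labels = ["correct", "swapped", "correct", "swapped"]
clf = MemoryBasedClassifier(k=1).fit(instances, labels)
prediction = clf.predict(("ld", "ed"))
```

Because every candidate pair receives one of two labels, plain accuracy (correct decisions divided by all decisions) is the natural score, which matches the slide's point that there is no precision versus recall trade-off.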

  7. Method
     • Features
       • Morphological
         • Last 8 characters of each Adj. as 16 individual features
         • First letter capitalization as well
         • Nationality and short word extra information
           • Improved test set accuracy by 0.14%
       • Brute Force
         • Lists of words for semantic classes
         • Lowered Accuracy
       • Positional Probabilities
         • Probability that a word is first in any pair given corpus
       • Combination
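The morphological and positional features can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: the padding symbol for words shorter than 8 characters, the lowercasing of suffix characters, and the function names are all assumptions. The positional probability is estimated here as the number of times a word appears first in a pair divided by the number of times it appears in any position.

```python
from collections import Counter

PAD = "_"  # assumed padding symbol for words shorter than n characters

def suffix_features(word, n=8):
    """Return the last n characters as n separate features, left-padded."""
    tail = word[-n:]
    return [PAD] * (n - len(tail)) + list(tail)

def morphological_features(adj1, adj2):
    """16 suffix-character features plus two first-letter capitalization flags."""
    return (
        suffix_features(adj1.lower())
        + suffix_features(adj2.lower())
        + [adj1[:1].isupper(), adj2[:1].isupper()]
    )

def positional_probabilities(bigrams):
    """Estimate P(word is first in a pair) from a list of (adj1, adj2) bigrams."""
    first = Counter(a for a, _ in bigrams)
    total = Counter()
    for a, b in bigrams:
        total[a] += 1
        total[b] += 1
    return {word: first[word] / total[word] for word in total}

feats = morphological_features("Irish", "high")
probs = positional_probabilities(
    [("old", "green"), ("big", "old"), ("old", "Irish")]
)
```

For the combined setting, one option would be to append the two positional probabilities to the morphological feature vector; the slides do not say how the combination was actually done.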

  8. Results
     • Accuracies:
       • Morphological: 89.47%
       • Positional Probabilities: 89.02%
       • Combined: 90.17%
     • Analysis
       • Accurate
       • Exact effects of individual features and considerations are difficult to extract
       • Less than Malouf’s 91.85%
         • Likely due to data cleaning (adjacent sequence removal)
       • Data sparsity is a continual problem
