
Authorship Attribution



Presentation Transcript


  1. Authorship Attribution By Allison Pollard

  2. What is Authorship Attribution? • The process of determining who wrote a text when its authorship is unclear. • It is useful when two or more people claim to have written something, or when no one is willing (or able) to say that (s)he wrote the piece.

  3. The Basis • A text makes use of all linguistic domains: semantics, syntax, lexicon, phonology (orthography) and morphology. Each of these domains is rule-governed; yet within these rules, and among the components, the grammar offers the writer choices. • The text as an end product is the outcome of the particular choices its author made. This is why each specific text carries the fingerprints of its creator.

  4. The Assumptions: • There is a single, specific author • There are choices to be made • The author is consistent in his/her preferred choices • These choices are present and can be detected in all of the author's end products

  5. Computerized Analysis • Developed in the 1980s • Based on stylometry—the statistical analysis of literary style [quantifying some of the features of an author’s style]

  6. Method 1: Word or Sentence Length • The origin of stylometry • First developed in 1887, later extended in 1938 • These measures are NOT reliable
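As a rough illustration of these oldest stylometric measures, the Python sketch below computes a word-length frequency distribution and an average sentence length for a text. The function names, the word-matching regular expression, and the naive sentence split on `.`, `!` and `?` are assumptions made for this example and are not meant to reproduce the historical studies.

```python
import re
from collections import Counter

# Words are runs of letters, optionally with one internal apostrophe ("don't").
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def word_length_curve(text: str) -> dict[int, float]:
    """Relative frequency of each word length, counted in letters."""
    lengths = Counter(len(w.replace("'", "")) for w in WORD.findall(text))
    total = sum(lengths.values())
    if total == 0:
        return {}
    return {n: count / total for n, count in sorted(lengths.items())}


def mean_sentence_length(text: str) -> float:
    """Average words per sentence, splitting naively on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(WORD.findall(s)) for s in sentences) / len(sentences)


sample = "The cat sat. The dog ran far away! Did it?"
print(word_length_curve(sample))     # {2: 0.1, 3: 0.8, 4: 0.1}
print(mean_sentence_length(sample))  # 10 words / 3 sentences, about 3.33
```

Comparing such distributions across texts by known authors is the basic idea; as the slide says, these measures on their own are not reliable.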

  7. Method 2: Function Words • Relies on word usage, particularly context-free ("function") words • Analyzes the frequency, position, or immediate context of words • A criticized method: it cannot reliably distinguish between certain types of literature
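A minimal sketch of a function-word comparison, assuming a hypothetical list of function words and a deliberately simple nearest-profile rule (Manhattan distance between per-1,000-word rates). This stand-in is for illustration only; published function-word studies use larger word lists and more careful statistics.

```python
import re
from collections import Counter

# Hypothetical function-word list, chosen only for this illustration.
FUNCTION_WORDS = ["the", "of", "and", "to", "upon", "by", "while", "whilst"]


def function_word_profile(text: str) -> list[float]:
    """Rate of each function word per 1,000 tokens of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid dividing by zero on empty input
    return [1000 * counts[w] / n for w in FUNCTION_WORDS]


def manhattan(a: list[float], b: list[float]) -> float:
    """Sum of absolute differences between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def attribute(disputed: str, known: dict[str, str]) -> str:
    """Name of the candidate whose known writing has the nearest profile."""
    target = function_word_profile(disputed)
    return min(
        known,
        key=lambda name: manhattan(target, function_word_profile(known[name])),
    )


known = {
    "A": "upon the hill and upon the road by the sea",
    "B": "while the rain fell the wind blew whilst we slept",
}
print(attribute("upon the bridge and upon the river", known))  # A
```

Because function words are largely independent of topic, a profile like this can in principle be compared across texts on different subjects, which is the appeal of the method.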

  8. Method 3: Vocabulary Distributions • Measures the "richness" or "diversity" of an author's vocabulary • Analyzes the frequency profile of word usage to estimate the extent of the author's vocabulary
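The vocabulary measures can be made concrete with a short sketch. It assumes three common richness statistics: the type-token ratio, the count of hapax legomena (words used exactly once), and Yule's K, computed here as K = 10^4 (Σ i²·V_i − N) / N², where N is the number of tokens and V_i is the number of word types used exactly i times. The tokenizer and function name are illustrative assumptions.

```python
import re
from collections import Counter


def vocabulary_richness(text: str) -> dict[str, float]:
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens)                # N: total number of tokens
    freqs = Counter(tokens)        # word type -> number of occurrences
    v = len(freqs)                 # V: number of distinct word types
    # spectrum[i] = V_i, the number of types that occur exactly i times
    spectrum = Counter(freqs.values())
    # Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2
    s2 = sum(i * i * vi for i, vi in spectrum.items())
    yules_k = 10_000 * (s2 - n) / (n * n) if n else 0.0
    return {
        "tokens": n,
        "types": v,
        "type_token_ratio": v / n if n else 0.0,
        "hapax_legomena": spectrum.get(1, 0),
        "yules_k": yules_k,
    }


print(vocabulary_richness("the cat and the hat and the bat"))
# {'tokens': 8, 'types': 5, 'type_token_ratio': 0.625,
#  'hapax_legomena': 3, 'yules_k': 1250.0}
```

The type-token ratio falls as a text gets longer, so it is only comparable between texts of similar length; Yule's K was designed to be less sensitive to text length.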

  9. Method 4: Content Analysis • Tabulates the frequency of different kinds of words in a text • Aims to uncover the denotative or connotative meaning of the text
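Content analysis can be sketched as dictionary-based counting: each word is looked up in a set of categories, and the share of words falling into each category is reported. The `CATEGORIES` dictionary below is entirely hypothetical; real studies rely on much larger, validated dictionaries.

```python
import re
from collections import Counter

# Hypothetical category dictionary, for illustration only.
CATEGORIES = {
    "positive_emotion": {"happy", "joy", "love", "hope"},
    "negative_emotion": {"sad", "fear", "anger", "hate"},
    "family": {"mother", "father", "sister", "brother"},
}


def content_profile(text: str) -> dict[str, float]:
    """Share of all tokens that fall into each category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for token in tokens:
        for category, words in CATEGORIES.items():
            if token in words:
                hits[category] += 1
    n = len(tokens) or 1  # avoid dividing by zero on empty input
    return {category: hits[category] / n for category in CATEGORIES}


print(content_profile("My mother felt joy and hope, not fear."))
# {'positive_emotion': 0.25, 'negative_emotion': 0.125, 'family': 0.125}
```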

  10. Method 5: Neural Networks • Learn to recognize the underlying organization of data, which is vital for any pattern-recognition problem (and stylometry is one)
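To give a feel for how a network learns a pattern from stylometric features, the sketch below trains the simplest possible network, a single perceptron, using the classic perceptron learning rule. The two input features (rates per 1,000 words of two function words) and all the numbers are invented for illustration; stylometric networks are typically larger, multilayer models. A single perceptron only converges when the two authors' samples are linearly separable, as they are in this toy data.

```python
def train_perceptron(
    samples: list[list[float]], labels: list[int], epochs: int = 50, lr: float = 0.1
) -> tuple[list[float], float]:
    """Perceptron learning rule; labels are +1 (author A) or -1 (author B)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified or on the boundary
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                errors += 1
        if errors == 0:  # converged: every training sample is classified correctly
            break
    return weights, bias


def predict(weights: list[float], bias: float, x: list[float]) -> int:
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1


# Invented training data: [rate of word X, rate of word Y] per 1,000 words.
samples = [[3.0, 0.2], [2.8, 0.1], [3.3, 0.4],   # author A -> +1
           [0.5, 2.5], [0.3, 3.0], [0.8, 2.2]]   # author B -> -1
labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_perceptron(samples, labels)
print(predict(weights, bias, [2.9, 0.3]))  # 1, resembles author A
print(predict(weights, bias, [0.4, 2.8]))  # -1, resembles author B
```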

  11. Past Uses—Scholarly • Did Shakespeare write his own plays? • Who wrote the Federalist papers?

  12. Recent Uses—Literary • Determine who wrote the anonymously published novel Primary Colors [Joe Klein] • Target suspects for the authorship of the Unabomber’s Manifesto [Ted Kaczynski]

  13. Future Uses—Beyond • Identifying and blocking spam • Detecting lies and flagging potential inconsistencies • Locating authors of malicious code

  14. References • Ephratt, Michal. "Authorship Attribution: The Case of Lexical Innovations." http://www.cs.queensu.ca/achallc97/papers/p006.html • Gerritsen, Corey M. "Authorship Attribution Using Lexical Attraction." http://genesis.csail.mit.edu/papers/Gerritsen2003.pdf • Holmes, David I. "Stylometry: Its Origins, Development and Aspirations." http://www.cs.queensu.ca/achallc97/papers/s004.html • Pfleeger, Charles P., and Shari Lawrence Pfleeger. Security in Computing, p. 342.
