Authorship Attribution Erik Goldman & Abel Allison
Problem
Definition: Identification of the author of an anonymously written document, given a set of candidate authors.
Applications:
• Historical Scholarship
• Investigative Forensic Identification
• Example: Fake Steve Jobs
Related Work
• Support Vector Machine methods [Diederich et al. (2003)]
• Document prototypes: interesting documents, or extracted salient parts of texts, matched against a document database [Visa et al. (2001)]
• Numerical method of fractional counts [Burrell and Rousseau (1995)]
Approach
• For each work in the training set, count various features (described on the next slide) and store the counts as histograms.
• Take the unknown document as input and make the same counts.
• Compare each author's histograms with those of the unknown document. Each feature contributes a weighted vote.
• Choose the author with the highest comparison score.
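The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the two features (`char_histogram`, `word_histogram`), the weights in `FEATURES`, and the overlap-based `similarity` function are all hypothetical stand-ins chosen to keep the example short.

```python
from collections import Counter

def char_histogram(text):
    """Relative frequency of each non-whitespace character (case-folded)."""
    counts = Counter(ch for ch in text.lower() if not ch.isspace())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def word_histogram(text):
    """Relative frequency of each whitespace-separated word (case-folded)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

def similarity(h1, h2):
    """Placeholder score: 1 minus half the L1 distance between two
    normalized histograms, so identical histograms score 1.0."""
    keys = set(h1) | set(h2)
    return 1.0 - 0.5 * sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)

# Hypothetical feature set: name -> (extractor, vote weight).
FEATURES = {
    "chars": (char_histogram, 1.0),
    "words": (word_histogram, 2.0),
}

def attribute(training, unknown_text):
    """Return (best_author, scores) for a dict mapping author -> list of texts."""
    scores = {}
    for author, texts in training.items():
        joined = " ".join(texts)
        # Each feature contributes a weighted vote to the author's score.
        scores[author] = sum(
            weight * similarity(extract(joined), extract(unknown_text))
            for extract, weight in FEATURES.values()
        )
    return max(scores, key=scores.get), scores

training = {
    "a": ["the cat sat on the mat the cat"],
    "b": ["zebra quizzes vex jumpy folk"],
}
best, scores = attribute(training, "the cat sat")
print(best, scores)
```

Keeping the extractors and weights in one table makes it easy to add or reweight features without touching the voting loop.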
Metrics
• Limit Word Frequency: words frequently used by the author across multiple works.
• Grapheme Frequency: counts of alphanumeric and symbol characters.
• Part-of-speech Bigram Frequency
• Preterminal Tag Bigram Model
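As a rough illustration of how two of these features might be counted, the sketch below builds a grapheme histogram and a bigram histogram over a tag sequence. The function names are hypothetical, and the tag list is assumed to come from some part-of-speech tagger that is not shown here; the Penn Treebank-style tags are only sample input.

```python
from collections import Counter

def grapheme_counts(text):
    """Count every alphanumeric and symbol character, ignoring whitespace."""
    return Counter(ch for ch in text if not ch.isspace())

def bigram_counts(tags):
    """Count adjacent pairs in a sequence of tags."""
    return Counter(zip(tags, tags[1:]))

graphemes = grapheme_counts("Hi, hi!")

# Assumed output of an external tagger for "the cat chased the dog".
tags = ["DT", "NN", "VBD", "DT", "NN"]
bigrams = bigram_counts(tags)

print(graphemes)
print(bigrams)
```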
Histogram Comparisons
• Two methods used:
  • Chi-Squared Metric
  • Difference Formula: similar to the Chi-Squared formula, except that it accounts for the sparsity of bigram counts by normalizing them with respect to the average counts.
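The slide's own Difference Formula is not reproduced here. To illustrate the first method only, the sketch below uses one common symmetric form of the chi-squared histogram distance, the sum over bins of (a - b)^2 / (a + b). Whether the authors used this exact variant is an assumption, and the function name is hypothetical.

```python
def chi_squared_distance(hist_a, hist_b):
    """Symmetric chi-squared distance between two histograms given as dicts.

    Bins missing from one histogram count as zero; bins that are empty in
    both are skipped to avoid dividing by zero. Lower means more similar.
    """
    total = 0.0
    for key in set(hist_a) | set(hist_b):
        a = hist_a.get(key, 0.0)
        b = hist_b.get(key, 0.0)
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return total

same = chi_squared_distance({"the": 3, "cat": 1}, {"the": 3, "cat": 1})
disjoint = chi_squared_distance({"a": 2.0}, {"b": 2.0})
print(same, disjoint)
```

Because this is a distance, an attribution step built on it would pick the author with the lowest value, or convert it to a score first.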
Tests
• Used the power set of our set of authors.
• For each element in the power set, we ran our tests using each of the authors as the unknown and recorded the results.
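One way to enumerate such test runs is sketched below. It assumes that "each of the authors" means each author within the current subset, and that subsets with fewer than two candidates are skipped because they leave nothing to discriminate; both are assumptions, as are the names `power_set` and `evaluation_runs`.

```python
from itertools import chain, combinations

def power_set(items):
    """All subsets of items, from the empty tuple up to the full set."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )

def evaluation_runs(authors, min_size=2):
    """Yield (candidate_subset, unknown_author) pairs.

    Assumption: subsets smaller than min_size are skipped.
    """
    for subset in power_set(authors):
        if len(subset) < min_size:
            continue
        for unknown in subset:
            yield subset, unknown

runs = list(evaluation_runs(["A", "B", "C"]))
for subset, unknown in runs:
    print(subset, "unknown:", unknown)
```

The number of runs grows roughly as n * 2^(n-1) for n authors, so this scheme is only practical for a small author set.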