A Labeled LDA Approach to the Dynamics of Collaboration Nikhil Johri CS 224N
Motivating Questions • What is the value added from academic collaboration? • Division of labor? • Mixture of individual contributions? • New, synergistic ideas? • Can we identify different collaboration styles? • Synergy between established authors • Ideas from newer vs. older authors • Advisor + apprentices • What are the characteristics of influential collaborations and collaborators?
Dataset • ACL (Association for Computational Linguistics) Corpus • 16,000+ papers • Ranges from 1965 to 2009 • Collaborations • 7,500+ papers with 2 or more authors
Methodology • Labeled LDA • Cosine Similarities • Look for Significant Patterns
Labeled LDA (Ramage et al.) • Variation of Latent Dirichlet Allocation (LDA) • Topics are constrained to be about specific tags associated with the documents • In this case, tags = authors • Result: a probabilistic term ‘signature’ for each author per year
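As a rough sketch of the constraint described on this slide, the generative process of Labeled LDA can be written as follows. The notation is an assumption chosen for illustration and is not taken from the slides: \Lambda_d is the set of labels (here, authors) attached to document d, \beta_k is the term distribution for label k, and \alpha and \eta are Dirichlet hyperparameters.

```latex
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta) && \text{for each label (author) } k \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha_{\Lambda_d}) && \text{topic mixture restricted to the labels } \Lambda_d \text{ of document } d \\
z_{d,n} &\sim \mathrm{Multinomial}(\theta_d), \quad z_{d,n} \in \Lambda_d && \text{label assigned to the } n\text{-th word} \\
w_{d,n} &\sim \mathrm{Multinomial}(\beta_{z_{d,n}}) && \text{the } n\text{-th word itself}
\end{align*}
```

Under this reading, the learned \beta_k for an author plays the role of that author's term signature.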
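Cosine Similarity • Each author's term signature is compared with the document's term vector • cos(Author 1 term signature, document term vector) = similarity between author 1 and document • cos(Author 2 term signature, document term vector) = similarity between author 2 and document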
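The comparison on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the system used in the talk: the function name, the sparse dictionary representation, and the toy signatures and document below are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    # Dot product over the terms of `a`; terms missing from `b` contribute 0.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with an empty vector as 0
    return dot / (norm_a * norm_b)


# Hypothetical author term signatures (term -> probability); invented for illustration.
author1_signature = {"parsing": 0.5, "grammar": 0.3, "treebank": 0.2}
author2_signature = {"translation": 0.6, "alignment": 0.4}

# Hypothetical document term vector built from raw term counts.
document_vector = Counter("parsing grammar parsing alignment".split())

sim_author1 = cosine_similarity(author1_signature, document_vector)
sim_author2 = cosine_similarity(author2_signature, document_vector)
print(f"author 1: {sim_author1:.3f}, author 2: {sim_author2:.3f}")
```

In this toy case the document shares more terms with author 1's signature, so author 1 receives the higher score.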
Sample Results • Average established author similarity score to papers • Break down by subfield • High similarity = more rigid, formal, requires training • Low similarity = more flexible, less defined, open to novelty [Tables: High Similarity Scores; Low Similarity Scores]
Sample Results • Identification of ‘hedgehogs’ and ‘foxes’ • Hedgehogs specialize in a single area • Foxes dabble in several areas [Tables: Top ‘Fox’ Authors; Top ‘Hedgehog’ Authors]
Conclusion • Suggested a system to determine author deviation from previous work on later papers • Tested the system on ACL collaborations • Presented preliminary results showing: • Hedgehog / fox style collaborators • Subfields that offer more flexibility for unestablished authors vs. those that require more training • Stated a theory of collaboration styles and described how to use the system to identify these