This project develops a system that takes a company name and collates that company's social network profiles (Twitter, Facebook, Google+). Ground truth covers Facebook and Twitter profiles for the top 70 companies of the 2013 Fortune 500. A baseline algorithm searches each network's API with the company name and returns the first result. The new approach instead scores profiles by the edit distance between the company name and each profile's username and display name, and by relative popularity. Next steps are to improve the training set with harder examples, incorporate more profile data, and build the system around classifiers, refining identity verification across networks.
Objective System • Input: Company Name • Output: Social Network Profiles <Twitter Profile, Facebook Profile, G+ Profile, …>
Record Linkage + Identity
Ground Truth • Two networks: Facebook and Twitter • Top seventy 2013 Fortune 500 companies
Baseline Algorithm • Take company name. • Search Facebook/Twitter API using it. • Return first result from each.
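The baseline steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the search functions are hypothetical stand-ins that return canned results, since the slides do not show the actual Facebook/Twitter API calls.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: a search function takes a query string and returns
# a ranked list of profile dicts, best match first.
SearchFn = Callable[[str], List[dict]]


def baseline_link(company: str,
                  searchers: Dict[str, SearchFn]) -> Dict[str, Optional[dict]]:
    """Search each network with the company name and keep the first result."""
    linked: Dict[str, Optional[dict]] = {}
    for network, search in searchers.items():
        results = search(company)
        # None marks a network where the search returned nothing.
        linked[network] = results[0] if results else None
    return linked


# Stand-in search functions with made-up results (no real API calls).
def fake_twitter_search(query: str) -> List[dict]:
    return [{"username": "examplecorp"}, {"username": "examplecorp_fans"}]


def fake_facebook_search(query: str) -> List[dict]:
    return []


profiles = baseline_link(
    "Example Corp",
    {"twitter": fake_twitter_search, "facebook": fake_facebook_search},
)
```

Because the baseline trusts the search ranking entirely, any fan page or parody account that ranks first is returned as the company's profile.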
New Approach Score profiles based on • Edit Distance (Company Name – Username, Company Name – Display Name) • Relative Popularity
[Figure: profile example labeling the Display Name and the Username]
Scoring Edit Distance Score: Popularity Score:
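The score formulas themselves are not preserved in this transcript, so the following is only one plausible way to combine the two signals. Everything specific here is an assumption: edit distance is normalized by the longer string's length, popularity is follower count relative to the most-followed profile in the list, the two are weighted equally, and the `followers` field name is hypothetical.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(company: str, candidate: str) -> float:
    """Assumed normalization: 1.0 for identical names, 0.0 for nothing in common."""
    a, b = company.lower(), candidate.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def score_profiles(company: str, profiles: List[dict]) -> List[Tuple[float, dict]]:
    """Rank profiles by an assumed equal-weight mix of name match and popularity."""
    max_followers = max((p["followers"] for p in profiles), default=0)
    scored = []
    for p in profiles:
        # Average the Company Name - Username and Company Name - Display Name matches.
        edit_score = (name_similarity(company, p["username"])
                      + name_similarity(company, p["display_name"])) / 2
        popularity = p["followers"] / max_followers if max_followers else 0.0
        scored.append((0.5 * edit_score + 0.5 * popularity, p))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


candidates = [
    {"username": "acme_fan_page", "display_name": "Acme Fans", "followers": 10},
    {"username": "acme", "display_name": "Acme", "followers": 1000},
]
ranked = score_profiles("Acme", candidates)
best_score, best_profile = ranked[0]
```

With these made-up candidates, the exact-name, high-follower profile ranks first. The weights and the normalization would need tuning against the ground truth.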
Next Steps • Improve training set: provide harder examples • Incorporate more profile data • Build system around classifiers