This presentation by Nick Mattei explores the identification of close-knit "tribes" within employment patterns, as researched by Lisa Friedland and David Jensen. Using advanced graph mining techniques, the study uncovers the connections among individuals in various organizations, revealing underlying group structures and anomalies in job transitions. The analysis covers over 4.8 million records and discusses factors such as multiemployment, time series analysis, and collusion detection. Findings lead to insights on employee networking and organizational dynamics, emphasizing the necessity for ongoing research in this area.
Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei
Introduction • Tribes – groups with similar traits in a large graph • Distinguish those that work together and move together intentionally
Relationship Knowledge Discovery • Exploit connections among individuals to identify patterns and make predictions • Discover underlying dependencies • Links must be inferred
Graph Mining • Discover Hidden Group Structures • Animal Herds, Webpages, Employees • Time Series Analysis • Co-integration (Economics) • Security and Intrusion Detection • Dynamic Networks
Motivation • National Association of Securities Dealers • Fraud • Collusion • 4.8 Million Records • 2.5 Million Reps at 560,000 Firms • 100 Years of Data
Complications • Jobs not necessarily in order (or singletons) • 20% of employees hold more than one job at a time • 10% begin multiple jobs (up to 16) on one day • Leave gaps between employment • Mergers and acquisitions
Finding Anomalously Related Entities • Input: • Bipartite Graph: G = (R ∪ A, E) • Entities: R = {r1, r2, …, rn} (People) • Attributes: A = {a1, a2, …, am} (Orgs.) • Entities should connect several attributes • Model co-occurrence rates of pairs of attributes
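As a rough illustration of the input described above, the sketch below stores the bipartite graph as a list of (person, organization) edges and counts how many people connect each pair of organizations. The edge list, the function names, and the use of raw counts for "co-occurrence" are assumptions made for this example, not details taken from the slides.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy edges of the bipartite graph: (entity r, attribute a).
edges = [
    ("r1", "a1"), ("r1", "a2"), ("r1", "a3"),
    ("r2", "a1"), ("r2", "a2"),
    ("r3", "a2"), ("r3", "a3"),
]

def attributes_by_entity(edges):
    """Map each entity (person) to the set of attributes (orgs) it touches."""
    adjacency = defaultdict(set)
    for entity, attribute in edges:
        adjacency[entity].add(attribute)
    return adjacency

def pair_cooccurrence(adjacency):
    """Count, for every unordered attribute pair, the entities linked to both."""
    counts = defaultdict(int)
    for attributes in adjacency.values():
        for pair in combinations(sorted(attributes), 2):
            counts[pair] += 1
    return dict(counts)

adjacency = attributes_by_entity(edges)
cooccurrence = pair_cooccurrence(adjacency)
```

Dividing each count by the number of entities at one of the two attributes would turn these counts into rates like the ones the later slides estimate.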
Simple Model Measures • JOBS = (Number of shared Jobs in the sequence) • YEARS = (Number of Years of overlap)
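The two simple measures can be sketched for a pair of reps as follows. The job histories are invented, and the sketch assumes that YEARS counts only the time both reps were at the same branch simultaneously; the slide does not spell out the exact overlap rule.

```python
# Hypothetical job histories: lists of (branch, start_year, end_year).
alice = [("A", 2000, 2003), ("B", 2003, 2006), ("C", 2006, 2010)]
bob = [("A", 2001, 2003), ("B", 2003, 2005), ("D", 2005, 2009)]

def shared_jobs(history_a, history_b):
    """JOBS: number of distinct branches that appear in both histories."""
    branches_a = {branch for branch, _, _ in history_a}
    branches_b = {branch for branch, _, _ in history_b}
    return len(branches_a & branches_b)

def years_overlap(history_a, history_b):
    """YEARS: total years both reps spent at the same branch at the same time."""
    total = 0
    for branch_a, start_a, end_a in history_a:
        for branch_b, start_b, end_b in history_b:
            if branch_a == branch_b:
                # Length of the intersection of the two intervals, never negative.
                total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total

jobs_score = shared_jobs(alice, bob)
years_score = years_overlap(alice, bob)
```

Here the two reps share branches A and B, with two overlapping years at each.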
Probabilistic Model • X = P(BrA -> BrB -> BrC -> BrD) • = pA * tAB * tBC * tCD • Estimate: • pi = P(start at branch i) • = (# reps ever at i) / (# reps in database) • tij = P(rep moves from i to j | rep ever at i) • = (# reps who leave i to go to j) / (# reps ever at i)
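A minimal sketch of these estimates, assuming each career is available as an ordered list of branches. The toy careers and the function names are hypothetical; only the two ratios and the product come from the slide.

```python
from collections import defaultdict

# Hypothetical careers: rep -> branches in the order they were held.
careers = {
    "r1": ["A", "B", "C"],
    "r2": ["A", "B", "C"],
    "r3": ["A", "B"],
    "r4": ["A", "D"],
    "r5": ["B", "C"],
}

def estimate_prob_model(careers):
    """Return (p, t): p[i] = share of reps ever at i, t[(i, j)] = direct-move rate."""
    n_reps = len(careers)
    ever_at = defaultdict(set)   # branch -> reps ever there
    moved = defaultdict(set)     # (i, j) -> reps who left i to go directly to j
    for rep, sequence in careers.items():
        for branch in sequence:
            ever_at[branch].add(rep)
        for i, j in zip(sequence, sequence[1:]):
            moved[(i, j)].add(rep)
    p = {branch: len(reps) / n_reps for branch, reps in ever_at.items()}
    t = {(i, j): len(reps) / len(ever_at[i]) for (i, j), reps in moved.items()}
    return p, t

def sequence_probability(sequence, p, t):
    """P(sequence) = p[first] * product of t over consecutive branch pairs."""
    probability = p.get(sequence[0], 0.0)
    for i, j in zip(sequence, sequence[1:]):
        probability *= t.get((i, j), 0.0)
    return probability

p, t = estimate_prob_model(careers)
x = sequence_probability(["A", "B", "C"], p, t)
```

A low value of `x` for a sequence that several reps nevertheless share is the signal of interest: under independent movement, that shared path would be unlikely.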
Probabilistic Model • Null Hypothesis of Independent Movement • Movement Not Random • Split and Merge • Markov Chains
Probabilistic Model (Different Paths) • tij becomes vij • vij = P(move to branch j at any point after branch i | currently at i) • = (# reps who go to branch j at any point after working at i) / (# reps ever at i) • Now each vij >= tij and probabilities no longer sum to 1.
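The relaxed estimate can be sketched as below, again assuming careers are ordered lists of branches (a hypothetical representation). Because one rep can contribute to several destinations from the same origin, the rates out of a branch can add up to more than 1.

```python
from collections import defaultdict

# Hypothetical careers: rep -> branches in the order they were held.
careers = {
    "r1": ["A", "B", "C"],
    "r2": ["A", "C"],
    "r3": ["A", "B"],
}

def later_move_rates(careers):
    """v[(i, j)] = (# reps at j at any point after i) / (# reps ever at i)."""
    ever_at = defaultdict(set)
    went_later = defaultdict(set)
    for rep, sequence in careers.items():
        for index, i in enumerate(sequence):
            ever_at[i].add(rep)
            for j in sequence[index + 1:]:
                if j != i:  # assumption: returning to the same branch is ignored
                    went_later[(i, j)].add(rep)
    return {
        (i, j): len(reps) / len(ever_at[i])
        for (i, j), reps in went_later.items()
    }

v = later_move_rates(careers)
row_sum_a = sum(rate for (i, _), rate in v.items() if i == "A")
```

In this toy data, two of the three reps at A later reach B and two later reach C, so the rates out of A sum to 4/3.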
Probabilistic Model (Different Paths) • vij becomes wij • wij = P(move to branch j at any point simultaneous to or after branch i | currently at i) • = (# reps who start at j at any point simultaneous to or after starting at i) / (# reps ever at i) • Now less precise with respect to direct transitions but more general
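Since this variant depends on start dates rather than on job order, the sketch below works from (rep, branch, start year) records. The records and names are hypothetical, and comparing whole years is a simplification of whatever date resolution the real data has.

```python
from collections import defaultdict

# Hypothetical records: (rep, branch, start_year).
job_starts = [
    ("r1", "A", 2000), ("r1", "B", 2000),  # two jobs begun at the same time
    ("r2", "A", 2000), ("r2", "B", 2002),
    ("r3", "B", 1999), ("r3", "A", 2001),
]

def simultaneous_or_later_rates(job_starts):
    """w[(i, j)] = (# reps starting at j at or after starting at i) / (# reps ever at i)."""
    starts = defaultdict(lambda: defaultdict(list))  # rep -> branch -> start years
    ever_at = defaultdict(set)
    for rep, branch, year in job_starts:
        starts[rep][branch].append(year)
        ever_at[branch].add(rep)
    counts = defaultdict(int)
    for by_branch in starts.values():
        for i, years_i in by_branch.items():
            for j, years_j in by_branch.items():
                # Some start at j is at or after some start at i.
                if i != j and max(years_j) >= min(years_i):
                    counts[(i, j)] += 1
    return {(i, j): count / len(ever_at[i]) for (i, j), count in counts.items()}

w = simultaneous_or_later_rates(job_starts)
```

Rep r1, who began both jobs in the same year, counts toward both the A-to-B and the B-to-A rate, which is exactly the case a strictly ordered model cannot place.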
PROB-TIMEBINS • Bins of 1 year or more • 10 people worked at each branch in a bin period • piX = (# reps ever at i during time X) / (# reps in DB) • yiXjY = (# reps ever at i during time X and at j during time Y, where Y >= X) / (# reps ever at i during time X)
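The binned estimates can be sketched as follows. This version assumes fixed-width bins of `BIN_YEARS` years and half-open employment intervals, and it does not enforce the 10-person rule from the slide; all of these are simplifications chosen for the example.

```python
from collections import defaultdict

BIN_YEARS = 1  # assumed fixed bin width; the slide only says "1 year or more"

# Hypothetical records: (rep, branch, start_year, end_year), end exclusive.
records = [
    ("r1", "A", 2000, 2002), ("r1", "B", 2002, 2003),
    ("r2", "A", 2000, 2001), ("r2", "B", 2002, 2003),
    ("r3", "A", 2001, 2002),
]

def bins_covered(start, end, width=BIN_YEARS):
    """Bin indices touched by the half-open interval [start, end)."""
    return range(start // width, (end - 1) // width + 1)

def timebin_estimates(records, n_reps, width=BIN_YEARS):
    """Return (p, y) keyed by (branch, bin) and (i, X, j, Y) respectively."""
    present = defaultdict(set)  # (branch, bin) -> reps there during that bin
    for rep, branch, start, end in records:
        for time_bin in bins_covered(start, end, width):
            present[(branch, time_bin)].add(rep)
    p = {key: len(reps) / n_reps for key, reps in present.items()}
    y = {}
    for (i, bin_x), reps_i in present.items():
        for (j, bin_y), reps_j in present.items():
            if bin_y >= bin_x and (i, bin_x) != (j, bin_y):
                both = reps_i & reps_j
                if both:
                    y[(i, bin_x, j, bin_y)] = len(both) / len(reps_i)
    return p, y

n_reps = len({rep for rep, _, _, _ in records})
p, y = timebin_estimates(records, n_reps)
```

With one-year bins, both reps at A in 2000 are at B in 2002, while only one of the two reps at A in 2001 is.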
PROB-NOTIME • Ignores order of job moves • Use original pi • zij = raw number of reps who are at both branches i and j during career • Transition Pr from i to j = zij / (# reps ever at i) • != zij / (# reps ever at j) = transition Pr from j to i
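A short sketch of the order-free variant, assuming each rep's career is reduced to a set of branches (hypothetical data and names). The count zij is symmetric, but the two transition rates differ whenever the branches have different numbers of reps.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical careers with job order discarded: rep -> set of branches.
branches_by_rep = {
    "r1": {"A", "B"},
    "r2": {"A", "B"},
    "r3": {"A"},
    "r4": {"A", "C"},
    "r5": {"B"},
}

def notime_rates(branches_by_rep):
    """Return (z, rates): z[(i, j)] = # reps at both; rates[(i, j)] = z / # ever at i."""
    ever_at = defaultdict(int)
    z = defaultdict(int)
    for branches in branches_by_rep.values():
        for branch in branches:
            ever_at[branch] += 1
        # Every ordered pair of distinct branches, so z ends up symmetric.
        for i, j in permutations(branches, 2):
            z[(i, j)] += 1
    rates = {(i, j): count / ever_at[i] for (i, j), count in z.items()}
    return dict(z), rates

z, rates = notime_rates(branches_by_rep)
```

Here two reps worked at both A and B, giving a rate of 2/4 from A to B but 2/3 from B to A.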
Discussion • JOBS, PROB, PROB-TIMEBINS, PROB-NOTIME create tribes with higher than average disclosure scores • PROB creates more cross zip code results • PROB-TIMEBINS has higher phi-squared than all others • PROB favors large firms
Discussion • JOBS and YEARS compute larger connected components • JOBS and PROB find same number of tribes but pick different groups as tribes
Conclusions • With no explicit knowledge we can discover: • Job transitions • Geography • Career track
Conclusions • Needed: • Ongoing process • Multiple affiliations • Arbitrary times • Time is a paradox in domain
Thanks! • Time for: • Questions • Comments • Smart Remarks