Link Distribution on Multilingual Wikipedia
KwangHee Park 02/17 Link Distribution on Multilingual Wikipedia
Introduction • Current problem • Analyze link distribution on multilingual Wikipedia • Goal • Find cultural intention in multilingual data for multilingual synchronization
Example • Samsung
Methodology • Topic modeling • Target = 5 linked articles • 34,577 articles from each language • English, Spanish, French, Chinese, Korean • Linked terms • Easy to handle with respect to the term boundary recognition problem
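The point about linked terms can be illustrated with a minimal sketch: if each article is reduced to the targets of its internal links, every link is already one term, so no word segmentation is needed in any of the five languages. The regular expression, the function name `linked_terms`, and the rule that skips namespaced links such as `File:` or `Category:` are assumptions made for illustration and are not taken from the slides.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# only the target part is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def linked_terms(wikitext):
    """Return link targets in order of appearance, one term per link."""
    terms = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # Assumption: targets containing ":" are namespaced (File:, Category:, ...)
        # and are not content terms. This also drops real titles with a colon.
        if not target or ":" in target:
            continue
        terms.append(target)
    return terms


sample = "[[Samsung]] is a [[South Korea|South Korean]] company. [[File:Logo.png]]"
print(linked_terms(sample))
# → ['Samsung', 'South Korea']
```

Multi-word targets such as "South Korea" stay whole, which is the term boundary advantage the slide refers to.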
LDA approach • Korean Wiki page • Inter-language link • English Wiki page
Experiment • LingPipe API • Supports LDA clustering • 20 topics • Linked terms • English: random sample of about 330,000 • Korean: about 220,000 • Documents • English: 1,000 articles • Korean: 3,185 articles
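The experiment itself used LingPipe. As a rough stand-in, the sketch below shows what fitting LDA over lists of linked terms involves, using a small collapsed Gibbs sampler in plain Python. It is not the LingPipe implementation. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions. Only the default of 20 topics comes from the slide.

```python
import random


def lda_gibbs(docs, num_topics=20, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over lists of terms.

    Returns (doc_topic, topic_term): one topic distribution per document and
    one term distribution (dict of term -> probability) per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({term for doc in docs for term in doc})
    term_id = {term: i for i, term in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # term counts per topic
    n_k = [0] * K                       # total terms assigned to each topic
    z = []                              # topic assignment of every term occurrence

    # Random initial assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for term in doc:
            k, w = rng.randrange(K), term_id[term]
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, term in enumerate(doc):
                w, k = term_id[term], z[d][i]
                # Remove the current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_term = [
        {vocab[w]: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)}
        for k in range(K)
    ]
    return doc_topic, topic_term


# Toy data invented for illustration; each document is a list of linked terms.
toy_docs = [
    ["Samsung", "Seoul", "Electronics"],
    ["Samsung", "Electronics", "Smartphone"],
    ["Football", "FIFA", "World Cup"],
    ["Football", "World Cup", "Seoul"],
]
doc_topic, topic_term = lda_gibbs(toy_docs, num_topics=2, iterations=100)
for row in doc_topic:
    print([round(p, 2) for p in row])
```

With a fixed `seed` the run is repeatable, which helps when comparing the English and Korean collections under the same settings.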
Problems • Selecting the total number of topics • Number of topics per document • Needs some threshold • Evaluation
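One possible way to decide how many topics a document gets is to keep only the topics whose probability for that document reaches a cutoff. The sketch below assumes the document's topic distribution is available as a list of probabilities. The function name `select_topics`, the default `threshold` of 0.2, and the optional `max_topics` cap are hypothetical values chosen for illustration, not settings from the slides.

```python
def select_topics(topic_probs, threshold=0.2, max_topics=None):
    """Return (topic_index, probability) pairs at or above threshold, strongest first."""
    ranked = sorted(enumerate(topic_probs), key=lambda pair: pair[1], reverse=True)
    kept = [(k, p) for k, p in ranked if p >= threshold]
    if max_topics is not None:
        kept = kept[:max_topics]
    return kept


print(select_topics([0.05, 0.45, 0.30, 0.20], threshold=0.25))
# → [(1, 0.45), (2, 0.3)]
```

A fixed cutoff is only one choice. A cap on the number of topics, or a combination of both, could be tried with the same function.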
Evaluation • Count overlapping terms in topic and in session • Limit of 3 topics per document • Label all topics and judge manually
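The overlap count could be computed along the following lines. This sketch assumes a topic is given as a mapping from term to probability and that the text being compared against is a plain list of linked terms. Comparing against one document's terms is an interpretation, since the slide does not define the comparison unit further. The names `top_terms` and `overlap_count` and the cutoff `n` are hypothetical.

```python
def top_terms(term_probs, n=10):
    """Return the n most probable terms of one topic (ties broken alphabetically)."""
    ranked = sorted(term_probs.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


def overlap_count(term_probs, doc_terms, n=10):
    """Count distinct terms shared by a topic's top-n terms and a list of terms."""
    return len(set(top_terms(term_probs, n)) & set(doc_terms))


# Toy values invented for illustration.
topic = {"Samsung": 0.4, "Electronics": 0.3, "Seoul": 0.2, "Football": 0.1}
doc = ["Samsung", "Seoul", "Hyundai"]
print(overlap_count(topic, doc, n=3))
# → 2
```

A higher count suggests the topic describes the text well, which can then be checked against the manual labels.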
Work plan • Experiment • Apply to other languages • French, Chinese, Spanish • Compare with old documents • Analyze latent changes