Web Taxonomies: Discovering the Structure of Information
Tim Weninger
Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL
Information wants to be free
• The World Wide Web is decentralized and messy.
• (but it wants to be structured)
• Taxonomies are used to describe the hierarchical structure of data
• Almost always handcrafted
• Data is made (forced) to fit the taxonomy
• Information wants to be free!
Information wants structure
• Just like political science… in data science…
• There is no such thing as digital anarchy
• Government will always rise
• Data democracy
• Let the data decide its own form of government
Web Graph → Web Tree is a really hard problem
• How do we traverse the graph?
• BFS (breadth-first search)
• DFS (depth-first search)
• MST (minimum spanning tree)
• With replacement
• Without replacement
• All links
• Some links
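To make the graph-to-tree question concrete, here is a minimal sketch of the BFS option: each page is attached to the first page that discovers it. The `links` dictionary, the page names, and the function name `bfs_tree` are hypothetical, and treating "without replacement" as "each page gets exactly one parent" is an assumption, not something the slides state.

```python
from collections import deque

def bfs_tree(links, root):
    """Return {page: parent} for a BFS spanning tree of a directed link graph.

    `links` maps each page to the list of pages it links to (toy data).
    A page is attached to the first page that reaches it and never moved.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:   # skip pages already placed in the tree
                parent[target] = page
                queue.append(target)
    return parent

# Toy site: the home page links to two sections that cross-link each other.
links = {
    "home": ["research", "people"],
    "research": ["people", "papers", "home"],
    "people": ["research", "tim"],
    "papers": ["home"],
}
tree = bfs_tree(links, "home")
```

Swapping the queue for a stack would give the DFS variant; the two generally produce different trees from the same graph, which is part of why the choice of traversal matters.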
Web Graph → Web Tree
• Lists of links
• WWW2011 work
• Link paths?
• Most probable user navigation
• PageRank
We're working on all of those; PageRank seems to work.
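The slides do not say how PageRank is used to build the tree, so the following is only one plausible sketch. It computes PageRank by plain power iteration, then grows a tree from the root, always expanding the link whose source page has the highest rank. The greedy rule, the function names, and the toy `links` data are all assumptions for illustration.

```python
import heapq

def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

def pagerank_tree(links, root):
    """Assumed heuristic: grow a tree from `root`, preferring links
    that start at high-PageRank pages already in the tree."""
    rank = pagerank(links)
    parent = {root: None}
    heap = [(-rank[root], root, t) for t in links.get(root, [])]
    heapq.heapify(heap)
    while heap:
        _, source, target = heapq.heappop(heap)
        if target in parent:           # already attached to the tree
            continue
        parent[target] = source
        for nxt in links.get(target, []):
            if nxt not in parent:
                heapq.heappush(heap, (-rank[target], target, nxt))
    return parent

# Toy site (hypothetical pages).
links = {
    "home": ["research", "people"],
    "research": ["people", "papers", "home"],
    "people": ["research", "tim"],
    "papers": ["home"],
}
rank = pagerank(links)
tree = pagerank_tree(links, "home")
```

Because a page is only ever attached to a page that is already in the tree, the result has no cycles and every reachable page has exactly one parent.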
Map taxonomies
• Assumption
• Two taxonomies from Web sites of similar organizational missions will be similar
• Let's do integration
Brand new result: breakthrough this morning. Cue scary graphs.