Exploring Data and Text Mining: Discovering Unknown Information
Geoffrey Bilder's 2007 presentation, "Data and Text Mining: The Search for Unknown Knowns," addresses the difficulty of mining valuable insights from unstructured data. It sets out how information retrieval, extraction, and analysis relate to data mining, and how together they uncover previously unknown information. The talk compares data mining to gold and diamond mining and asks whether publishers should explore new ways of publishing, a question tied to the ongoing evolution of the Semantic Web and its implications for information discovery.
Presentation Transcript
Data and text mining: the search for unknown knowns • Geoffrey Bilder • UKSG, 2007 • gbilder@crossref.org
"Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know."
Information Retrieval + Information Extraction + Information Analysis
Data Mining: new, previously unknown information
The crucial question for publishers is: “If ‘hiding’ information in unstructured text is a problem, then shouldn’t we be exploring new ways to ‘publish’?”
The word tobacco originates from the Taino Indians. • There is no I in the word Team. • The book captured the zeitgeist of the time. • I am sure that I turned the gas off.
The book captured the <foreign_phrase lang="DE">zeitgeist</foreign_phrase> of the time. • I am <emphasis>sure</emphasis> that I turned the gas off.
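Once meaning is marked up inline like this, a program can recover it without guessing. The following is a minimal sketch, assuming the two sentences are wrapped in a hypothetical `<doc>` root with `<s>` sentence elements (neither appears on the slide) so that they form one well-formed document. It uses only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# The two marked-up sentences from the slide. The <doc> and <s> wrappers are
# hypothetical; they are added only so the text parses as a single document.
MARKED_UP = (
    "<doc>"
    '<s>The book captured the <foreign_phrase lang="DE">zeitgeist</foreign_phrase>'
    " of the time.</s>"
    "<s>I am <emphasis>sure</emphasis> that I turned the gas off.</s>"
    "</doc>"
)


def tagged_spans(xml_text):
    """Return (tag, attributes, text) for every inline element in each sentence."""
    root = ET.fromstring(xml_text)
    spans = []
    for sentence in root.iter("s"):
        for child in sentence:  # direct children are the inline markup
            spans.append((child.tag, dict(child.attrib), child.text))
    return spans


spans = tagged_spans(MARKED_UP)
for tag, attrs, text in spans:
    print(tag, attrs, text)
```

The same words in the unmarked sentences would need language detection or stress analysis to interpret; here the interpretation is simply read off the tags.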
The thing’s property: The book (Subject) has an author (Predicate) “Jorge Luis Borges” (Object)
The book (Subject, URI) has an author (Predicate, URI) “Jorge Luis Borges” (Object)
RDF: Resource Description Framework • http://www.amazon.com/isbn/978-0140286809 has an author http://www.wikipedia.com/borges
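The subject-predicate-object statement above can be held in a very small data structure. This is a sketch rather than a full RDF implementation: the two URIs for the book and the author come from the slide, while the predicate URI `http://example.org/terms/hasAuthor` is a made-up placeholder, since the slide only says "has an author". The output follows the N-Triples convention of wrapping each URI in angle brackets and ending the statement with a period.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI naming the property
    obj: str        # URI of the property's value


# Hypothetical predicate URI; the slide gives only the phrase "has an author".
HAS_AUTHOR = "http://example.org/terms/hasAuthor"

triple = Triple(
    subject="http://www.amazon.com/isbn/978-0140286809",
    predicate=HAS_AUTHOR,
    obj="http://www.wikipedia.com/borges",
)


def to_ntriples(t):
    """Serialize a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


line = to_ntriples(triple)
print(line)
```

Because every part is a URI, statements published by different sources about the same book or author can be merged simply by matching strings.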
Blog • Journal A • Journal B • Wiki • Personal Website • OPAC
SPARQL • http://api.ingentaconnect.com/content/cabi/nrr/latest?format=rss
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name
WHERE {
  ?x rdf:type foaf:Person .
  ?x foaf:name ?name
}
ORDER BY ?name
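To make the query's meaning concrete, here is a rough sketch of what it asks for, written as plain Python over an in-memory list of triples. It is not a SPARQL engine and does not contact the feed URL on the slide. The `http://example.org/...` resources and the name "Alice Example" are invented test data; only the `rdf:type`, `foaf:Person`, and `foaf:name` URIs are taken from the query's prefixes.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical triples standing in for whatever a real data source would hold.
TRIPLES = [
    ("http://example.org/p1", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/p1", FOAF_NAME, "Jorge Luis Borges"),
    ("http://example.org/p2", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/p2", FOAF_NAME, "Alice Example"),
    ("http://example.org/p3", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/p3", FOAF_NAME, "Alice Example"),  # same name, other person
    ("http://example.org/j1", FOAF_NAME, "Journal A"),      # named, but not a Person
]


def person_names(triples):
    """Mimic: SELECT DISTINCT ?name WHERE { ?x a foaf:Person . ?x foaf:name ?name } ORDER BY ?name"""
    # First pattern: every ?x typed as foaf:Person.
    people = {s for s, p, o in triples if p == RDF_TYPE and o == FOAF_PERSON}
    # Second pattern joined on ?x; using a set gives DISTINCT.
    names = {o for s, p, o in triples if p == FOAF_NAME and s in people}
    # ORDER BY ?name
    return sorted(names)


names = person_names(TRIPLES)
print(names)
```

The shared variable `?x` is what joins the two patterns, which is why "Journal A" is excluded even though it has a `foaf:name`.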
Creative Commons • FOAF • Geo • RSS 1.0 • FRBR • SKOS
Data Mining = Information retrieval + Information extraction + Information analysis... With the goal of discovering new, previously unknown information
Text Data Mining = Complex data extraction layer + data mining
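The "retrieval + extraction + analysis" decomposition can be illustrated with a deliberately tiny pipeline. Everything here is an assumption made for the sketch: the three-sentence corpus, the term list, and the use of substring matching and co-occurrence counting as stand-ins for real retrieval, extraction, and analysis techniques.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus and term vocabulary, for illustration only.
DOCS = [
    "Tobacco use is linked to lung disease.",
    "Lung disease is linked to air pollution.",
    "The book captured the zeitgeist of the time.",
]
TERMS = ["tobacco", "lung disease", "air pollution"]


def retrieve(docs, keyword):
    """Information retrieval: keep the documents that mention the keyword."""
    return [d for d in docs if keyword in d.lower()]


def extract(doc, terms):
    """Information extraction: pull out the known terms a document mentions."""
    return sorted(t for t in terms if t in doc.lower())


def analyze(term_lists):
    """Information analysis: count how often two terms share a document."""
    pairs = Counter()
    for terms in term_lists:
        pairs.update(combinations(terms, 2))
    return pairs


hits = retrieve(DOCS, "lung disease")
pairs = analyze(extract(d, TERMS) for d in hits)
print(pairs.most_common())
```

In this toy run, "tobacco" and "air pollution" never appear in the same document, yet both pair with "lung disease". Surfacing that kind of indirect link is the sort of new, previously unknown information the slide has in mind. In text data mining, the `extract` step is where the "complex data extraction layer" would sit.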