Harnessing Mathematics as a New Domain for Data Mining Opportunities

This presentation explores the vast potential of mathematics as a data mining domain, likening its significance to that of biology. Various databases of mathematical information are examined, including Mathworld and MathSciNet, highlighting their wealth of resources such as entries, articles, and functions. Key hurdles for effective data mining are discussed, alongside suggested methods to overcome them. By imposing homogeneity and leveraging cross-domain sharing, this talk emphasizes how simple techniques can unearth significant mathematical conjectures, leading to substantial academic and practical rewards.

Presentation Transcript


  1. Mathematics – A new Domain for Datamining? Simon Colton simonco@cs.york.ac.uk http://www.dai.ed.ac.uk/~simonco Universities of Edinburgh & York United Kingdom

  2. Mathematics is the new Biology • Many databases of math information • Massive potential for datamining • This talk • Overview of mathematics databases • Hurdles to overcome for datamining • Suggested Methods • Potential Rewards

  3. Mathematical Databases • Mathworld encyclopedia • 8974 entries, 153958 cross-references, 1400 pages • MathSciNet citation service • 10843 reviews, 151350 articles, 358104 authors • Mizar library of formalised maths • 666 articles, 2000 concept definitions • Mathematica CAS functions • Tens of thousands of computer algebra functions

  4. Mathematical Databases • Encyclopedia of Integer Sequences • 60,000 sequences with terms, definitions, etc. • Inverse Symbolic Calculator • 50 million constants, 400 tables • Gap library (CAS) • 6 million groups • Ad hoc databases everywhere • Geometry junkyard, My favourite constants

  5. Problems with the Data • Highly heterogeneous • No agreed-upon format for concepts, conjectures • Distributed • Hundreds of websites • Dynamic • E.g. 50 new integer sequences daily • Really need to impose homogeneity

  6. Suggestions for Datamining • Conjectures: simple relationships between concepts • Equivalence, implication, nonexistence, moonshine • Need to worry about interestingness • Plausibility, complexity, surprisingness • Concept formation to get correct statements • Composition, tweaking, monster-barring
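A minimal sketch of the first bullet on slide 6, assuming each concept is represented only by a finite set of examples. The function name `relate` and the toy concepts are hypothetical and are not taken from the talk; a relationship found this way is only an empirical conjecture over the data, not a proof.

```python
def relate(first, second):
    """Classify the empirical relationship between two finite example sets."""
    first, second = set(first), set(second)
    if first == second:
        return "equivalence"
    if not (first & second):
        return "nonexistence"  # no example satisfies both concepts
    if first <= second:
        return "implication: first => second"
    if second <= first:
        return "implication: second => first"
    return "none"


# Toy concepts over the integers 1..100 (illustrative data only).
universe = range(1, 101)
evens = [n for n in universe if n % 2 == 0]
odds = [n for n in universe if n % 2 == 1]
multiples_of_4 = [n for n in universe if n % 4 == 0]
ends_in_even_digit = [n for n in universe if str(n)[-1] in "02468"]

print(relate(evens, ends_in_even_digit))
print(relate(multiples_of_4, evens))
print(relate(evens, odds))
```

Ranking the conjectures that come out (plausibility, complexity, surprisingness) is a separate step that this sketch does not attempt.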

  7. Potential Rewards - Example • NumbersWithNames program • http://machine-creativity.com/programs/nwn • Datamining the Encyclopedia of Integer Sequences • Perfect numbers are pernicious • Perfect: sum of divisors is twice the number • Pernicious: prime number of 1s in binary • 6, 28, 496, ... • Found by looking for subsequences • Many more similar examples
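The conjecture on slide 7 can be checked empirically with a short script. This is a sketch using the two definitions given on the slide, not the NumbersWithNames implementation; the helper names and the search bound `LIMIT` are assumptions chosen for illustration.

```python
def divisor_sum(n):
    """Sum of all positive divisors of n, including n itself."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def is_prime(k):
    """Trial-division primality test, adequate for small k."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def is_perfect(n):
    # Perfect: the sum of the divisors is twice the number.
    return divisor_sum(n) == 2 * n


def is_pernicious(n):
    # Pernicious: the binary expansion contains a prime number of 1s.
    return is_prime(bin(n).count("1"))


LIMIT = 10_000  # hypothetical search bound
perfect = [n for n in range(1, LIMIT) if is_perfect(n)]
pernicious = {n for n in range(1, LIMIT) if is_pernicious(n)}

# The subsequence test: every perfect number found is also pernicious.
is_subsequence = all(n in pernicious for n in perfect)
print(perfect, is_subsequence)
```

A check like this only supports the conjecture up to the bound; it does not prove it.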

  8. Potential Rewards: Money & Fame • Money • EPSRC-funded big project: e-science • E-maths initiative being discussed • Fame • Monstrous Moonshine Conjectures • Found by accident (numbers 196883 & 196884) • Led to Fields Medal (see paper)
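For context on the two numbers on slide 8 (background that the slide itself does not spell out): 196884 is the coefficient of q in the expansion of the modular j-function, and 196883 is the dimension of the smallest non-trivial irreducible representation of the Monster group. The accidental observation is that they differ by exactly 1:

```latex
j(\tau) = q^{-1} + 744 + 196884\,q + 21493760\,q^{2} + \cdots,
\qquad q = e^{2\pi i \tau},
\qquad 196884 = 1 + 196883 .
```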

  9. Conclusions and Future Work • Consider mathematics as a datamining domain • Much data available, but there are problems • Techniques required are simple • Possible to make important conjectures • Cross-domain/database sharing of data • Projects like NumbersWithNames • http://machine-creativity.com/programs/nwn
