Gain knowledge and experience in Data Mining (DM) through a course conducted by Prof. Guozhu Dong. Understand the process of problem-solving, data handling, and knowledge transformation. Access course materials, info, and schedule on the course website. Engage in individual projects, exams, and homework to enhance learning. Share feedback and concerns for course improvement. Explore the interdisciplinary links of DM with Statistics, AI, Databases, and more. Discover the challenges and perspectives of DM, including applications in various fields.
CS499/699-10 Data Mining
Fall 2003
Professor Guozhu Dong
Computer Science & Engineering, WSU
Data Mining – Introduction G Dong (WSU)

Introduction
• Introduction to this Course
• Introduction to Data Mining

Introduction to the Course
• First, about you: why take this course?
  • Your background and strengths
    • AI, DBMS, Statistics, Biology, Business, …
  • Your interests and requests
• What is this course about?
  • Problem solving
  • Handling data
    • transform data into workable data
  • Mining data
    • turn data into knowledge
    • validation and presentation of knowledge

This Course
• What can you expect from this course?
  • Knowledge and experience about DM
  • Problem-solving skills
• How is this course conducted?
  • Homework, projects, exams, classes
• Course format
  • Individual projects: 30%
  • Exams and/or quizzes: 60%
  • Homework: 10%

Course Web Site
• cs.wright.edu/~gdong/mining03/WSUCS499DataMining.htm
• My office and office hours
  • RC 430
  • 4:30-5:30, T Th
• My email: gdong@cs.wright.edu
• Slides and relevant information will be made available at the course web site

Any questions and suggestions?
• Your feedback is most welcome!
  • I need it to adapt the course to your needs.
  • Please feel free to provide yours anytime.
• Share your questions and concerns with the class; very likely others have the same ones.
• No pain, no gain: there is no magic in data mining.
  • The more you put in, the more you get out.
  • Your grades are proportional to your efforts.

Introduction to Data Mining
• Definitions
• Motivations of DM
• Interdisciplinary Links of DM

What is DM?
• Or, more precisely, KDD (knowledge discovery from databases)?
• Many definitions
• An iterative process, not plug-and-play
  • raw data → transformed data → preprocessed data → data mining → post-processing → knowledge
• One definition is
  • A non-trivial process of identifying valid, novel, useful and ultimately understandable patterns in data
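
To make the KDD stages on this slide concrete, here is a minimal Python sketch that pushes a toy data set through them in the order the slide lists. The basket data, the function names, and the 0.5 support threshold are all assumptions chosen for illustration; a real project would loop back through the stages several times rather than run them once.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data: comma-separated shopping baskets with some noise.
RAW = [
    "milk, bread ,eggs",
    "bread,milk",
    "",                    # an empty record
    "Milk,eggs",
    "bread, milk, jam",
]

def transform(raw_records):
    """Raw data -> transformed data: split each record into item tokens."""
    return [record.split(",") for record in raw_records]

def preprocess(token_lists):
    """Transformed -> preprocessed data: normalise items, drop empty records."""
    cleaned = []
    for tokens in token_lists:
        items = {t.strip().lower() for t in tokens if t.strip()}
        if items:
            cleaned.append(tuple(sorted(items)))
    return cleaned

def mine(transactions):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for transaction in transactions:
        counts.update(combinations(transaction, 2))
    return counts

def postprocess(counts, n_transactions, min_support=0.5):
    """Post-processing: keep only pairs whose support meets the threshold."""
    return {
        pair: count / n_transactions
        for pair, count in counts.items()
        if count / n_transactions >= min_support
    }

transactions = preprocess(transform(RAW))
knowledge = postprocess(mine(transactions), len(transactions))
print(knowledge)
```

Each stage is a separate function so that any one of them can be revised and rerun, which is the sense in which the process is iterative rather than plug-and-play.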

Need for Data Mining
• Data accumulate and double every 9 months
• There is a big gap between stored data and knowledge, and the transition won't occur automatically.
• Manual data analysis is not new, but it is a bottleneck
• Fast-developing Computer Science and Engineering generates new demands
  • Seeking knowledge from massive data
• Any personal experience?
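
The doubling claim can be turned into a quick calculation. The sketch below takes the slide's 9-month doubling period at face value and assumes steady exponential growth; the function name is made up for this example.

```python
def growth_factor(months, doubling_period=9.0):
    """Factor by which data volume grows after `months`, assuming it
    doubles every `doubling_period` months (the figure from the slide)."""
    return 2.0 ** (months / doubling_period)

for years in (1, 3, 5):
    print(f"after {years} year(s): x{growth_factor(12 * years):.1f}")
```

Under that assumption, three years (36 months) is four doublings, so the stored data grows 16-fold while the capacity for manual analysis stays roughly fixed.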

When is DM useful?
• Data-rich world
• Large data (dimensionality and size)
  • Image data (size)
  • Gene chip data (dimensionality)
• Little knowledge about data (exploratory data analysis)
• What if we have some knowledge?

DM perspectives
• KDD “goals”: prediction, description, explanation, optimization, and exploration
• Knowledge forms: patterns vs. models
• Understandability and representation of knowledge
• Some applications
  • Business intelligence (CRM)
  • Security (Info, Comp Systems, Networks, Data, Privacy)
  • Scientific discovery (bioinformatics, medicine)

Challenges
• Increasing data dimensionality and data size
• Various data forms
• New data types
  • Streaming data, multimedia data
• Efficient search and access to data/knowledge
• Intelligent update and integration

Interdisciplinary Links of DM
• Statistics
• Databases
• AI
• Machine Learning
• Visualization
• High Performance Computing
  • supercomputers, distributed/parallel/cluster computing

Statistics
• Discovery of structures or patterns in data sets
  • hypothesis testing, parameter estimation
• Optimal strategies for collecting data
  • efficient search of large databases
• Static data
  • constantly evolving data
• Models play a central role
  • algorithms are of major concern
  • patterns are sought
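
As a small illustration of the two statistical tools named on this slide, the sketch below estimates a mean with its standard error and then runs an exact permutation test on the difference between two group means. The measurements are invented, and the permutation test is just one possible choice of hypothesis test, picked because it needs only the standard library.

```python
from itertools import combinations
from statistics import mean, stdev

# Hypothetical measurements for two groups (made up for this example).
group_a = [12.1, 11.8, 12.6, 12.9, 12.4]
group_b = [11.2, 11.5, 10.9, 11.9, 11.4]

# Parameter estimation: sample mean and its standard error for group A.
mean_a = mean(group_a)
std_err_a = stdev(group_a) / len(group_a) ** 0.5

def exact_permutation_p_value(x, y):
    """Two-sided exact permutation test for a difference in means.

    Enumerates every way of splitting the pooled values into groups of
    the original sizes and returns the fraction of splits whose absolute
    mean difference is at least as large as the observed one.
    """
    observed = abs(mean(x) - mean(y))
    pooled = x + y
    total = sum(pooled)
    n_x, n_y = len(x), len(y)
    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in indices)
        diff = abs(sum_x / n_x - (total - sum_x) / n_y)
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            extreme += 1
        splits += 1
    return extreme / splits

p_value = exact_permutation_p_value(group_a, group_b)
print(f"mean A = {mean_a:.2f} +/- {std_err_a:.2f}, p = {p_value:.4f}")
```

Enumerating all splits is feasible only for tiny samples (252 splits here), which echoes the slide's point that efficient search becomes a concern once the data are large.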

Relational Databases
• A relational database can contain several tables
  • Tables and schemas
• The goal in data organization is to maintain data and quickly locate the requested data
  • Queries and index structures
• Query execution and optimization
  • Query optimization finds the “best” possible evaluation method for a given query
• Providing fast, reliable access to data for data mining
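
The ideas on this slide can be tried out with Python's built-in sqlite3 module. The following is a minimal sketch: the two tables, their columns, the sample rows, and the index name are all hypothetical, and SQLite merely stands in for whatever DBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Several tables, each with its own schema, plus one index structure.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
CREATE INDEX idx_customers_city ON customers(city);
""")

conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", "Dayton"), (2, "Bob", "Columbus"), (3, "Cy", "Dayton")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0), (4, 3, 8.5)],
)

# A query that combines both tables: total order amount per Dayton customer.
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.city = ?
    GROUP BY c.id
    ORDER BY c.name
    """,
    ("Dayton",),
).fetchall()
print(totals)

# Ask the optimizer which evaluation method it picked for a simple lookup.
plan = [
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM customers WHERE city = ?",
        ("Dayton",),
    )
]
print(plan)
```

The query plan output names the index when the optimizer decides to use it; its exact wording differs between SQLite versions.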

AI
• Intelligent agents
  • Perception-Action-Goal-Environment
• Search
  • Uniform cost and informed search algorithms
• Knowledge representation
  • FOL, production rules, frames with semantic networks
• Knowledge acquisition
• Knowledge maintenance and application
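
Uniform cost search, mentioned on this slide, expands the cheapest frontier node first. Here is a minimal sketch on a small made-up graph; the graph, its edge costs, and the function name are assumptions for illustration.

```python
import heapq

# Hypothetical weighted graph: node -> {neighbor: step cost}.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) of a cheapest path, or None if goal is unreachable."""
    frontier = [(0, start, [start])]   # priority queue ordered by path cost
    best_cost = {start: 0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue                   # stale entry: a cheaper route was found
        for neighbor, step in graph[node].items():
            new_cost = cost + step
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return None

print(uniform_cost_search(GRAPH, "A", "D"))   # (4, ['A', 'B', 'C', 'D'])
```

An informed search such as A* would use the same loop but order the queue by path cost plus a heuristic estimate of the remaining cost.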

Machine Learning
• Focusing on complex representations, data-intensive problems, and search-based methods
• Flexibility with prior knowledge and collected data
• Generalization from data and empirical validation
  • statistical soundness and computational efficiency
  • constrained by finite computing & data resources
• Challenges from KDD
  • scaling up, cost info, auto data preprocessing, more knowledge types
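
"Generalization from data and empirical validation" can be shown with a hold-out experiment. The sketch below is only an illustration: the synthetic two-class data, the 1-nearest-neighbor classifier, and the 140/60 split are all assumptions made for this example, not methods prescribed by the course.

```python
import random

def make_points(n, seed):
    """Synthetic 2-D points: class 0 centred at (0, 0), class 1 at (3, 3)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 3.0
        features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        points.append((features, label))
    return points

def predict_1nn(train, features):
    """Predict the label of the closest training example."""
    def squared_distance(example):
        (x, y), _ = example
        return (x - features[0]) ** 2 + (y - features[1]) ** 2
    return min(train, key=squared_distance)[1]

def accuracy(train, examples):
    """Fraction of `examples` that the 1-NN rule built on `train` gets right."""
    hits = sum(predict_1nn(train, f) == label for f, label in examples)
    return hits / len(examples)

data = make_points(200, seed=0)
train, test = data[:140], data[140:]      # hold out 60 examples for validation

train_acc = accuracy(train, train)        # optimistic: the model has seen these
test_acc = accuracy(train, test)          # empirical estimate of generalization
print(f"train accuracy = {train_acc:.2f}, held-out accuracy = {test_acc:.2f}")
```

A 1-NN model scores perfectly on its own training data because every point is its own nearest neighbor, so only the held-out accuracy says anything about generalization.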

Visualization
• Producing a visual display that gives insight into the structure of the data, with interactive means
  • zoom in/out, rotating, displaying detailed info
• Various types of visualization methods
  • show summary properties and explore relationships between variables
  • investigate large DBs and convey lots of information
  • analyze data with geographic/spatial location
• A pre- and post-processing tool for KDD

Bibliography
• J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
• D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, 2001.
• W. Klösgen and J. M. Zytkow (eds.). Handbook of Data Mining and Knowledge Discovery. 2001.