Data Mining

Instructor Information
- Name: Roozbeh Razavi-Far
- Office: CEI 2134
- Office Hours: Fridays from 14:30 to 16:00
Class and Lab Information
- Location: University of Windsor, Chrysler Hall South 53 [online class]
- Time: Mondays from 16:00 to 18:50

Course Description:
With fast advances in information technology, our capability to generate and collect data has grown explosively over the last decade. Analyzing large amounts of data efficiently and in an understandable way remains a challenging problem. Data mining addresses this problem by providing methodologies that automate the analysis and exploration of large, complex data sets.
This course will cover the basic topics of data analysis and data mining: extracting patterns and underlying knowledge from data and transforming them into an understandable structure for further use, for instance in machine learning, predictive analytics, process control, fault diagnosis, monitoring, and decision making.
The class will introduce various computational data mining techniques at the intersection of artificial intelligence, machine learning, and statistical learning, with considerable attention given to their implementation and to their strengths and weaknesses in different applications.

Required Resources:
Primary Text:
[1] Lecture slides provided by the instructor
The following book is strongly recommended:
[2] Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, and Jian Pei
Additional resources:
[3] Data Mining: Practical Machine Learning Tools and Techniques, by Ian H. Witten, Eibe Frank, and Mark Hall
[4] The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
[5] Pattern Recognition and Machine Learning, by Christopher M. Bishop

Course Schedule
The following course schedule is approximate.
- Week 01:
  - Teaching subjects: course introduction and an introduction to data mining
  - Textbook Chapter or Readings: Lectures 1 and 2
- Week 02:
  - Teaching subjects: data exploration and data processing
  - Textbook Chapter or Readings: Lectures 3 and 4
- Week 03:
  - Teaching subjects: data preparation, missing data, data integration, and data transformation (discretization and encoding)
  - Textbook Chapter or Readings: Lecture 4
- Week 04:
  - Teaching subjects: expectation-maximization and data reduction
  - Textbook Chapter or Readings: Lectures 4* and 5
- Week 05:
  - Teaching subjects: nonlinear iterative partial least squares (NIPALS) and pattern discovery
  - Textbook Chapter or Readings: Lectures 5 and 6
- Week 06:
  - Teaching subjects: advanced topics in pattern discovery, and an introduction to learning and regression analysis
  - Textbook Chapter or Readings: Lectures 6 and 7
- Week 07: Study week
- Week 08:
  - Teaching subjects: cluster analysis
  - Textbook Chapter or Readings: Lectures 8 and 9
- Week 09:
  - Teaching subjects: classification and nearest neighbors
  - Textbook Chapter or Readings: Lecture 10
- Week 10:
  - Teaching subjects: classification, decision trees, and naïve Bayes
  - Textbook Chapter or Readings: Lectures 10, 11, and 12
- Week 11:
  - Teaching subjects: classification techniques, performance evaluation, cross-validation, and ROC
  - Textbook Chapter or Readings: Lectures 12 and 13
- Week 12:
  - Teaching subjects: confidence estimation, overfitting, and ensemble models
  - Textbook Chapter or Readings: Lectures 13 and 14
- Week 13:
  - Teaching subjects: data stream mining, analysis of large graphs, and recommender systems
  - Textbook Chapter or Readings: Lecture 15

There will be course projects that involve developing and programming machine learning algorithms in MATLAB, Python, R, Fortran, C, or C++. The projects must be demonstrated during the semester; both the primary and final demos are mandatory.
Evaluation Methods
The course grade will be determined as follows:
- Participation: 5%
- Assignments: 10%
- Exam (closed-book): 40%
- Final project (group): 45%

Teaching Assistants:
- Daoming Wan
- Hossein Hassani