CMSC5724 Data Mining and Knowledge Discovery

Fall 2024

Professor: Yufei Tao

TA: Nawapon Sangsiri (nawaponsangsiri@cuhk.edu.hk)
TA office hours: 3-4pm every Tue; office: RM 115, SHB.

Quick navigation links:
[Lecture Notes][Exercises and Quizzes][Project]

Brief Description

This course covers the conceptual and algorithmic aspects of fundamental problems in data mining and knowledge discovery, including (time permitting) classification, clustering, and association rule analysis, among others. On completion, students are expected to be able to perform a range of mining tasks that are essential to many practical applications.

Announcements

News 10 (10 Oct): The instructor has noted that 11 Oct (Fri) is a public holiday with no lectures. Therefore, the remaining quizzes will all be postponed by one week. The new test schedule is:

Quiz 2 (20 minutes): To be held in the lecture of 1 Nov (Fri, Week 9)
Quiz 3 (20 minutes): To be held in the lecture of 29 Nov (Fri, Week 13).

News 9 (Oct 5): Project 2 has been released. Please scroll to the bottom of the page to find the link.

News 8 (Oct 5): You can now view your Quiz 1 scores on Blackboard. The solutions and statistics are in the "Exercises and Quizzes" section. You may collect your paper from the TA during his office hours, which are listed at the top of this page.

News 7 (Sep 23): All quizzes in this course are open book.

News 6 (Sep 21): Project 1 has been released. Please scroll to the bottom of the page to find the link.

News 5 (Sep 21): Quiz 1 will start at 9pm in the lecture of Sep 27. The scope covers everything from Lecture 1 to Lecture 3.

News 4 (6 Sep): Exercise List 1 has been released. Please scroll to the "Exercises and Quizzes" section at the bottom of the page. There will not be further announcements about exercise releases. Please check that section periodically.

News 3 (6 Sep): The video of the first lecture has been released. There will not be further announcements about video releases. Please check this page periodically. In general, video releases are not guaranteed, but the instructor plans to release them as long as they have been properly recorded.

News 2 (6 Sep): Please note the test schedule for this course:

Quiz 1 (20 minutes): To be held in the lecture of 27 Sep (Fri, Week 4)
Quiz 2 (20 minutes): To be held in the lecture of 1 Nov (Fri, Week 9); originally 25 Oct (Fri, Week 8)
Quiz 3 (20 minutes): To be held in the lecture of 29 Nov (Fri, Week 13); originally 22 Nov (Fri, Week 12).

News 1 (5 Sep): Hello all.

Time, Venues, and Zoom Link

Lecture: 6:30pm - 9:15pm Fri, Wu Ho Man Yuen Bldg 304
Zoom Link: https://cuhk.zoom.us/j/97424266516

Click here for a map of the campus.

Grading Scheme

Project: 30%
Short Tests or Assignments: 30%
Final: 40%

Textbook and Lecture Notes

No textbooks cover all the material of this course. Some reference books may be useful for extra reading:

[Book 1] Mohammed J. Zaki, and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms.
[Book 2] Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of Data Science.

You are not required to own these books. The instructor will make lecture notes available before each class. The notes cover all the content required in this course, some of which is not covered in the above books.

As usual, lecture attendance is vital for thorough understanding.

Lecture Notes (extra reading is listed under each lecture)

1. [Classification] Decision Trees and a Generalization Theorem (video)
   Extra reading: Chapter 19 of [Book 1]; Sections 5.5-5.6 of [Book 2]
2. [Classification] The Bayesian Method (video)
   Extra reading: Sections 18.1-18.2 of [Book 1]
3. [Classification] Perceptron (video)
   Extra reading: Section 5.8.3 of [Book 2]
4. [Classification] Generalization Theorems Using VC-Dims and Margins (video 1) (video 2)
   Extra reading: --
5. [Classification] SVM and Margin Perceptron (video)
   Extra reading: Sections 21.1-21.2 of [Book 1]
6. [Classification] The Kernel Method (video)
   Extra reading: Section 21.4 of [Book 1]
7. [Classification] Multiclass Perceptron (video)
   Extra reading: --
8. [Clustering] Centroid Methods
   Extra reading: Section 13.1 of [Book 1]

Exercises and Quizzes

The instructor designed the following exercises to help you consolidate your understanding of the course material. Solutions are provided in full.

Exercise List 1 (Solutions)
Exercise List 2 (Solutions). Note: Problem 5 is outside the scope of quizzes and exams.
Exercise List 3 (Solutions)
Exercise List 4 (Solutions)
Exercise List 5 (Solutions)
Exercise List 6 (Solutions)
Exercise List 7 (Solutions)

Quiz 1 solutions. Average = 84, Std. Dev. = 20.1

Project

The project page is here.

Deadlines:
  • Project 1: 11:59pm, 21 Oct 2024
  • Project 2: 11:59pm, 4 Nov 2024
To submit your project, pack all the deliverables (as detailed on the project page) into a Zip file and submit it on Blackboard. Remember to list the IDs and names of all the members of your team.