CMSC 5724 Project Page
Team Coding
Each team may have up to 5 members and will implement a designated algorithm in C++, Java, or Python. The implementation must be written from scratch, i.e., it may use only functions from a standard library, for example:
Use of any function outside the above libraries is not permitted unless prior approval has been obtained from the instructor. All source code is subject to plagiarism scrutiny. Any form of dishonesty will be reported to the university for disciplinary action.
Using a programming language other than those listed above requires approval from the instructor.
Project List
Each team can choose to work on any of the following projects.
Additional projects will be released after their topics have been covered in the lectures.
Project #1: Decision Tree
Goal
Implement Hunt's algorithm for decision tree classification.
Dataset
We will use the Adult dataset, whose description is available here. The training set (adult.data) and the evaluation set (adult.test) can be downloaded here.
Preprocessing
Remove all the records containing '?' (i.e., missing values). Also, remove the attribute "native-country".
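The preprocessing step above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name preprocess_adult is hypothetical, and the code assumes that each record is a comma-separated line with 15 fields (14 attributes plus the class label) and that "native-country" is the 14th attribute (index 13). Please verify both assumptions against the dataset description before relying on them.

```python
# Assumptions (not stated on this page): 15 comma-separated fields per
# record, with "native-country" at 0-based index 13.
FIELDS_PER_RECORD = 15
NATIVE_COUNTRY_INDEX = 13


def preprocess_adult(lines):
    """Return the cleaned records, each as a list of strings."""
    records = []
    for line in lines:
        fields = [f.strip() for f in line.strip().split(",")]
        # Skip blank lines and any line that is not a full record.
        if len(fields) != FIELDS_PER_RECORD:
            continue
        # Remove records that contain a missing value.
        if "?" in fields:
            continue
        # Remove the attribute "native-country".
        del fields[NATIVE_COUNTRY_INDEX]
        records.append(fields)
    return records
```

Since the function accepts any iterable of lines, it can be applied directly to an open file, e.g., preprocess_adult(open("adult.data")).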
Deliverables
- An executable program, which should output a decision tree to the disk when given an input training set.
- A readme file detailing how to use the program.
- Source code.
- A document describing (i) the decision tree built from the Adult training set, and (ii) a report on using the tree to classify the records of the evaluation set. The report should indicate, for each record in the evaluation set, its attributes and whether it has been classified successfully.
Project #2: Margin Perceptron
Goal
Implement the margin perceptron algorithm.
Dataset
Your implementation should work on any dataset in the following format:
- The first line contains three numbers n, d, and r, where n is the number of points, d is the dimensionality of the instance space, and r is the radius.
- The i-th line (where i goes from 2 to n + 1) gives the (i - 1)-th point in the dataset as:
x1,x2,...,xd,label
where the first d values are the coordinates of the point, and label = 1 or -1.
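A reader for this format might look like the sketch below. The function name parse_dataset is hypothetical. Because the page does not say how the three numbers on the first line are separated, the sketch accepts either commas or spaces there; this is an assumption.

```python
def parse_dataset(lines):
    """Parse an iterable of text lines into (n, d, r, points, labels)."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    # Assumption: the header values are separated by commas or spaces.
    header = rows[0].replace(",", " ").split()
    n, d, r = int(header[0]), int(header[1]), float(header[2])
    if len(rows) - 1 < n:
        raise ValueError("expected %d points, found %d" % (n, len(rows) - 1))
    points, labels = [], []
    for row in rows[1:n + 1]:
        values = row.split(",")
        if len(values) != d + 1:
            raise ValueError("expected %d values per point" % (d + 1))
        points.append([float(v) for v in values[:d]])  # coordinates
        labels.append(int(float(values[d])))           # label is 1 or -1
    return n, d, r, points, labels
```

The function takes any iterable of lines, so an open file can be passed to it directly.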
We have prepared three datasets for you:
2d-r16-n10000
4d-r24-n10000
8d-r12-n10000
Deliverables
- An executable program.
- A readme file detailing how to use the program.
- Source code.
- A report explaining the margin of your classifier for each of the three datasets.
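For the report, the margin of a classifier can be computed directly from its definition. The sketch below assumes that the classifier is a weight vector w defining a separation plane through the origin (w . x = 0); if your implementation uses a different representation, adapt it accordingly. The function name margin is hypothetical.

```python
import math


def margin(w, points, labels):
    """Smallest signed distance from any point to the plane w . x = 0.

    A zero or negative result means that w does not separate the dataset.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must be a non-zero vector")
    return min(
        y * sum(wi * xi for wi, xi in zip(w, x)) / norm
        for x, y in zip(points, labels)
    )
```

For example, with w = [1, 0], points [2, 1] (label 1) and [-3, 5] (label -1) lie at distances 2 and 3 from the plane, so the margin is 2.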
Project #3: Bayes Classifier, K-Center, K-Means
This project has two parts.
=============
=== PART I ===
=============
Goal
Implement the Bayes Classifier.
Dataset, Preprocessing
Same as Project #1.
Deliverables
- An executable program.
- A readme file detailing how to use the program.
- A report on using the program to classify the records of the evaluation set. The report should indicate, for each record in the evaluation set, its attributes and whether it has been classified successfully.
=============
=== PART II ===
=============
Goal
Implement the k-means algorithm using the k-center algorithm for center initialization.
Dataset
Download here (obtained from the data collection here). Each line has the following format:
x y
where x and y are the x- and y-coordinates of a point.
Task
Partition the dataset into 8 clusters.
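One possible structure for the program is sketched below. It is not a reference solution: it assumes that the k-center initialization is the greedy farthest-first selection, that the first point of the dataset serves as the first center, and that k-means stops when the assignment no longer changes. All function names and the max_iter parameter are hypothetical; follow the lecture where it differs.

```python
import math


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def k_center_init(points, k):
    """Greedy farthest-first selection of k initial centers."""
    centers = [points[0]]  # assumption: start from the first point
    # nearest[j] = distance from points[j] to its closest chosen center
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=lambda j: nearest[j])
        centers.append(points[i])
        nearest = [min(nearest[j], dist(points[j], points[i]))
                   for j in range(len(points))]
    return centers


def k_means(points, k, max_iter=100):
    """Lloyd-style iterations starting from the k-center centers."""
    centers = k_center_init(points, k)
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_assign == assign:
            break  # assignment is stable: converged
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers, assign
```

Each input line "x y" can be turned into a point with tuple(map(float, line.split())); the task then corresponds to calling k_means(points, 8).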
Deliverables
- An executable program.
- A readme file detailing how to use the program.
- A report explaining the clusters found (e.g., giving a visualization of each cluster).
- Source code.
Project #4: DBSCAN
Goal
Implement the DBSCAN algorithm.
Dataset
Download here (obtained from the data collection here). Each line has the following format:
x y
where x and y are the x- and y-coordinates of a point.
Task
Partition the dataset into 3 clusters.
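A brute-force sketch of DBSCAN is given below for orientation. The parameter names eps and min_pts are hypothetical, and this page does not prescribe their values: they have to be tuned so that the dataset yields 3 clusters. The sketch assumes that a point counts as its own neighbor when checking the min_pts threshold, and it uses a quadratic-time neighbor search, which may be too slow for large inputs.

```python
import math
from collections import deque

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i]."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster id (0, 1, ...) per point, or NOISE."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already processed
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels
```

Points labeled NOISE belong to no cluster; the report should state how such points are handled.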
Deliverables
- An executable program.
- A readme file detailing how to use the program.
- A report explaining the clusters found (e.g., giving a visualization of each cluster).
- Source code.
Project #5: PCA for Image Compression
Goal
Reproduce the experimental results on this page.
Dataset
The original image can be downloaded here.
Task
Implement the PCA-based compression method discussed in the lecture.
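The sketch below illustrates one way to realize PCA-based compression with only the standard library. It may differ from the method discussed in the lecture, which takes precedence. Assumptions: the image has already been loaded as a list of rows of grayscale values, each row is treated as one data point, and the leading eigenvectors of the covariance matrix are approximated by power iteration with deflation. All function names and the iters parameter are hypothetical.

```python
import math
import random


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def covariance(rows, mean):
    n, d = len(rows), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [r[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += c[a] * c[b] / n
    return cov


def top_eigenvectors(cov, k, iters=500):
    """Approximate up to k leading eigenvectors (power iteration + deflation)."""
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    rng = random.Random(0)         # fixed seed for reproducibility
    vectors = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return vectors  # no variance left: fewer than k directions
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(cov, v)))
        vectors.append(v)
        # Deflate: subtract the component along the eigenvector just found.
        for a in range(d):
            for b in range(d):
                cov[a][b] -= lam * v[a] * v[b]
    return vectors


def compress(rows, k):
    """Project the rows onto k principal directions and reconstruct them."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    vecs = top_eigenvectors(covariance(rows, mean), k)
    out = []
    for r in rows:
        c = [r[j] - mean[j] for j in range(d)]
        coeffs = [sum(cj * vj for cj, vj in zip(c, v)) for v in vecs]
        out.append([mean[j] + sum(co * v[j] for co, v in zip(coeffs, vecs))
                    for j in range(d)])
    return out
```

Power iteration converges slowly when two eigenvalues are nearly equal, and building the covariance matrix in pure Python is expensive for wide images, so treat this as a starting point only.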
Deliverables
- An executable program that, given the original image and a parameter k, outputs an image produced using the k eigenvectors returned by PCA.
- A readme file detailing how to use the program.
- A report showing the images produced.
- Source code.