CSCI3320: Fundamentals of Machine Learning
General Expectations:
Student/Faculty Expectations on Teaching and Learning
Message:
2020-2021 was my last year of teaching CSCI3320, Fundamentals of Machine Learning,
a course which I have taught for 7 or 8 years. It has been fun and as someone said,
usually it is the teacher who learns the most. In any case,
for students who want to take CSCI3320, please refer to the syllabus by the
current course instructor.
Instructor:
Prof. John C.S. Lui, office hours: Thursday, 8:30-10:30am.
Machine learning (ML) is a method of data analysis that automates analytical model building.
Some people say that ML is a branch of artificial intelligence.
Personally, I think that ML is really a branch of statistics.
In any case, this course provides an introduction to machine learning.
It is designed to give undergraduate students
a taste of various machine learning techniques.
Students need to have a good background in
probability, statistics, a bit of optimization, as well as
programming (e.g., Python) to appreciate the various methods.
Furthermore, students need to spend time reading the textbook,
as well as put in the effort to read various resources on the Internet,
do the homework, and attend the lectures and tutorials
to understand and keep pace with this course.
If you skip some classes, please remember that you are solely responsible for your own actions
regarding any missed lectures or announcements.
Although skipping classes is now the norm at CUHK,
I would like to emphasize that if you skip lectures/tutorials in this course,
you will easily get lost and will not be able to keep pace with the lectures.
So, a word of advice: do not skip any classes or tutorials.
Machine learning is essential knowledge in computer science/engineering,
and a highly sought-after skill in industry.
If you are well-trained in this subject,
you can surely find a good job.
Nevertheless, the subject is
not for the faint-hearted.
I will discuss the mathematics, theories, algorithms and programming techniques
behind different machine learning
methods, and students need to do various homework and exercises to understand
the subject.
References:
- Bayesian Reasoning and Machine Learning, by David Barber
- Pattern Recognition and Machine Learning, by Christopher M. Bishop
- Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy
- Learning from Data, by Yaser S. Abu-Mostafa
- Machine Learning: An Algorithmic Perspective, by Stephen Marsland
- Machine Learning with R, by Brett Lantz
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
- An Introduction to Statistical Learning: with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
- Mastering Machine Learning With scikit-learn, by Gavin Hackeling
- Machine Learning for Hackers, by Drew Conway and John Myles White
- Probabilistic Graphical Models: Principles and Techniques, by Daphne Koller and Nir Friedman
- Machine Learning in Action, by Peter Harrington
- Abundant resources available on the web.
Course Grades:
- Written homework (will still be given out): 0%;
- Python/Scikit-learn Programming: 40%;
- Final Examination: 60%
- Policy on letter grades!
Policies:
Announcement:
Final Examination:
Topics covered in the final exam are, in general, the materials we went through
in the lectures and tutorials; these include:
- Statistics, sampling, curve fitting, correlation theory
- Basic concepts in matrix calculus, linear algebra, Lagrangian Optimization
- Supervised and unsupervised learning
- VC dimension
- Bayesian Decision Theory
- Parametric Methods: Univariate and Multivariate methods
- Dimensionality Reduction via PCA, Feature Embedding, LDA.
- Clustering via the K-Means Algorithm
- EM Algorithm
- Matrix Factorization
- Linear Discriminant: logistic classification and regression
- Decision trees
- Random forests
- Support vector machines
- Neural networks
- etc.
Lecture Notes:
Lecture and tutorial notes and videos can be downloaded from Blackboard at CUHK.
- Introduction to Machine Learning (online lecture)
- Review on Statistics (pre-recorded lecture)
  - Statistical Sampling
  - Estimation Theory
  - Hypothesis Testing
  - Curve Fitting
  - Least Squares Regression
  - Regression
  - Correlation Theory
  - Q-Q Plot
- Derivation of Least Squares (the matrix form is sketched below)
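For reference, the standard matrix-form derivation (assuming the design matrix X has full column rank, so X^T X is invertible): least squares minimizes the residual sum of squares, and setting its gradient to zero gives the normal equations:

    \mathrm{RSS}(\beta) = (y - X\beta)^\top (y - X\beta)
    \nabla_\beta \mathrm{RSS}(\beta) = -2 X^\top (y - X\beta) = 0
    \;\Longrightarrow\; X^\top X \hat{\beta} = X^\top y
    \;\Longrightarrow\; \hat{\beta} = (X^\top X)^{-1} X^\top y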
- Some exercises on "Review of Statistics" (online lecture)
- Overview of Supervised Learning (pre-recorded lecture)
  - What is supervised learning in classification?
  - Probably approximately correct (PAC) learning
  - Vapnik-Chervonenkis (VC) Dimension
  - What is supervised learning in regression?
- Examining Your Data or Cleaning Your Data: pandas Tutorial
  (online lecture with Python code in Jupyter notebook; a short sketch follows this list)
  - loading a CSV file
  - finding out various display options
  - examining the data and the schema of the data file
  - relationship with Python's dictionary
  - selecting some features to display
  - setting up a filter and selecting the data which qualifies for the filter
  - incorporating Python's string library to set up a filter
  - modifying feature names as well as data in the dataframe
  - adding/removing data in the dataframe
  - sorting data in the dataframe
  - grouping the data
  - aggregating the data
  - exploring the data
  - casting datatypes and handling missing values
  - working with dates
  - working with time series data
  - reading/writing data to different sources: Excel, JSON, etc.
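To make the list above concrete, here is a minimal pandas sketch covering most of these steps; the file name students.csv and its columns (name, score, major) are invented for illustration:

    import pandas as pd

    df = pd.read_csv("students.csv")                   # load a CSV file
    pd.set_option("display.max_columns", 10)           # a display option
    print(df.dtypes)                                   # examine the schema
    print(df[["name", "score"]].head())                # select some features

    mask = df["score"] > 80                            # set up a filter ...
    print(df[mask])                                    # ... select qualifying rows

    df = df.rename(columns={"score": "final_score"})   # modify a feature name
    df = df.sort_values("final_score", ascending=False)         # sort the data
    print(df.groupby("major")["final_score"].mean())   # group, then aggregate

    df["final_score"] = df["final_score"].fillna(0).astype(int) # missing values, casting
    df.to_json("students.json")                        # write to a different source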
- Overview of Bayesian Decision Theory (pre-recorded lecture)
  - Bayes' Rule: the Machine Learning perspective
  - Loss/Risk Functions, discriminant functions
  - Introduction to correlation and causality
  - Introduction to causal and diagnostic inference
  - Simple Bayesian Networks and Simple Bayes' Classifiers
  - Association Rules
- Regression, Overfitting, Underfitting and Prediction in Python
  (online lecture with Python code in Jupyter notebook; a short sketch follows this list)
  - NumPy arrays
  - Shape and reshape of a NumPy array
  - Using a NumPy array as an index into another NumPy array
  - Elementwise logical comparison of NumPy arrays
  - Setting the minimum and maximum of all elements in a NumPy array via clip()
  - Cleaning a NumPy array by filtering out NaN entries
  - Brief introduction to SciPy
  - Loading a data file via SciPy
  - Checking for NaN entries and filtering them out of the array
  - Producing a scatter plot in matplotlib
  - Performing a polynomial best fit on the data
  - Piecewise polynomial fit via one (or multiple) change points
  - Fitting a model after the change point, and using models for "future" prediction
  - Splitting into training and testing sets, and doing prediction
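A minimal sketch of the fitting-and-prediction part of this workflow; the data here is synthetic rather than the dataset used in the lecture:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.arange(100, dtype=float)
    y = 3.0 * x + 10.0 + rng.normal(0, 5, size=100)
    y[::17] = np.nan                                  # simulate missing entries

    ok = ~np.isnan(y)                                 # filter out NaN entries
    x, y = x[ok], y[ok]

    split = int(0.8 * len(x))                         # split into training and testing
    x_train, y_train = x[:split], y[:split]
    x_test, y_test = x[split:], y[split:]

    coeffs = np.polyfit(x_train, y_train, deg=1)      # polynomial best fit
    model = np.poly1d(coeffs)
    test_error = np.mean((model(x_test) - y_test) ** 2)  # "future" prediction error
    print(coeffs, test_error)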
- Evaluation metrics (pre-recorded lecture)
  - Confusion matrix, accuracy, precision and recall
- Example code on ML background, confusion matrix, accuracy and recall
  (with scikit-learn code in Jupyter notebook; a short sketch follows)
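A minimal sketch of these metrics on made-up labels (class 1 is the "positive" class):

    from sklearn.metrics import (accuracy_score, confusion_matrix,
                                 precision_score, recall_score)

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    print(confusion_matrix(y_true, y_pred))   # rows: true class, columns: predicted
    print(accuracy_score(y_true, y_pred))     # (TP + TN) / total
    print(precision_score(y_true, y_pred))    # TP / (TP + FP)
    print(recall_score(y_true, y_pred))       # TP / (TP + FN)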
- Data cleansing and data processing in scikit-learn (pre-recorded lecture
  with scikit-learn code in Jupyter notebook; a short sketch follows this list)
  - CSV file as input
  - Data cleansing, re-labelling, one-hot encoding
  - Split and test
  - Decision tree and random forest
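A minimal sketch of this preprocessing pipeline; the file applicants.csv and its columns (gpa, major, outcome) are invented for illustration:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("applicants.csv")                    # CSV file as input
    df = df.dropna()                                      # simple data cleansing
    df["label"] = (df["outcome"] == "admit").astype(int)  # re-labelling
    X = pd.get_dummies(df[["gpa", "major"]])              # one-hot encoding

    X_train, X_test, y_train, y_test = train_test_split(X, df["label"], test_size=0.3)
    clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
    print(clf.score(X_test, y_test))                      # accuracy on the held-out split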
- Classification in scikit-learn (pre-recorded lecture
  with scikit-learn code in Jupyter notebook; a short sketch follows this list)
  - Decision tree and how it outputs feature importances
  - Displaying the results of the decision tree
  - Use of DummyClassifier and how to loop through different classifier strategies
  - A glimpse of other classifiers: neural network, KNN, SVC, SVM, Linear SVC, AdaBoost, etc.
  - Concept of training time and score of each classifier
  - Feature importance in AdaBoost
  - Multiclass classification
  - Example of digit recognition
  - Confusion matrix and the use of mglearn to display the confusion matrix
  - Use of classification_report to display precision, recall, f1-score and support for all classes
  - Prediction probabilities for each testing input
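A minimal sketch of looping over DummyClassifier strategies as a baseline, using the digit-recognition dataset bundled with scikit-learn:

    from sklearn.datasets import load_digits
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # any real classifier should comfortably beat these baselines
    for strategy in ["most_frequent", "stratified", "uniform"]:
        clf = DummyClassifier(strategy=strategy).fit(X_train, y_train)
        print(strategy, clf.score(X_test, y_test))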
- Parametric Methods (pre-recorded lecture; a worked example follows this list)
  - Maximum likelihood estimator
  - Estimator: bias vs. variance
  - Unbiased estimator, consistent estimator, asymptotically unbiased estimator
  - Bayes' estimator
  - Parametric Classification
  - Parametric Regression
  - Bias/Variance Dilemma
  - Illustration of Model Selection
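As a concrete instance of the maximum likelihood estimator and of estimator bias: for i.i.d. samples x^1, ..., x^N drawn from N(μ, σ²), maximizing the log-likelihood gives

    \hat{\mu} = \frac{1}{N} \sum_{t=1}^{N} x^t, \qquad
    \hat{\sigma}^2 = \frac{1}{N} \sum_{t=1}^{N} (x^t - \hat{\mu})^2, \qquad
    E[\hat{\sigma}^2] = \frac{N-1}{N}\,\sigma^2

so the maximum likelihood estimator of the variance is biased, but asymptotically unbiased.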
- Introduction to Classification in Python and Scikit-learn
  (online lecture with Python and scikit-learn code in Jupyter notebook; a short sketch follows this list)
  - Visualization of a subset of features in our dataset
  - From visualization, discovering classification rules
  - Use of a simple threshold technique to do classification
  - The need to split the data into training and validation sets
  - From leave-one-out cross-validation to k-fold cross-validation
  - Using 1NN and KNN as classifiers
  - The need to normalize all features
  - Color scatter plot of results in KNN (with different values of k)
  - Classification via random forest
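A minimal sketch of KNN with feature normalization and k-fold cross-validation; the iris dataset stands in for the dataset used in the lecture:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)
    for k in [1, 3, 5]:
        # normalize all features, then classify with the k nearest neighbors
        model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
        scores = cross_val_score(model, X, y, cv=5)      # 5-fold cross-validation
        print(k, scores.mean())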
- How to do regression in Python and Scikit-learn
  (online lecture with Python and scikit-learn code in Jupyter notebook; a short sketch follows this list)
  - Single-feature linear regression (or least squares fit)
  - Multi-dimensional linear regression
  - Regression using Ridge, Lasso and ElasticNet models
  - Tuning hyperparameters within a learner
  - Illustrating the problem of not using cross-validation (i.e., using ALL the data for training)
  - Illustrating how to use ElasticNet for regression and how to use the
    L1 ratio to tune λ1 and λ2
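A minimal sketch of tuning ElasticNet's hyperparameters (l1_ratio sets the balance between the L1 and L2 penalties) with cross-validated grid search on a synthetic dataset:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNet
    from sklearn.model_selection import GridSearchCV

    X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)
    grid = GridSearchCV(ElasticNet(max_iter=10000),
                        {"alpha": [0.1, 1.0, 10.0], "l1_ratio": [0.1, 0.5, 0.9]},
                        cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)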
- Regression in scikit-learn
  (pre-recorded lecture with scikit-learn code in Jupyter notebook)
  - The load_*/fetch_*/make_* dataset utilities in scikit-learn
  - Understanding the metadata from a pickle-compressed file (PKZ)
  - Regression metrics: explained variance score, mean absolute error, R2 score
  - Doing regression with multiple linear learners
  - Understanding various regularization methods
  - Doing regression with multiple non-linear learners
- Real-Life Classification: rating answers on Stack Overflow
  (online lecture with Python and scikit-learn code in Jupyter notebook)
  - Fetching and preprocessing 90GB of raw XML data (yes, it is painful)
  - Creating a first nearest-neighbor classifier
  - Looking into how to improve the classifier's performance
  - Changing from nearest-neighbor to logistic regression
  - Using precision, recall and AUC to better understand the classifier's performance
  - Preparing the final version
- Dimensionality Reduction (pre-recorded lecture)
- Dimensionality Reduction in action
  (online lecture with Python and scikit-learn code in Jupyter notebook; a short sketch follows this list)
  - Feature selection vs. feature projection methods
  - How to use correlation, in particular the Pearson coefficient, to find a linear relationship between two features
  - How to use mutual information to discover linear and non-linear relations between two features
  - How to use a recursive wrapper, i.e., recursive feature elimination, to select features
  - PCA, LDA and Multidimensional Scaling (MDS)
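A minimal sketch contrasting feature selection and feature projection; the iris dataset is used here only as a convenient stand-in:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import RFE, mutual_info_classif
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    # selection: rank features by mutual information with the label
    print(mutual_info_classif(X, y))

    # selection: recursive feature elimination with a wrapped learner
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
    print(rfe.support_)

    # projection: PCA down to two components
    print(PCA(n_components=2).fit_transform(X).shape)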
- Clustering (pre-recorded lecture)
- Text Pre-processing, NLTK and Finding the Top k Documents via Clustering Techniques
  (online lecture with Python and scikit-learn code in Jupyter notebook; a short sketch follows this list)
  - Pre-processing documents or text via NLTK, e.g., the bag-of-words technique
  - Comparing the similarity of a document with a set of documents using raw vectors
  - Comparing the similarity of a document with a set of documents using normalized vectors
  - Applying "stop words" in the vectorizer
  - Applying "stemming" in the vectorizer
  - Applying Term Frequency (TF) and Inverse Document Frequency (IDF) in the vectorizer
  - Applying the K-means algorithm and plotting the decision space
  - Clustering on a realistic dataset
  - Given a new post, finding "similar" posts in a corpus
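A minimal sketch of TF-IDF vectorization with stop words plus K-means, then finding the posts most similar to a new post; the tiny corpus is invented:

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    posts = ["disk crashed on boot", "replace a failed hard disk",
             "python list comprehension", "sorting a python dictionary"]
    vec = TfidfVectorizer(stop_words="english")      # stop words, TF and IDF
    X = vec.fit_transform(posts)

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    new_post = vec.transform(["my hard disk failed"])
    label = km.predict(new_post)[0]                  # cluster of the new post
    print([p for p, l in zip(posts, km.labels_) if l == label])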
- Multivariate Parametric Methods (pre-recorded lecture)
  - Multivariate Parameters and Estimation
  - Multivariate Normal Distributions
  - Multivariate Parametric Classification with Multivariate Normal Distributions
  - Multivariate Parametric Classification with Multivariate Bernoulli/Multinomial Distributions
  - Multivariate Regression
- Linear Discrimination (pre-recorded video)
  - Generalizing the Linear Model
  - Geometry of the Linear Discriminant
  - Linear Discriminant via Pairwise Separation
  - Logistic Discriminant: Two and Multiple Classes
  - Discriminant by Regression
  - Discriminant via Ranking
- Recommender Systems
  (online lecture with Python and scikit-learn code in Jupyter notebook; a short sketch follows this list)
  - Making recommendations based on previous user-product ratings (Netflix-like recommendation)
  - Visualization of matrix sparsity
  - Finding similar users or similar products to make recommendations
  - Using regression techniques to make recommendations
  - Using ensemble learning to make recommendations
  - Basket analysis for non-numeric data
  - The Apriori algorithm, association rules and their implementation
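A minimal sketch of the "find similar users" idea on a toy user x product rating matrix (0 means "not yet rated"); real systems work on far larger, sparser matrices:

    import numpy as np

    R = np.array([[5, 4, 0, 1],      # rows: users, columns: products
                  [4, 5, 1, 1],
                  [1, 1, 5, 4],
                  [1, 2, 4, 5]], dtype=float)

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    target = 0                                # predict user 0's rating of product 2
    sims = np.array([cosine(R[target], R[u]) for u in range(R.shape[0])])
    sims[target] = -1.0                       # exclude the user itself
    nearest = sims.argmax()                   # most similar user
    print("predict user 0 on product 2 as:", R[nearest, 2])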
- Nonparametric Methods (pre-recorded video)
  - Nonparametric density estimation: Histogram Estimator
  - Nonparametric density estimation: Kernel Estimator
  - Nonparametric density estimation: k-Nearest Neighbor Estimator
  - Nonparametric density estimation: Generalization to Multivariate Data
  - Condensed Nearest Neighbor
  - Distance-Based Classification
  - Nonparametric Regression: Smoothing Models
- Decision Trees (pre-recorded video)
  - Univariate Trees
  - Pruning of Decision Trees
  - Rule Extraction from Decision Trees
  - Learning Rules from Decision Trees
  - Multivariate Decision Trees
- Sentiment Analysis on Twitter-like data
  (online lecture with Python and scikit-learn code in Jupyter notebook; a short sketch follows this list)
  - Learn about the Naive Bayes classifier (NBC)
  - Apply NBC on tweets to do sentiment analysis
  - Learn various smoothing techniques in NBC:
    (a) add-one smoothing and (b) Lidstone smoothing
  - Learn various performance metrics such as (a) true positives, (b) false positives, (c) false negatives and (d) true negatives in the confusion matrix
  - Extend the performance metrics to: (a) accuracy, (b) error rate, (c) recall, (d) specificity, (e) precision, (f) false positive rate, (g) Matthews correlation coefficient, (h) F-score
  - Basic working principle of the Precision-Recall Curve (PRC)
  - How cleaning the tweets' text can improve accuracy
  - Use of part-of-speech (POS) tagging and substitution to refine the classification process
  - Learn how to use the Pipeline mode of data analysis
  - Learn how to use the grid-search approach to find optimal
    values of hyper-parameters
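A minimal sketch of a Pipeline with a grid search over the Naive Bayes smoothing parameter alpha (Lidstone smoothing; alpha = 1 is add-one smoothing); the tiny labelled "tweets" are invented:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import GridSearchCV
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline

    tweets = ["I love this phone", "great battery life", "worst update ever",
              "this app crashes", "love the new camera", "terrible screen"]
    labels = [1, 1, 0, 0, 1, 0]                   # 1 = positive, 0 = negative

    pipe = Pipeline([("vec", CountVectorizer()), ("nbc", MultinomialNB())])
    grid = GridSearchCV(pipe, {"nbc__alpha": [0.01, 0.1, 1.0]}, cv=3)
    grid.fit(tweets, labels)
    print(grid.best_params_)
    print(grid.predict(["love it", "it crashes"]))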
- Good video explaining Area Under the Curve (AUC) and the Receiver Operating Characteristic (ROC)
- Kernel Machines (pre-recorded video; a short sketch follows this list)
  - Quick Review of Logistic Classification/Regression
  - From Logistic Classification to SVM Classification
  - Concept of the Large Margin
  - Landmarks to Kernels
  - Theory of Margins and Support Vectors
  - Non-separable Case: Soft Margin Hyperplane
  - Hinge Loss
  - Kernel Tricks and Kernel Functions
  - Multiple Kernel Learning and Multiclass Kernel Machines
  - SVM for Regression
  - SVM for Ranking
  - Large Margin Nearest Neighbor
  - Kernel Dimensionality Reduction
  - Optional Reading 1: Constrained Optimization
  - Optional Reading 2: Inequality Constraints and the Kuhn-Tucker method
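A minimal sketch of a soft-margin SVM with an RBF kernel on a synthetic non-separable dataset; C controls how soft the margin is, and gamma the kernel width:

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X_train, y_train)
    print("number of support vectors:", clf.support_vectors_.shape[0])
    print("test accuracy:", clf.score(X_test, y_test))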
- Multilayer Perceptrons (Artificial Neural Networks)
  (pre-recorded video)
  - Perceptron
  - Training a Perceptron
  - Learning Boolean Functions
  - Multilayer Perceptrons
  - Backpropagation Algorithm
  - Training Procedures
  - Tuning the Network Size
  - Bayesian View of Learning
  - Dimensionality Reduction
  - Deep Learning
- Topic Modeling: comparing or searching documents by topics instead of words
  (online lecture with Python and scikit-learn code in Jupyter notebook; a short sketch follows this list)
  - Learn about the importance of topic modeling and how to search for documents within a topic
  - Learn (at a high level) about latent Dirichlet allocation (LDA)
  - Learn about the gensim package and how to generate topics for corpora
  - Learn about visualizing topic distributions and how to use the parameter α
    to vary the distribution of topics associated with a document
  - Learn about the wordcloud package to visualize the words within a topic
  - Learn how to find the closest topics or documents
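A minimal sketch of training LDA with the gensim package on an invented four-document corpus:

    from gensim import corpora, models

    docs = [["disk", "crash", "boot"], ["disk", "failure", "hardware"],
            ["python", "list", "loop"], ["python", "dictionary", "sort"]]
    dictionary = corpora.Dictionary(docs)              # map words to integer ids
    corpus = [dictionary.doc2bow(d) for d in docs]     # bag-of-words per document

    # alpha="auto" lets gensim learn the document-topic prior from the data
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, alpha="auto")
    for topic_id, words in lda.print_topics():
        print(topic_id, words)
    print(lda[corpus[0]])                              # topic distribution of document 0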
- Music Genre Classification
  (with Python and scikit-learn code in Jupyter notebook, to be uploaded to Blackboard; a short sketch follows this list)
  - How to do music genre classification
  - How to use the fast Fourier transform (FFT) to convert songs into vectors of numbers, then use these vectors to train our learner
  - How to use Mel-frequency cepstral coefficients (MFCCs) to convert songs into vectors of numbers, then use these vectors to train our learner
  - The physical meaning of the precision/recall curve and the ROC curve
  - How to examine and visualize the confusion matrix
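A minimal sketch of the FFT-based feature extraction step; the "song" here is a synthetic 440 Hz tone (MFCC extraction, mentioned above, requires an audio library and is omitted):

    import numpy as np

    sample_rate = 22050
    t = np.linspace(0, 1, sample_rate)
    song = np.sin(2 * np.pi * 440 * t)        # one second of a 440 Hz tone

    spectrum = np.abs(np.fft.rfft(song))      # magnitude spectrum of the song
    features = spectrum[:1000]                # fixed-length feature vector
    print(features.shape)                     # one such vector per song trains the learner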
- Graphical Models (To be uploaded if time allows)
  - Conditional Independence
  - Generative Models
  - d-Separation
  - Belief Propagation
  - Undirected Graphs and Markov Random Fields
  - Learning Structures from Graphical Models
  - Influence Diagrams
- Hidden Markov Models (To be uploaded if time allows)
  - Discrete Markov Processes
  - Hidden Markov Models (HMM)
  - Basic Problems of HMM
  - Evaluation Problem
  - Learning the State Sequence
  - Learning the Model Parameters
  - The HMM as a Graphical Model
- Bayesian Estimation (To be uploaded if time allows)
  - Bayesian Estimation of the Parameters of a Discrete Distribution
  - Bayesian Estimation of the Parameters of a Gaussian Distribution
  - Bayesian Estimation of the Parameters of a Function
  - Choosing a Prior
  - Bayesian Model Comparison
  - Bayesian Estimation of a Mixture Model
  - Gaussian and Dirichlet Processes, Chinese Restaurants
  - Latent Dirichlet Allocation
  - Beta Processes and Indian Buffets
- Reinforcement Learning (e.g., Game Theory, Markov Decision Processes, etc.) (To be uploaded if time allows)
  - Single State Case: K-Armed Bandit
  - Elements of Reinforcement Learning
  - Model-Based Learning
  - Temporal Difference Learning
  - Partially Observed States
- Brief Introduction to Game Theory
Additional References:
- Exploring Python, by Timothy A. Budd
- Think Python: How to Think Like a Computer Scientist, by Allen B. Downey
- Python Tutorial
- Python Programming on YouTube
- Reference note on matrix differentiation
- Matrix notations and operations
- Vector notations and operations
- The Matrix Cookbook, by K.B. Petersen and M.S. Pedersen
- Brief Introduction to Kalman Filters
Tutorial Notes (Available on Blackboard):
- Tutorial 0: Introduction to Python
- Tutorial 1: Quick Introduction to scikit-learn (with Jupyter notebook)
- Tutorial 2: Review on Linear Algebra and Matrix Calculus (with Jupyter notebook)
- Tutorial 3: Review on Gradient Descent for Linear Regression (with Jupyter notebook)
- Tutorial 4: Review on Linear Regression
- Tutorial 5: Regularization and Cross-Validation (with Python code)
- Tutorial 6: Parametric Classification and Implementation (with sample code)
- Tutorial 7: Principal Component Analysis
- Project Tutorial: Horse Racing Prediction
- Tutorial 8: Kernel Machines (To be uploaded)
- Tutorial 9: Ensemble Methods (To be uploaded)
Homework (Available on Blackboard):
- Will be posted on Blackboard.
Programming Homework:
- Will be posted on Blackboard.
Programming Project:
- Will be posted on Blackboard.