MATH 285: Selected Topics in High Dimensional Data Modeling

Fall 2015, San Jose State University

Course description

This is an advanced topics course in machine learning with big data [syllabus]. Topics to be covered include:

Singular Value Decomposition (SVD)
Dimensionality Reduction
Spectral Clustering
Subspace Clustering
Compressive Sensing
Dictionary Learning

and their applications to image processing. There is no required textbook; we will cover material from various sources (papers, websites, etc.).
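The first two topics are closely linked. One standard way to reduce dimension is to center the data and project it onto its leading right singular vectors; this is the SVD route to principal component analysis. The sketch below uses Python with NumPy rather than MATLAB. It assumes the data are stored as an n-by-d array with one sample per row. The helper `svd_reduce` and the synthetic data are hypothetical and serve only as an illustration.

```python
import numpy as np

# Hypothetical helper for illustration (not one of the course scripts).
def svd_reduce(X, k):
    """Reduce n x d data X (one sample per row) to k dimensions via the SVD.

    Returns the k-dimensional coordinates, the rank-k reconstruction of X,
    and all singular values of the centered data.
    """
    mu = X.mean(axis=0)                      # per-feature mean
    Xc = X - mu                              # center the data (as in PCA)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k]                              # top-k right singular vectors (k x d)
    Y = Xc @ Vk.T                            # coordinates; equals U[:, :k] * s[:k]
    X_approx = Y @ Vk + mu                   # rank-k approximation, mean added back
    return Y, X_approx, s

# Hypothetical toy data: 200 points lying near a 2-D plane in R^10.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))
basis = rng.standard_normal((2, 10))
X = latent @ basis + 0.01 * rng.standard_normal((200, 10))

Y, X_approx, s = svd_reduce(X, k=2)
rel_err = np.linalg.norm(X - X_approx) / np.linalg.norm(X)
print(Y.shape)   # (200, 2)
print(rel_err)   # small, because the centered data are nearly rank 2
```

After the second singular value, the remaining values in `s` fall to the noise level. Inspecting `s` is therefore one simple way to choose how many dimensions to keep.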

Useful textbooks

Some chapters of the following books overlap with the material taught in this course:

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition, by Hastie, Tibshirani, and Friedman, Springer.
Foundations of Data Science, a free online book by Hopcroft and Kannan.

Homework

HW1: [Assignment] [Supplemental data] [Solution]
HW2: [Assignment] [Supplemental files] [Sample solution 1] [Sample solution 2]
HW3: [Assignment] [Supplemental data] [Sample solution 1] [Sample solution 2]
HW4: [Assignment] [Supplemental data] [Sample solution]

Course project

This course ends with a project, which should be reported as an oral presentation in class and/or a written report (see here for instructions).

Learning resources

MATLAB resources

MATLAB trial version (good for one month)
Here is one tutorial; tons of others can be found here
Common MATLAB commands
Scripts used in class

Data sets

UCI Machine Learning Repository: 336 data sets in total
MNIST Handwritten Digits: all digits, only digit 1
Extended Yale Face Database B: full data set, a subset used in class
Data used by ISOmap
Hopkins 155 database
Oxford Flowers Category Datasets

Useful course websites

Stanford University Stats 306B: Methods for Applied Statistics: Unsupervised Learning
University of Waterloo Data Science Course Offerings
University of Western Ontario CS 434s/541a Pattern Recognition
University of Washington CSS 581 - Introduction to Machine Learning
RPI CSCI 4966 & 6967 Foundations of Data Science by P. Drineas
Oxford University Machine Learning Lectures by A. Zisserman
Stanford University CS 229 Machine Learning Course by A. Ng
Oregon State University CS 534: Machine Learning

Instructor feedback

This is an experimental course in data science, taught at SJSU for the first time. Your feedback, given as early as possible, is encouraged and greatly appreciated. The instructor will seriously consider it when improving the course experience for you and your classmates. Please submit your anonymous feedback through this page.
