MATH 285: Selected Topics in High Dimensional Data Modeling
Fall 2015, San Jose State University
Course description
This is an advanced topics course in machine learning with big data [syllabus]. Topics to be covered include:
- Singular value decomposition (SVD)
- Dimensionality Reduction
- Spectral Clustering
- Subspace Clustering
- Compressive Sensing
- Dictionary Learning
Useful textbooks
Some chapters of the following books overlap with the material taught in this course:
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition, by Hastie, Tibshirani, and Friedman, Springer
- Foundations of Data Science, free online book by Hopcroft and Kannan.
Homework
- HW1: [Assignment] [Supplemental data] [Solution]
- HW2: [Assignment] [Supplemental files] [Sample solution 1] [Sample solution 2]
- HW3: [Assignment] [Supplemental data] [Sample solution 1] [Sample solution 2]
- HW4: [Assignment] [Supplemental data] [Sample solution]
Course project
This course ends with a project, to be reported as an oral presentation in class and/or a report (see here for instructions).
Learning resources
MATLAB resources
- MATLAB trial version (good for one month)
- Here is one tutorial; tons of others can be found here
- Common MATLAB commands
- Scripts used in class
Suggested papers
Principal Component Analysis (PCA)
- A very thorough but accessible tutorial
- A handout by the instructor
Multidimensional Scaling (MDS)
- A book chapter on MDS
Isometric Feature Mapping (Isomap)
- Isomap homepage maintained by the authors (with paper, code, and data)
- For more nonlinear dimensionality reduction techniques, see an overview and a longer paper
Kernel Principal Component Analysis (Kernel PCA)
- This is a relatively easy-to-read paper on Kernel PCA (you can ignore the sections about active shape models)
- Here is a nice blog that tries to explain Kernel PCA with the Gaussian kernel (also called RBF kernel)
- Read this paper for mathematical derivation of Kernel PCA; the longer version of the paper is available at this link
Clustering basics and kmeans clustering
See below for two excellent lectures:
How to initialize kmeans:
- kmeans++ [slides] [paper]. It has been implemented in MATLAB R2014b as the default initialization.
- kmeans// (parallelized kmeans++ for large data sets) [paper]
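The kmeans++ seeding rule referenced above can be illustrated with a minimal sketch. It is written in plain Python rather than MATLAB, and the function names `sq_dist` and `kmeanspp_seeds` are hypothetical, chosen only for this illustration; it is not the course's or MATLAB's implementation. The idea: pick the first center uniformly at random, then pick each further center with probability proportional to its squared distance from the nearest center chosen so far.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng=None):
    """Return k initial centers chosen from `points` by kmeans++ seeding.

    `rng` is an optional random.Random instance (an assumption of this
    sketch, added so that runs can be made reproducible).
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: uniform over the data.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances against the new center only.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
    print(kmeanspp_seeds(data, 2, random.Random(0)))
```

The returned seeds would then be passed to an ordinary kmeans (Lloyd) iteration as the starting centers. Points that are already close to a chosen center get small weights, which is what spreads the seeds out across the data.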
Spectral clustering
- A (long) tutorial on spectral clustering [paper]
- Normalized cuts and image segmentation [paper] [software]
- On spectral clustering: analysis and an algorithm [paper]
- Self-tuning spectral clustering [paper] [webpage]
Subspace clustering
- Review paper on subspace clustering in IEEE Signal Processing Magazine (March 2011)
- Spectral Curvature Clustering (SCC) [long talk] [short talk] [applied paper] [theoretical paper] [software]
- Multiscale Analysis of Plane Arrangements (MAPA) [paper] [software]
- Sparse Subspace Clustering (SSC) [paper] [webpage] [code] and Low-rank Representation (LRR) [paper] [software]
- Generalized PCA (GPCA) [webpage]
Dictionary learning
- Lecture notes on dictionary learning (start with page 16)
- Colloquium talk at SJSU (focus on first half)
- K-SVD [paper] [talk] [software]
- Sparse coding [paper 1] [paper 2] [OMP] [CVX]. Here is an introduction to convex optimization [slides].
- Application to image processing [paper]
Data sets
- UCI Machine Learning Repository: 336 data sets in total
- MNIST Handwritten Digits: all digits, only digit 1
- Extended Yale Face Database B: full data set, a subset used in class
- Data used by Isomap
- Hopkins 155 database
- Oxford Flowers Category Datasets
Useful course websites
- Stanford University Stats 306B: Methods for Applied Statistics: Unsupervised Learning
- University of Waterloo Data Science Course Offerings
- University of Western Ontario CS 434s/541a Pattern Recognition
- University of Washington CSS 581 - Introduction to Machine Learning
- RPI CSCI 4966 & 6967 Foundations of Data Science by P. Drineas
- Oxford University Machine Learning Lectures by A. Zisserman
- Stanford University CS 229 Machine Learning Course by A. Ng
- Oregon State University CS 534: Machine Learning