Introduction to Machine Learning (NYU Paris, Spring 2022)

Machine learning is becoming increasingly important, with applications ranging from autonomous driving and computer-assisted medicine to weather and financial forecasting. In this class we will study the mathematical foundations of current machine learning algorithms.

We will cover the main models of both supervised learning, including linear and nonlinear regression and classification (kernel methods, support vector machines, neural networks), and unsupervised learning, including clustering, Gaussian mixtures, self-organizing maps, principal and independent component analysis, and nonlinear dimensionality reduction.

We will also review basic concepts in probability and statistics, and discuss Bayesian vs. frequentist statistics, model and parameter inference, and sampling methods.

Finally, we will also discuss the important question of model assessment and selection.

The class is organized around three components:

1. Lectures (introduce the new material needed for the lab sessions and the assignments)

2. Programming (lab) sessions (you apply what you learned during the lecture and can ask as many questions as you need to make sure you understand everything before the assignment)

3. Assignments (you are given a new problem and evaluated on your ability to use the course material to solve it)

Schedule and Classroom

Lecture: Monday/Wednesday, 2.30pm – 3.45pm (Paris time), Room 410
Recitations: Wednesday (C03), 4.00pm – 5.30pm (Paris time), Room 410
Office hour: Tuesday, 5.00pm – 6.00pm (Paris time)

Assignments policy

Unless explicitly stated otherwise, assignments are due at the beginning of class.

Current (temporary) version of the notes: Lecture notes as well as the list of sections for the Final

Practice (theory) questions for each exam can be found by clicking on the corresponding exam below.

Exams: 60% of the grade (30% Midterm (Material), 30% Final (Material))

Exams (password-protected): Midterm, Final Group 1, Final Group 2, Final Group 3, Retake, Final Retake

Assignments: 30% of the grade (tentative schedule below)

Final Project: 10% of the grade (tentative schedule below, list of suggestions, poster guidelines)
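The grading breakdown above amounts to a weighted sum: 30% midterm, 30% final, 30% assignments and 10% final project. The short sketch below computes it, assuming (hypothetically) that every component is scored on the same 0 to 100 scale; how scores are actually normalized or curved in the class is not specified here, and the function name `weighted_grade` is purely illustrative.

```python
# Weights from the grading breakdown: Midterm 30%, Final 30%,
# Assignments 30%, Final Project 10%. Component scores are assumed
# (hypothetically) to share a common 0-100 scale.
WEIGHTS = {
    "midterm": 0.30,
    "final": 0.30,
    "assignments": 0.30,
    "project": 0.10,
}

# Sanity check: the weights should cover the whole grade.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-12


def weighted_grade(scores: dict[str, float]) -> float:
    """Return the overall grade as the weighted sum of the component scores.

    `scores` must contain one entry for every key of WEIGHTS.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())
```

For example, `weighted_grade({"midterm": 70, "final": 80, "assignments": 90, "project": 100})` gives 0.3·70 + 0.3·80 + 0.3·90 + 0.1·100 = 82. Raising an error on a missing component, rather than silently treating it as zero, makes incomplete inputs easy to spot.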

The GitHub page for the class is hosted at https://github.com/acosse/Introduction2MLSpring2022 and will be used for the labs and the assignments. You can also click on each "Lab" in the schedule below to display a rendering of the notebook through nbviewer. To access the file itself (and download it), go directly to GitHub.

Tentative schedule:

Legend: Lab sessions are in green, Homeworks and handwritten notes are in red (right side of the table), dates related to the project are in orange.

Week #	Dates	Topic	Assignments
Week 1	01/26	General Intro + reminders on proba and inference. Part I,
		Part I: Supervised Learning
Week 2	01/31, 02/02	Linear and logistic regression, regularization and Compressed sensing Linear Classification Part I, Part II, Note on the Bias-Variance trade-off Demo Gradient Descent, Additional Note Ridge vs LASSO Handwritten Notes : Linear Regression, Regularization, Bias Variance Tradeoff
Week 3	02/07, 02/09	Linear and logistic regression, Linear Classification (Part II) Lab 1 Solutions, Lab 2 Solutions, Handwritten Notes : Intro class + Logistic Regr./Perceptron Handwritten Notes : GDA	Assignment 1
Week 4	02/14, 02/16	Nonlinear classification, Kernel methods, SVM, Parts I & 2 Lab 3, Solutions, Handwritten Notes : Kernels/SVM
Week 5	02/21, 02/23	Neural Networks, Optimization, Stochastic Optimization, Deep learning, Part I Lab 4 (Part I) / Solutions (Part I), Solutions (tmp, Part II)	Assignment 1 due
Week 6	02/28, 03/02	Lab 2: Nonlinear regression and classification, Neural Nets, Handwritten notes: Neural Nets Lab 5	Assignment 2, Project choice, Midterm revisions
		Part II: Unsupervised Learning
Week 7	03/07, 03/09	Clustering, Linear Latent variable models Slides Lab 6, (partial) Solutions	Readings
Week 8	03/14, 03/16	Linear Latent variable models (Part II), PCA, ICA, GMM, EM algorithm, Nonlinear LVM, Part I Part II, Additional Note on MVN Demos FA/PCA, Handwritten Notes LVM (Part I), Handwritten Notes LVM (Part II) Lab 7, (partial) Solutions
Week 9	03/21, 03/23	Nonlinear LVM and Manifold Learning Parts 1&2	Readings
Week 10	03/28, 03/30	Lab 3: Unsupervised Learning	Readings, Assignment 3
Week 11	04/04, 04/06	Generalization, complexity and VC Theory
Week 12	04/11, 04/13	Probabilistic models, HMM, Bayesian Nets
Week 13	04/18, 04/20	Advanced topics, Reinforcement Learning, Adversarial Learning, Slides RL Lab RL
Week 14	04/25, 04/27	Revisions
Week 15	05/02, 05/04	Project Presentations
Week 16	05/09, 05/11	Final Exams
Week 17	05/16, 05/18	Final Exams

References

The Elements of Statistical Learning, Hastie, Tibshirani, Friedman

Pattern Recognition and Machine Learning, Bishop

Machine Learning: A Probabilistic Perspective, Murphy

Nonlinear Dimensionality Reduction, Lee, Verleysen

Variational Inference: A Review for Statisticians, Blei, Kucukelbir, McAuliffe

An Introduction to Probabilistic Graphical Models, Jordan

Lab Sessions and programming policy

The lab sessions will require some programming. Python is strongly recommended: it is flexible, and it will be useful later when moving to PyTorch for more advanced machine learning methods that require GPU processing.

Downloading and getting started with Python

Start by downloading Anaconda: https://www.anaconda.com/download/#macos
If you don't have a text editor yet, you can download Sublime Text (see useful keyboard shortcuts here).
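Once Anaconda is installed, a quick way to confirm that your environment is ready for the labs is to check which scientific packages can be imported. The sketch below does this using only the standard library (`importlib`). The package list is an assumption for illustration, not an official course requirement: `sklearn` is the import name of scikit-learn and `torch` is the import name of PyTorch; adjust the list to whatever a given lab actually uses.

```python
import importlib
import importlib.util
import sys

# Hypothetical list of packages commonly used in ML labs; edit as needed.
PACKAGES = ["numpy", "scipy", "matplotlib", "sklearn", "torch"]


def check_environment(packages=PACKAGES):
    """Map each package name to its version string, or None if it cannot be imported."""
    report = {}
    for name in packages:
        # find_spec returns None when no module with that name is installed.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Installed but broken (e.g. a missing compiled dependency).
            report[name] = None
            continue
        # Not every module exposes __version__.
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    for name, version in check_environment().items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Save it as a script and run it with the Python interpreter from your Anaconda installation; any package reported as `NOT INSTALLED` can then be added with `conda install` or `pip install`.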

Datasets can be downloaded from the following websites:

- UCI Machine Learning Repository
- ENS Challenge Data