Machine Learning: Algorithms and Applications
CSCI 370, Spring 2022
Google Classroom: qqkwyk5
The course introduces students to the field of Machine Learning, a subfield of Artificial Intelligence that studies learning algorithms. These algorithms allow a machine to learn from data without being explicitly programmed.
The course also applies Machine Learning algorithms to real-life tasks in a series of Data Science labs. This exploratory work culminates in an open-ended final project shaped by each student's interests.
1. Course Info
1.1. Contacts

|  |  |
|---|---|
| Instructor: | Marina Barsky |
| Lecture hours: | Mon and Thu, 1:10 - 2:25 PM, Schow Library 030A |
| Office hours: | Mon 4:30 - 6:00 PM and Wed 1:30 - 3:30 PM, TCL 209 |
| e-mail: | [email protected] |
1.2. Textbook
Course Reading Packet. Link.
1.3. Deliverables

| Component | Weight |
|---|---|
| Quizzes | 15% |
| Labs | 25% |
| Mini-projects | 30% |
| Final project | 30% |
2. Lectures
- Introduction.
What is Machine Learning? Why study Machine Learning? Types of Machine Learning tasks.
Slides 00.
A new type of algorithm: stochastic optimization.
Slides 01.
Readings: RP* pp. 240-270.
Optimization algorithms DEMO.
- Decision Trees and Classification Rules.
Decision tree induction. Information and entropy. Gini score.
Slides 02.
Dealing with multi-valued attributes, numeric attributes, and missing values.
Classification and Regression Trees (CART).
Slides 03.
Classification Rules. Coverage and accuracy.
Slides 04.
Readings: RP* pp. 3-31, pp. 199-215.
Decision tree DEMO.
- Nearest Neighbors.
Memory-based reasoning. Classification and prediction with k-NN.
Slides 05.
Proximity metrics: distance and similarity.
Slides 06.
Improving the performance of the k-NN classifier.
Slides 07.
Recommender systems.
Slides 08.
Readings: RP* pp. 215-219.
k-NN algorithm DEMO.
- Clustering.
Introduction to cluster analysis. K-means.
Slides 09.
Agglomerative hierarchical clustering.
Slides 10.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
Slides 11.
Readings: RP* pp. 129-199.
Clustering words and documents DEMO.
- Association Analysis.
Association rules. Support and Confidence.
Discovering frequent itemsets. Rule generation.
Slides 12.
Interestingness metrics. Null-invariant measures for large datasets.
Dealing with different levels of generalization. Simpson's paradox.
Slides 13.
Readings: RP* pp. 65-128.
Market basket DEMO.
- Regression vs. Logistic Regression.
Numeric prediction. Linear Regression. Method of Least Squares.
Slides 14. Readings: Notes.
Iterative learning with gradient descent. Slides 15.
Video explanation.
Linear and polynomial regression. Home price prediction DEMO.
Oversimplified and overcomplicated models. Overfitting. Bias-variance tradeoff. Regularization. Slides 16. Generalization DEMO. Readings: Book chapter.
Logistic Regression. Mapping numeric predictions to binary labels using the sigmoid function. Decision boundaries. Classification with Logistic Regression: interactive lecture.
Overview of Support Vector Machines. Video explanation.
Image classification DEMO.
- Bayesian Classifiers.
Probability primer.
Bayesian reasoning. Naive Bayes.
Slides 17.
Dealing with missing values and numeric attributes. Laplace correction.
Slides 18.
Bayesian Belief Networks.
Slides 19.
Readings: RP* pp. 219-238.
- Evaluating and Comparing Classifiers.
Estimating the error rate. Cross-validation. Bootstrap. Comparing classifiers.
Slides 20.
Precision and recall. Cost-based evaluation.
Slides 21.
ROC curves. Slides 22.
Readings: RP* pp. 31-64.
- Artificial Neural Networks.
Introduction to neural networks.
Perceptron. Multi-layer Perceptron.
Linear decision boundaries and the importance of non-linearity. Slides 23.
Perceptron DEMO.
Convolutional Neural Networks. Image recognition. Slides 24.
Image recognition DEMO.
Readings: RP* pp. 299-355.
*RP refers to the Course Reading Packet.
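The Decision Trees lectures name entropy (information) and the Gini score as criteria for choosing a split. A minimal Python sketch of both measures, plus the information gain of a candidate split, might look like the following. The function names and the class counts in the usage lines are illustrative assumptions, not material from the course slides.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical node with 9 "yes" and 5 "no" examples, split three ways.
parent = ["yes"] * 9 + ["no"] * 5
children = [
    ["yes"] * 2 + ["no"] * 3,
    ["yes"] * 4,
    ["yes"] * 3 + ["no"] * 2,
]
print(round(entropy(parent), 3))                     # 0.94
print(round(gini(parent), 3))                        # 0.459
print(round(information_gain(parent, children), 3))  # 0.247
```

Passing `impurity=gini` to `information_gain` scores the same split with the Gini measure instead of entropy.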
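The Association Analysis lectures define support and confidence and discuss discovering frequent itemsets. The Python sketch below illustrates these ideas with a level-wise search in the spirit of the Apriori algorithm. The lectures may present a different or more efficient method. The function names and the five toy baskets are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent: supp(A | C) / supp(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_itemsets(transactions, min_support):
    """Grow itemsets one item at a time, keeping only the frequent ones."""
    items = {item for t in transactions for item in t}
    level = [frozenset([item]) for item in items]
    frequent_all = {}
    k = 1
    while level:
        counts = {c: support(c, transactions) for c in level}
        frequent = {c: s for c, s in counts.items() if s >= min_support}
        frequent_all.update(frequent)
        k += 1
        # Join: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune: a k-itemset can be frequent only if all its (k-1)-subsets are.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))]
    return frequent_all


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(support({"diapers", "beer"}, baskets))                  # 0.6
print(round(confidence({"diapers"}, {"beer"}, baskets), 2))   # 0.75
print(len(frequent_itemsets(baskets, min_support=0.5)))       # 8
```

The prune step depends on the anti-monotone property of support: any superset of an infrequent itemset is also infrequent. This is why `{bread, diapers, beer}` is discarded before its support is ever counted.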
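The Regression lectures pair the method of least squares with iterative learning by gradient descent. The Python/NumPy sketch below assumes a single-feature linear model fitted to synthetic data. It runs batch gradient descent on the mean squared error and checks the result against NumPy's closed-form least-squares solver. The slope 3, intercept 2, learning rate and epoch count are arbitrary choices, not values from the course.

```python
import numpy as np


def fit_linear_gd(X, y, lr=0.1, epochs=2000):
    """Fit y ~ X @ w + b by batch gradient descent on the mean squared error."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y              # prediction errors, shape (n,)
        w -= lr * (2 / n) * (X.T @ residual)  # step against dMSE/dw
        b -= lr * (2 / n) * residual.sum()    # step against dMSE/db
    return w, b


def fit_linear_lstsq(X, y):
    """Closed-form least-squares fit of the same model, for comparison."""
    A = np.column_stack([X, np.ones(len(X))])  # extra column of ones for b
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)  # synthetic: slope 3, intercept 2

w_gd, b_gd = fit_linear_gd(X, y)
w_ls, b_ls = fit_linear_lstsq(X, y)
print(w_gd, b_gd)
print(w_ls, b_ls)  # should agree with the gradient-descent estimates
```

Both `w` and `b` are updated from the same `residual`, so each step uses the gradient at the current point. If the learning rate is too large, the iteration diverges instead of converging.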
3. Labs
- Lab 0. Setup: Introduction to Jupyter notebooks. Titanic dataset. pandas and NumPy. Link.
- Lab 1. Stochastic Optimizations: Optimizing student-to-dorm assignments. Link.
- Lab 2. Decision Trees: Predicting course evaluation scores. Link.
- Lab 3. Nearest Neighbors: Home price prediction/classification. Link.
- Lab 4. Clustering: Finding optimal store locations. Link.
- Lab 5. Naive Bayes: Classifying movie reviews. Link.
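Lab 3 applies k-NN both to classification and to numeric prediction of home prices. The Python sketch below shows both uses with Euclidean distance: a majority vote for a class label, and an average for a price. The feature values, prices and function names are hypothetical. The lab itself may use a library implementation or a different proximity metric.

```python
import math
from collections import Counter


def k_nearest(train_X, train_y, query, k):
    """Return the k (features, target) pairs closest to query by Euclidean distance."""
    pairs = zip(train_X, train_y)
    return sorted(pairs, key=lambda pair: math.dist(pair[0], query))[:k]


def knn_classify(train_X, train_y, query, k=3):
    """Predict the majority label among the k nearest neighbours."""
    votes = Counter(label for _, label in k_nearest(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Predict the mean target value of the k nearest neighbours."""
    neighbours = k_nearest(train_X, train_y, query, k)
    return sum(target for _, target in neighbours) / len(neighbours)


# Hypothetical homes: (size in 1000 sq ft, number of bedrooms).
homes = [(1.0, 1), (1.2, 1), (4.0, 4), (4.1, 4), (3.8, 3)]
labels = ["cheap", "cheap", "expensive", "expensive", "expensive"]
prices = [100, 120, 400, 420, 380]  # hypothetical prices, in $1000s

print(knn_classify(homes, labels, (1.1, 1)))  # cheap
print(knn_regress(homes, prices, (3.9, 4)))   # 400.0
```

The model does no training: it stores the examples and defers all work to query time, which is what "memory-based reasoning" refers to. In practice, features are usually normalized first so that no single attribute dominates the distance.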
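Lab 5 classifies movie reviews with Naive Bayes, and the Bayesian Classifiers lectures cover the Laplace correction. The Python sketch below assumes a multinomial (word-count) model over already tokenized reviews with add-one smoothing. The lab may use a different event model, tokenizer or library. The reviews and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes on tokenized documents with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word frequencies in that class
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {word for tokens in docs for word in tokens}
    log_prior = {c: log(count / len(labels)) for c, count in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Laplace correction: a word never seen with class c still gets a nonzero probability.
        log_likelihood[c] = {
            word: log((word_counts[c][word] + alpha) / (total + alpha * len(vocab)))
            for word in vocab
        }
    return log_prior, log_likelihood, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior; unknown words are skipped."""
    log_prior, log_likelihood, vocab = model
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical, already tokenized movie reviews.
reviews = [
    ["great", "acting", "great", "plot"],
    ["wonderful", "film"],
    ["boring", "plot"],
    ["terrible", "acting", "boring"],
]
sentiment = ["pos", "pos", "neg", "neg"]

model = train_nb(reviews, sentiment)
print(predict_nb(model, ["great", "film"]))     # pos
print(predict_nb(model, ["boring", "acting"]))  # neg
```

Without the correction, "film" would have probability zero under the "neg" class, and a single unseen word would veto that class entirely. Summing logarithms instead of multiplying probabilities avoids numeric underflow on long reviews.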
4. Mini-projects
5. Final Project
Wide open: what do you want to learn from data?