Machine Learning: Algorithms and Applications


CSCI 370, Spring 2022

Google Classroom: qqkwyk5

The course introduces students to the field of Machine Learning: a subfield of Artificial Intelligence that studies learning algorithms. These algorithms make it possible for a machine to learn from data on its own, without being explicitly programmed.

The course applies Machine Learning algorithms to real-life tasks in a series of Data Science labs. This exploratory work culminates in an open-ended final project driven by each student's own interests.

Table of contents

  1. Course Info
    1.1. Contacts
    1.2. Textbook
    1.3. Deliverables
  2. Lectures
  3. Labs
  4. Mini-projects
  5. Final project

1. Course Info


1.1. Contacts

Instructor: Marina Barsky
Lecture hours: Mon and Thu, 1:10 - 2:25 PM, Schow Library 030A
Office hours: Mon 4:30 - 6:00 PM and Wed 1:30 - 3:30 PM, TCL 209
E-mail: [email protected]


1.2. Textbook

Course Reading Packet. Link.


1.3. Deliverables

Quizzes: 15%
Labs: 25%
Mini-projects: 30%
Final project: 30%

2. Lectures


  • Introduction. What is Machine Learning? Why study Machine Learning? Types of Machine Learning tasks. Slides 00. A new type of algorithm: stochastic optimization. Slides 01.
    Readings: RP* pp.240-270.
    Optimization algorithms DEMO.
  • Decision Trees and Classification Rules. Decision Tree induction. Information and Entropy. Gini score. Slides 02. Dealing with multi-valued attributes, numeric attributes, and missing values. Classification and Regression trees. Slides 03. Classification Rules. Coverage and accuracy. Slides 04.
    Readings: RP* pp. 3-31, pp. 199-215.
    Decision tree DEMO. See also the short entropy/Gini sketch after this list.
  • Nearest Neighbors. Memory-based reasoning. Classification and prediction with k-NN. Slides 05. Proximity metrics: distance and similarity. Slides 06. Improving performance of the k-NN classifier. Slides 07. Recommender systems. Slides 08.
    Readings: RP* pp. 215-219.
    k-NN algorithm DEMO. See also the minimal k-NN sketch after this list.
  • Clustering. Introduction to cluster analysis. K-means. Slides 09. Agglomerative Hierarchical clustering. Slides 10. Density-Based Spatial Clustering (DBSCAN). Slides 11.
    Readings: RP* pp. 129-199.
    Clustering words and documents DEMO. See also the k-means sketch after this list.
  • Association Analysis. Association rules. Support and Confidence. Discovering frequent itemsets. Rule generation. Slides 12. Interestingness metrics. Null-invariant measures for large datasets. Dealing with different levels of generalization. Simpson's paradox. Slides 13.
    Readings: RP* pp. 65-128.
    Market basket DEMO.
  • Regression vs. Logistic Regression. Numeric prediction. Linear Regression. Method of Least Squares. Slides 14. Readings: Notes. Iterative learning with gradient descent. Slides 15. Video explanation. Linear and Polynomial Regression. Home price prediction DEMO. See also the gradient-descent sketch after this list.
    Oversimplified and overcomplicated models. Overfitting. Bias-Variance tradeoff. Regularization. Slides 16. Generalization DEMO. Readings: Book chapter.
    Logistic Regression. Mapping numeric predictions to binary labels using the sigmoid function. Decision boundaries. Classification with Logistic Regression: interactive lecture.
    Overview of Support Vector Machines. Video explanation.
    Image classification DEMO.
  • Bayesian Classifiers. Probability primer. Bayesian reasoning. Naive Bayes. Slides 17. Dealing with missing values and numeric attributes. Laplace correction. Slides 18. Bayesian Belief Networks. Slides 19.
    Readings: RP* pp. 219-238.
  • Evaluating and Comparing Classifiers. Predicting error rate. Cross-validation. Bootstrap. Comparing classifiers. Slides 20. Precision and recall. Cost-based evaluation. Slides 21. ROC curves. Slides 22.
    Readings: RP* pp. 31-64.
  • Artificial Neural Networks. Introduction to Neural Networks. Perceptron. Multi-layer Perceptron. Linear boundaries and the importance of non-linearity. Slides 23.
    Perceptron Demo.
    Convolutional Neural Networks. Image recognition. Slides 24.
    Image recognition Demo.
    Readings: RP* pp. 299-355.

*RP refers to the Course Reading Packet.
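
The short Python sketches below illustrate a few of the lecture topics above. They are minimal, self-contained examples on made-up data, intended only to show the core idea; they are not the course demos linked in the lecture list. First, the entropy and Gini impurity measures used for decision tree induction, computed for a small set of class labels:

  # Entropy and Gini impurity of a set of class labels (illustrative only).
  import numpy as np

  def entropy(labels):
      # Shannon entropy in bits: -sum p * log2(p) over the class proportions p
      _, counts = np.unique(labels, return_counts=True)
      p = counts / counts.sum()
      return -np.sum(p * np.log2(p))

  def gini(labels):
      # Gini impurity: 1 - sum p^2 over the class proportions p
      _, counts = np.unique(labels, return_counts=True)
      p = counts / counts.sum()
      return 1.0 - np.sum(p ** 2)

  labels = ["yes", "yes", "yes", "no", "no", "no"]
  print(entropy(labels))   # 1.0 bit for a 50/50 split
  print(gini(labels))      # 0.5 for a 50/50 split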
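Next, a bare-bones k-nearest-neighbors classifier: a new point is labeled by majority vote among its k closest training points. The toy features and labels below are hypothetical, loosely echoing the home-price classification lab:

  # A toy k-nearest-neighbors classifier (plain numpy, hypothetical data).
  from collections import Counter
  import numpy as np

  def knn_predict(X_train, y_train, x_new, k=3):
      # Euclidean distance from x_new to every training point
      dists = np.linalg.norm(X_train - x_new, axis=1)
      # indices of the k closest training points
      nearest = np.argsort(dists)[:k]
      # majority vote among their labels
      votes = Counter(y_train[i] for i in nearest)
      return votes.most_common(1)[0][0]

  X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
  y_train = ["cheap", "cheap", "expensive", "expensive", "expensive"]
  print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))   # -> "cheap"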
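A minimal k-means sketch follows: alternate between assigning each point to its nearest center and moving each center to the mean of its assigned points, until the centers stop moving. The 2-D points are made up:

  # A toy k-means implementation (random initialization, made-up 2-D points).
  import numpy as np

  def kmeans(X, k=2, iters=100, seed=0):
      rng = np.random.default_rng(seed)
      # start from k randomly chosen data points as the initial centers
      centers = X[rng.choice(len(X), size=k, replace=False)]
      for _ in range(iters):
          # assign every point to its nearest center
          dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
          labels = dists.argmin(axis=1)
          # move each center to the mean of the points assigned to it
          new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
          if np.allclose(new_centers, centers):
              break
          centers = new_centers
      return centers, labels

  X = np.array([[1.0, 1.0], [1.5, 2.0], [0.5, 1.5], [8.0, 8.0], [9.0, 9.0], [8.5, 9.5]])
  centers, labels = kmeans(X, k=2)
  print(centers)
  print(labels)   # two groups: three points near (1, 1.5) and three near (8.5, 8.8)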
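Finally, iterative learning with gradient descent for simple linear regression: repeatedly adjust the slope and intercept in the direction opposite the gradient of the mean squared error. The data points are invented and roughly follow y = 2x + 1:

  # Gradient descent for simple linear regression (toy data, illustrative only).
  import numpy as np

  # made-up points roughly following y = 2x + 1
  x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
  y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

  w, b = 0.0, 0.0     # slope and intercept, start at zero
  lr = 0.01           # learning rate
  for _ in range(5000):
      y_hat = w * x + b
      # gradients of the mean squared error with respect to w and b
      grad_w = 2 * np.mean((y_hat - y) * x)
      grad_b = 2 * np.mean(y_hat - y)
      w -= lr * grad_w
      b -= lr * grad_b

  print(w, b)         # close to the true slope 2 and intercept 1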

3. Labs

  • Lab 0. Setup: Introduction to Jupyter notebooks. Titanic. Pandas and numpy. Link.
  • Lab 1. Stochastic Optimizations: Optimizing student-to-dorm assignments. Link.
  • Lab 2. Decision Trees: Predicting course evaluation scores. Link.
  • Lab 3. Nearest Neighbors: Home price prediction/classification. Link.
  • Lab 4. Clustering: Finding optimal store locations. Link.
  • Lab 5. Naive Bayes: Classifying movie reviews. Link.

4. Mini-projects

  • I. Classification Rules: Predicting COVID outcomes. Link.
  • II. Clustering: Clustering countries by cultural dimensions. Link.
  • III. Business Project: Improving performance of mail campaign with classifiers. Link.

5. Final Project

Wide open: what do you want to learn from data?