Only the results are presented. For source code please contact authors (@simons-rock.edu).
Protein Secondary Structure Prediction Using Recurrent Neural Network
and Long Short-Term Memory
by Joyee Wang: jwang17.
Web app: Protein Secondary Structure Predictor.
Creating a Pokémon Battle AI with Decision Trees
by Kai Dai: kdai17.
Video: battle of a Pokémon vs. Decision Tree.
Clustering Financial Time Series
by Jeff Tsen: jtsen15.
Results.
Summarizing College Curriculum using Topic Modeling
by Betty Jia: bjia18.
Topics and course clusters at Simon's Rock.
Learning from data. Types of ML tasks. Slides.
Splits the data into groups according to attribute values and classifies each group separately. The most informative features end up near the top of the tree. Variation: Decision Table based on classification rules. Regression trees predict numeric targets.
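As a minimal sketch of how a tree learner might choose the attribute for the top of the tree, the following computes the information gain (entropy reduction) of each attribute. The toy rows, the labels, and the function names are hypothetical, and information gain is only one common splitting criterion; the course code may use a different one.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Attribute with the highest information gain (candidate root split)."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: should we play outside?
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no"]

root = best_attribute(rows, labels)
print(root, information_gain(rows, labels, root))
```

In this toy set "outlook" separates the classes perfectly while "windy" carries no information, so "outlook" is selected as the first split.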
Learning Decision Trees. Slides 01.01.
Slides 01.02.
Classification rules. Decision tables. Slides 01.03.
Decision tree code.
Book chapters on Decision trees and
on Classification rules.
The Naive Bayes classifier evaluates the conditional probability of each class given the observations.
It assumes that features are statistically independent given the class.
A Bayesian Belief Network evaluates the joint probability of all
variables given the observations.
It takes into account statistical dependence between features according to
the edges in the dependency graph and the Markov blanket of each node.
All features in the training set are known in advance and are used to fill the probability tables
at each node of the Bayesian network.
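The Naive Bayes part of this description can be sketched in a few lines. This is an illustrative version for categorical features, assuming Laplace (add-one) smoothing; the toy data and all function names are hypothetical and not taken from the course code.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    vocab = defaultdict(set)             # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def posterior(model, row):
    """P(class | row), multiplying per-feature likelihoods as if independent."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for c, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace smoothing (an assumption) avoids zero probabilities.
            score *= (value_counts[(c, i)][value] + 1) / (n + len(vocab[i]))
        scores[c] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["yes", "yes", "no", "no"]

model = train(rows, labels)
probs = posterior(model, ("sunny", "hot"))
print(probs)
```

A Bayesian Belief Network would replace the per-feature product with conditional probability tables that follow the edges of the dependency graph.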
Conditional probabilities primer.
Naive Bayes classifiers.
Slides 02.01.
Slides 02.02.
Bayesian Belief Networks. Slides 02.03.
Article: An Application of Bayesian Networks to Antiterrorism Risk Management.
Link.
Classifying tweet sentiments using Naive Bayes. Code by J.Wang.
Positivity map.
Credibility:
Evaluating what's been learned. Holdout estimation.
Cross-validation. Bootstrap.
Predicting performance. Slides 03.01.
Cost-based evaluation. Slides 03.02.
Comparing classifiers. ROC curves.
Slides 03.03.
Improving mail promotion campaign.
Handout.
Instead of building a predictive model from the data, simply remembers all the data and classifies new records using the Nearest Neighbor approach: finds the most similar instances and issues their class as the prediction. Alternatively, finds similar instances and recommends their favorite items.
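A minimal sketch of the nearest-neighbor idea, assuming numeric features, Euclidean distance, and a majority vote among k=3 neighbors. The points, labels, and function name are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority class among the k training points closest to query."""
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(points, labels, (1.1, 1.0))
print(prediction)
```

There is no training step: all the work happens at prediction time, which is why the method gets slow on large data sets unless an index structure is used.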
Nearest-neighbor classifier.
Slides 04.01.
Book chapter on
Making Recommendations.
Groups observations into clusters based on pairwise similarity between records. Each observation is assigned to a single cluster. In fuzzy clustering, each data point can belong to several clusters.
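As an illustration of assigning each observation to a single cluster, here is a small K-means sketch. It assumes Euclidean distance and, to stay deterministic, uses the first k points as initial centroids; real implementations usually initialize randomly or with k-means++. The data and names are hypothetical.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    centroids = [points[i] for i in range(k)]  # naive initialization (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                # Mean of the members, coordinate by coordinate.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    assignments = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
    return centroids, assignments


# Hypothetical toy data: two groups around (1, 1) and (5, 5).
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.2), (4.8, 5.2)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

The result depends on the initial centroids, which is one of the limitations of K-means.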
K-means Clustering. Bisecting K-means. Limitations of K-means.
Slides 05.01.
Agglomerative Hierarchical Clustering. Density-based Clustering. DBSCAN.
Slides 05.02.
Evaluating cluster quality. Slides 05.03.
Fuzzy clustering: Fuzzy C-means, Expectation Maximization, Topic Modeling.
Slides 05.04.
Book chapter on Clustering blogs.
Code for clustering algorithms.
Clustering countries by cultural dimensions: Experiment by Joyee Kim.
Discovers the most important nodes from the network topology using stochastic Markov processes.
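One way to make this concrete is PageRank computed by power iteration over a random-surfer Markov chain. The sketch below assumes the conventional damping factor of 0.85 and a graph in which every link target also appears as a key; the graph and function name are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration on the random-surfer Markov chain defined by links."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical toy graph: node -> list of nodes it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(sorted(ranks.items(), key=lambda item: -item[1]))
```

Node C collects links from three other nodes and ends up with the highest rank, while D, which nobody links to, keeps only the random-jump share.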
Learning from graphs. Link analysis. PageRank Algorithm.
Slides 06.01.
Implementation of PageRank.
Finds the best model to fit data by simulating evolutionary pressure.
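A minimal genetic algorithm sketch, assuming bit-string genomes, truncation selection (the fitter half survives unchanged), one-point crossover, and bit-flip mutation. The fitness function here is the toy "count the ones" problem; all names and parameter values are hypothetical.

```python
import random


def genetic_algorithm(fitness, genome_length, population_size=30,
                      generations=60, mutation_rate=0.05, seed=0):
    """Evolve bit strings toward higher fitness; returns (best genome, history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: population_size // 2]  # selection pressure
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_length)        # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]                   # bit-flip mutation
            children.append(child)
        population = survivors + children
    return max(population, key=fitness), history


# Toy fitness: the number of ones in the genome.
best, history = genetic_algorithm(sum, genome_length=20)
print(sum(best), history[0], history[-1])
```

Because the survivors are copied unchanged into the next generation, the best fitness never decreases from one generation to the next.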
General optimization techniques: Hill climbing, Simulated annealing. Genetic Algorithm.
Slides 07.01.
Code for optimizations.
Optimizing flights and student-to-dorm assignments.
Genetic programming: an algorithm which creates self-adjusting programs. Slides 07.02.
Code for Genetic Programming.
Book chapter on Evolving Intelligence.
Discovers groups of items which often appear together.
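The two basic measures behind association analysis, support and confidence, can be sketched as follows. This brute-force version enumerates all item pairs rather than pruning candidates the way Apriori does; the transactions and function names are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def frequent_pairs(transactions, min_support):
    """All item pairs whose support reaches min_support (brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(transactions, pair)
        if s >= min_support:
            result[pair] = s
    return result


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

pairs = frequent_pairs(transactions, min_support=0.6)
rule_confidence = confidence(transactions, {"beer"}, {"diapers"})
print(pairs, rule_confidence)
```

Here every basket that contains beer also contains diapers, so the rule beer -> diapers has confidence 1.0 even though its support is only 0.6.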
Association analysis. Basic concepts and algorithms.
Slides 08.01.
Interestingness measures. When statistical independence test fails. Concept hierarchies. Simpson's paradox.
Slides 08.02.
Book chapter on Association rules.
Learns the weights of a network of connected "neurons", one weight per input feature, and uses them to classify a new vector.
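The simplest case is a single artificial neuron. The sketch below trains a perceptron with the classic error-driven update rule on the logical AND function; the step activation, the learning rate of 1.0, and the function names are assumptions chosen for illustration.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, targets, learning_rate=1.0, epochs=20):
    """Adjust weights toward the target whenever a sample is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so a single perceptron can learn it.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, targets)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

A single perceptron cannot learn functions that are not linearly separable, such as XOR; that is what multi-layer networks trained with backpropagation address.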
Artificial neuron. Perceptron. Multi-layer Artificial Neural Networks. Backpropagation of errors.
Backpropagation algorithm. Slides 09.01.
Sample code for Perceptron, and for the Multi-Layer Perceptron.
Experiment with Breast Cancer Diagnosis.
Book chapter on deriving backpropagation.
Assigns observations to a discrete set of classes: transforms the output of a linear model with the logistic sigmoid function and returns a probability, which can be mapped to one of two classes.
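A minimal sketch of this idea, assuming a single numeric feature, stochastic gradient descent on the log loss, and a 0.5 probability threshold. The toy data, the hyperparameters, and the function names are hypothetical.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(weights, bias, x):
    """Probability of class 1 for feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(samples, targets, learning_rate=0.1, epochs=200):
    """Stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            error = predict_proba(weights, bias, x) - y  # gradient of the log loss
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical toy data: one centered feature, two classes.
samples = [(-2.0,), (-1.5,), (-1.0,), (1.0,), (1.5,), (2.0,)]
targets = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(samples, targets)
classes = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in samples]
print(weights, bias, classes)
```

The 0.5 threshold is a choice, not part of the model; it can be moved when the costs of the two kinds of error differ.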
Book chapter on Predicting Good Answers on StackOverflow.
Can be extended into an application which outputs the score of a post after each new word (in real time).
In a similar way you can build an automatic essay grader: Graded Essays Dataset from Kaggle.
Book chapter on Music Genre Classification.
You can then create a live Music Genre Classification app by getting music samples
using this code.
Book chapter on Image classification. Large dataset of labeled images.
Performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized.
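As a sketch of the variance-maximizing projection, the code below finds the first principal component by power iteration on the sample covariance matrix and projects the data onto it. Library implementations typically use an SVD instead; the toy data and function names are hypothetical, and the sketch assumes the data has nonzero variance.

```python
def first_principal_component(data, iterations=100):
    """Return (feature means, unit vector along the direction of maximum variance)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector for power iteration
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return means, v


def project(data, means, component):
    """One-dimensional coordinates of the data along the component."""
    return [sum((x - m) * c for x, m, c in zip(row, means, component))
            for row in data]


# Hypothetical toy data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]

means, component = first_principal_component(data)
scores = project(data, means, component)
print(component, scores)
```

The two-dimensional points are reduced to one coordinate each while keeping almost all of their variance, because the data lies close to a single line.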
Book chapter on Dimensionality Reduction.
Image Compression with PCA: can be used as a preprocessing step in image classification.
A self-organizing map (SOM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional, discretized representation of the input space. This makes SOMs useful both for classification and for visualization by creating low-dimensional views of high-dimensional data.
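A minimal sketch of the training loop for a one-dimensional map, assuming a Gaussian neighborhood function and linearly decaying learning rate and radius. The unit count, the decay schedule, the toy data, and the function names are all hypothetical; the book chapter's implementation may differ.

```python
import random
from math import exp


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((wi - xi) ** 2 for wi, xi in zip(weights[i], x)))


def train_som(data, n_units=5, epochs=50, learning_rate=0.5, radius=1.0, seed=0):
    """Train a 1-D map: units near the winner on the grid move toward each sample."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # shrinks linearly toward zero
        lr = learning_rate * decay
        sigma = max(radius * decay, 0.01)     # neighborhood width on the grid
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood: the winner moves most, distant units least.
                influence = exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for j in range(dim):
                    w[j] += lr * influence * (x[j] - w[j])
    return weights


# Hypothetical toy data in the unit square.
data = [(0.1, 0.1), (0.2, 0.15), (0.5, 0.5), (0.8, 0.9), (0.9, 0.85)]
weights = train_som(data)
cells = [best_matching_unit(weights, x) for x in data]
print(cells)
```

Each input is mapped to the index of its best matching unit, which gives the discretized low-dimensional view described above.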
Book chapter on the algorithm and its implementation [see section 3].
Finds a hyperplane which separates points from different classes. Applies the kernel trick to separate data that is not linearly separable.
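The kernel trick rests on the fact that a kernel function computes a dot product in a higher-dimensional feature space without building that space explicitly. The sketch below checks this for the degree-2 polynomial kernel in two dimensions and shows that XOR-like points become separable after the mapping. It illustrates the kernel idea only and is not an SVM trainer; the function names and points are hypothetical.

```python
from math import sqrt


def poly_kernel(x, y):
    """Degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit mapping whose dot product equals poly_kernel."""
    return (x[0] ** 2, sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, y)                    # computed in the input space
explicit = dot(feature_map(x), feature_map(y))  # computed in the feature space
print(implicit, explicit)

# XOR-like data: no straight line separates the classes in the input space.
positives = [(1.0, 1.0), (-1.0, -1.0)]
negatives = [(1.0, -1.0), (-1.0, 1.0)]
# In the feature space, the sign of the middle coordinate separates them.
print([feature_map(p)[1] for p in positives], [feature_map(p)[1] for p in negatives])
```

An SVM with this kernel only ever evaluates `poly_kernel`, so it never has to compute the mapped vectors.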
Book chapter explaining SVM algorithms and their implementation.