Course Clustering:
Bard College at Simon's Rock
Method: Guided LDA (Latent Dirichlet Allocation)
Thanks to Vikash Singh's semi-supervised guided topic model.
Step 1: Find topic model
TF-IDF (Term Frequency-Inverse Document Frequency)
LDA Corpus (standard)
Guided LDA
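A minimal sketch of the inputs to Step 1, using only the Python standard library. It computes TF-IDF weights for a few course descriptions and builds a seed dictionary that maps word ids to topic ids. The course descriptions, the seed words, and the names `tfidf` and `seed_topics` are invented for illustration. The word-id-to-topic-id form of the seed dictionary is our understanding of what the guided LDA package accepts, so treat it as an assumption. Fitting the model itself is not shown.

```python
import math
from collections import Counter

# Toy course descriptions, invented for illustration only.
docs = [
    "introduction to organic chemistry laboratory",
    "organic chemistry reactions and synthesis",
    "modern poetry and fiction writing",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
word2id = {w: i for i, w in enumerate(vocab)}


def idf(term):
    """Inverse document frequency with a +1 offset: log(N / df) + 1."""
    df = sum(term in doc for doc in tokenized)
    return math.log(len(tokenized) / df) + 1.0


def tfidf(doc):
    """Map each term in one tokenized document to its TF-IDF weight."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * idf(t) for t, c in counts.items()}


weights = [tfidf(doc) for doc in tokenized]

# Hypothetical seed words, one list per guided topic.
seed_words = [["chemistry", "organic"], ["poetry", "fiction"]]

# Seed dictionary: vocabulary id -> topic id.
seed_topics = {
    word2id[w]: topic
    for topic, words in enumerate(seed_words)
    for w in words
}
```

Terms that appear in only one description (such as "introduction") receive a higher weight than terms shared across descriptions (such as "organic"), which is the point of the IDF factor.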
Step 2: Normalize values (the document-topic values from the model are extremely small floats)
Min-max method
Multiply by a power of 10 to convert the 0-to-1 range to integers
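The normalization in Step 2 can be sketched as follows. Min-max rescaling maps each value to (v - min) / (max - min), and the result is then multiplied by a power of 10 and rounded. The function name `min_max_scale`, the choice of 10^4, and the sample values are assumptions made for illustration; the power actually used is not stated here.

```python
def min_max_scale(values, power=4):
    """Rescale values to [0, 1] by min-max, then to integers in [0, 10**power]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to rescale.
        return [0 for _ in values]
    factor = 10 ** power  # assumed power of 10; not taken from the project
    return [round((v - lo) / (hi - lo) * factor) for v in values]


# Hypothetical document-topic values of the "extremely small float" kind.
raw = [2.0e-7, 5.0e-7, 1.0e-6, 1.8e-6]
scaled = min_max_scale(raw)
```

The smallest value always maps to 0 and the largest to 10^power, so the ordering of the original values is preserved while the numbers become easier to read and compare.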
Step 3: K-means clustering
Pearson correlation (distance measure)
Best k: 5
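A rough sketch of Step 3, assuming the Pearson distance between two vectors is defined as 1 - r, where r is the Pearson correlation coefficient. The functions `pearson_distance` and `kmeans`, the use of the first k points as starting centroids, and the sample vectors are all illustrative assumptions, not the project's code. Updating centroids with the arithmetic mean is a common heuristic under this distance. The toy data is too small for k = 5, so the example uses k = 2.

```python
import math


def pearson_distance(a, b):
    """Return 1 - r, where r is the Pearson correlation of a and b."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        # Correlation is undefined for a constant vector; treat as uncorrelated.
        return 1.0
    return 1.0 - cov / (norm_a * norm_b)


def kmeans(points, k, max_iters=20):
    """K-means-style clustering that assigns points by Pearson distance."""
    # Deterministic start: the first k points (an assumption for reproducibility).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: pearson_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Centroid = per-dimension mean of the cluster members.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical normalized topic weights for four documents, three topics each.
points = [
    [9000, 500, 500],
    [200, 8000, 300],
    [8500, 900, 600],
    [100, 9500, 400],
]
labels = kmeans(points, k=2)
```

Documents whose topic profiles rise and fall together end up in the same cluster, regardless of the absolute size of their weights, because the Pearson distance compares the shape of the profiles rather than their magnitude.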
Results: Documents and Class Clusters
Documents:
Mixed:
Class Clusters:
Distance Matrix
Classes
Analysis: