101 Concepts for the Level I Exam

Essential Concept 9: Unsupervised Machine Learning Algorithms


Principal components analysis (PCA) is used to reduce a set of highly correlated features to a few uncorrelated composite variables. A composite variable combines two or more variables that are statistically strongly related to each other.
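The sketch below shows the idea under a few assumptions: NumPy is available, observations are rows and features are columns, and PCA is done through an eigendecomposition of the feature covariance matrix (one standard route; an SVD of the centered data is another). The function name `pca` and the synthetic data are illustrative only. Two of the three features are built to be highly correlated, and the projected scores come out uncorrelated.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    # Center each feature; PCA works on deviations from the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (features are columns).
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Each kept eigenvector defines one composite variable (principal component).
    components = eigenvectors[:, :n_components]
    scores = X_centered @ components
    # Share of total variance captured by each kept component.
    explained_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, explained_ratio

# Usage: features 0 and 1 are strongly correlated, feature 2 is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])
scores, ratio = pca(X, n_components=2)
print(ratio)
```

Here most of the variance of the two correlated features collapses into the first component, which is why a few composite variables can stand in for many correlated ones.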

The k-means algorithm repeatedly partitions observations into k non-overlapping clusters. The number of clusters, k, is a hyperparameter whose value the researcher must set before learning begins. Each cluster is characterized by its centroid, and each observation is assigned to the cluster whose centroid is closest to it. The algorithm alternates between assigning observations to their nearest centroid and recomputing each centroid as the mean of its assigned observations, until the assignments stop changing.
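A minimal NumPy sketch of the k-means loop, assuming Euclidean distance and initial centroids drawn as k distinct random observations (one common initialization; libraries often use smarter schemes such as k-means++). The names `k_means`, `assign_to_nearest`, `max_iter`, and `seed` are hypothetical, chosen for this illustration.

```python
import numpy as np

def assign_to_nearest(X, centroids):
    # Euclidean distance from every observation to every centroid, shape (n, k).
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # Index of the closest centroid for each observation.
    return distances.argmin(axis=1)

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # k is fixed before learning; start from k distinct random observations.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels = assign_to_nearest(X, centroids)
        # Move each centroid to the mean of the observations assigned to it
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    # Final assignment consistent with the returned centroids.
    return assign_to_nearest(X, centroids), centroids

# Usage: two well-separated synthetic groups of 50 points each.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
labels, centroids = k_means(X, k=2)
print(centroids)
```

Because each observation receives exactly one label, the resulting clusters are non-overlapping; changing `k` changes the partition, which is why it must be chosen up front.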

Hierarchical clustering algorithms create intermediate rounds of clusters of increasing or decreasing size until a final clustering is reached. Agglomerative (bottom-up) hierarchical clustering begins with each observation treated as its own cluster and then repeatedly merges the two closest clusters. Divisive (top-down) hierarchical clustering starts with all observations belonging to a single cluster and then progressively splits clusters into smaller ones.
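The following is a small agglomerative sketch in NumPy that records every intermediate round. The text does not say how the distance between two clusters is measured; this sketch assumes single linkage (the distance between the closest pair of members), which is only one of several common choices. The function name `agglomerative_clustering` and the sample points are hypothetical. Divisive clustering is not shown; it would run in the opposite direction, starting from one cluster and splitting.

```python
import numpy as np

def agglomerative_clustering(X, n_clusters=1):
    """Bottom-up clustering with single linkage; returns every intermediate round."""
    # Bottom-up start: every observation is its own cluster.
    clusters = [[i] for i in range(len(X))]
    rounds = [[list(c) for c in clusters]]
    # Pairwise Euclidean distances between observations.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the two closest clusters; b > a, so deleting b keeps index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        rounds.append([list(c) for c in clusters])
    return rounds

# Usage: two tight pairs of points and one distant outlier.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [20.0, 20.0]])
for step, partition in enumerate(agglomerative_clustering(X)):
    print(step, partition)
```

Each printed round has one fewer cluster than the last, going from five singleton clusters down to a single cluster; stopping at a larger `n_clusters` simply ends the merging earlier.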