One minute
How does K-means clustering work?
K-means clustering (not to be confused with K-nearest neighbors) is an unsupervised learning algorithm used for grouping similar points together into clusters.
Algorithm
The basic K-means algorithm is fairly simple and has two steps, repeated until convergence:
- assign points to cluster corresponding to closest centroid
- update centroid locations to the mean of all points assigned to the associated cluster
The algorithm converges when the centroids stop moving, i.e. no points can be switched between clusters to a closer centroid.
def kmeans(points, k):
random.shuffle(points)
centroids = points[:k]
done = False
while not done:
clusters = clear_clusters()
# assign each point to closest cluster
for point in points:
cluster_idx = closest_cluster(centroids, point)
clusters[cluster_idx].append(point)
# recalculate centroids
new_centroids = update_centroids(clusters)
done = new_centroids == centroids
return clusters
Read other posts