ML primitives from scratch

Fundamental algorithms like K-Means come down to a few loops of arithmetic over vectors.

The idea

Many machine learning algorithms don't require heavy frameworks; at their core they are short loops over arrays. For example, K-Means clustering groups points into K clusters so that each point belongs to the cluster whose centroid (mean point) is nearest.

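The standard procedure, Lloyd's algorithm, is remarkably simple. Starting from K initial centroids (for example, K points picked from the data), it does the following:
1) Assign every point to the closest centroid using Euclidean distance.
2) Move each centroid to the exact average (mean) of all points assigned to it.
3) Repeat steps 1 and 2 until the centroids stop moving or an iteration limit is reached.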
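
To make the two core steps concrete, here is a minimal sketch of a single assign-and-update pass using NumPy broadcasting. The four points and the two starting centroids are made-up values chosen for illustration; they are not taken from the original demo.

```python
import numpy as np

# Four 2-D points forming two obvious groups (made-up illustrative data).
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
# Hypothetical starting centroids, picked by hand for this example.
centroids = np.array([[1.0, 1.0], [9.0, 1.0]])

# Step 1: Euclidean distance from every point to every centroid, shape (4, 2).
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)  # index of the closest centroid for each point

# Step 2: each centroid moves to the mean of the points assigned to it.
new_centroids = np.array(
    [X[labels == j].mean(axis=0) for j in range(len(centroids))]
)

print(labels)         # [0 0 1 1]
print(new_centroids)  # rows [0, 1] and [10, 1]
```

After this one pass the centroids sit at the centers of the two groups, so a second pass would leave them unchanged and the loop would stop.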

How it works (K-Means)

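import numpy as np

def euclidean(a, b):
    # Straight-line (L2) distance between two points
    return np.linalg.norm(np.asarray(a) - np.asarray(b))

def kmeans(X, k, max_iters=10):
    X = np.asarray(X, dtype=float)
    # Initialize randomly: use k distinct points from X as starting centroids
    rng = np.random.default_rng()
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(max_iters):
        # 1. Assign each point to the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for x in X:
            dists = [euclidean(x, c) for c in centroids]
            best_k = dists.index(min(dists))
            clusters[best_k].append(x)

        # 2. Update centroids: move each one to the mean of its points
        #    (an empty cluster keeps its previous centroid to avoid NaN)
        new_centroids = np.array([
            np.mean(cl, axis=0) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ])

        if np.allclose(centroids, new_centroids):
            break  # Stable: the centroids stopped moving
        centroids = new_centroids

    return clusters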
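
The same loop can also be written without the inner Python loops. The block below is a sketch of one possible vectorized variant, not the version above: the name `kmeans_vectorized`, the `seed` parameter, and the choice to return labels together with centroids are all assumptions made for illustration, and the eight sample points are made-up data.

```python
import numpy as np

def kmeans_vectorized(X, k, max_iters=100, seed=0):
    """Hypothetical vectorized variant; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Distances from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Mean of each cluster; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(centroids, new_centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Two small, well-separated squares of points (made-up illustrative data).
base = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
X = np.vstack([base, base + 10.0])

labels, centroids = kmeans_vectorized(X, k=2)
# Sort by the first coordinate so the output order is predictable.
centroids_sorted = centroids[np.argsort(centroids[:, 0])]
print(centroids_sorted)  # rows [0.5, 0.5] and [10.5, 10.5]
```

Returning labels rather than lists of points keeps the result as plain arrays, which makes it easy to index back into `X` (for example `X[labels == 0]`). Which cluster gets label 0 depends on the random initialization, hence the sort before printing.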