My PhD advisor recently introduced me to the expectation maximization (EM) solution to principal component analysis (PCA), which led me to the paper that first introduced this formulation. Because the algorithm is so intuitive, I’d like to briefly explain it here.

The paper formulates PCA as a factor model:

$$ y = Cx + v, \qquad v \sim \mathcal{N}(0, R) $$

The factors $x$ are linearly transformed via $C$ and have some additional Gaussian noise $v$ with covariance $R$.

The EM formulation falls out from using $x$ as the latent factors. Given some initial estimates of $C$ and $R$, we can compute the posterior distribution of $x$. With these posteriors, we can find updated estimates of $C$ and $R$. And so we can do this iteratively: finding $x$ given our best estimates of $C$ and $R$, then finding $C$ and $R$ given our updated $x$, and so on and so forth, until convergence.

This model becomes classical PCA in the zero noise limit, i.e.

$$ R = \lim_{\varepsilon \to 0} \varepsilon I, $$

and we only need to care about solving for $C$. Because we are in the zero noise limit, given $C$ we know exactly what $x$ must be: the posterior probability of $x$ is a point mass. This greatly simplifies the E step of the EM algorithm.

In fact, in the E step, we obtain the latent variables in closed form:

$$ X = (C^\top C)^{-1} C^\top Y, $$

where the columns of $Y$ are the data points and the columns of $X$ are the corresponding latents. This intuitively corresponds to projecting the data onto our best guess of the principal component subspace.

In the M step, we now update $C$:

$$ C^{\text{new}} = Y X^\top (X X^\top)^{-1}. $$

This formula might look familiar: it is the closed-form solution for ordinary least squares, and in fact that is precisely what it is. We find a new projection $C$ that minimizes the squared distance between $y$ and $Cx$.
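To make the two steps concrete, here is a minimal numpy sketch of the zero-noise EM loop on synthetic data. The variable names, dimensions, and the final subspace check are my own illustrative choices; the loop itself is just the E step (project the data onto the current guess of the subspace) and the M step (a least-squares refit of the projection) described in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n points in d dimensions lying near a k-dimensional subspace.
d, k, n = 5, 2, 500
C_true = rng.normal(size=(d, k))
Y = C_true @ rng.normal(size=(k, n)) + 0.01 * rng.normal(size=(d, n))
Y = Y - Y.mean(axis=1, keepdims=True)  # the algorithm assumes centered data

C = rng.normal(size=(d, k))  # arbitrary initial guess for the projection
for _ in range(100):
    # E step: solve (C^T C) X = C^T Y, i.e. project data onto the current C
    X = np.linalg.solve(C.T @ C, C.T @ Y)
    # M step: least-squares refit, C = Y X^T (X X^T)^{-1}
    C = Y @ X.T @ np.linalg.inv(X @ X.T)

# Sanity check: compare the span of C with the top-k left singular vectors of Y.
U = np.linalg.svd(Y, full_matrices=False)[0][:, :k]
Q, _ = np.linalg.qr(C)  # orthonormal basis for the span of C
overlap = np.linalg.svd(Q.T @ U, compute_uv=False)
print(overlap)  # values near 1.0 mean the two subspaces coincide
```

Note that EM only recovers the principal subspace (the span of $C$), not the individual ordered components, which is why the check above compares column spans rather than the matrices themselves.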

The whole process is akin to holding a wooden rod in place and affixing some 0-length springs from point objects to the rod, and then letting the rod go free to rotate according to the forces of the springs. Once the rod is stabilized, we reattach the springs in new locations.

*Just as PCA finds the projection that maximizes the explained variance and minimizes the total squared distance, we ultimately find the rod configuration that minimizes the total potential energy stored in the springs.* It is super elegant that by using EM, we find this connection between the generative model of PCA as a factor model and the traditional formulation of PCA as a min/max problem on the squared distance/explained variance.

I have a diagrammatic view of one iteration of EM, where the circles represent data points and the squiggly lines represent springs.

Start:

E step:

M step:

End:

Credits to Lior Pachter for introducing me to this concept, Valentine Svensson for feedback on this post, and Sam Roweis for coming up with the EM formulation for PCA.

Post script: after reading the paper that first introduced the EM solution to PCA, I noticed that it was written by a fellow Caltech graduate student in the 90’s, and upon googling ‘Sam Roweis’ I was very saddened to learn of his death in 2010 and of this great loss to science and mathematics.