A Unifying Viewpoint of some Clustering Techniques Using
Bregman Divergences and Extensions to Mixed Data Sets

Cécile Levasseur, Ken Kreutz-Delgado, Brandon Burge, and Uwe F. Mayer

Abstract: We present a general viewpoint using Bregman divergences and exponential family properties that contains as special cases the following three algorithms: 1) exponential family Principal Component Analysis (exponential PCA), 2) Semi-Parametric exponential family Principal Component Analysis (SP-PCA), and 3) Bregman soft clustering. This framework is equivalent to a mixed data-type hierarchical Bayes graphical model assumption with latent variables constrained to a low-dimensional parameter subspace. We show that, within this framework, exponential PCA and SP-PCA are similar to the Bregman soft clustering technique with the addition of a linear constraint in the parameter space. We implement the resulting modifications to SP-PCA and Bregman soft clustering for mixed (continuous and/or discrete) data sets, and add a nonparametric estimation of the point-mass probabilities to exponential PCA. Finally, we compare the relative performances of the three algorithms in a clustering setting for mixed data sets.
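
As background (not part of the paper's abstract): the Bregman divergence generated by a strictly convex, differentiable function \phi is commonly defined as

    D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle.

Choosing \phi(x) = \|x\|^2 recovers the squared Euclidean distance, in which case Bregman soft clustering reduces to a soft, EM-style form of k-means; more generally, regular exponential family distributions correspond to Bregman divergences, which is what allows clustering and PCA-style algorithms to be treated within a single framework of this kind.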

Key words: Generalized linear models, exponential family distributions, principal components, dimensionality reduction.


You can download a copy of this paper (about 8 pages).

Mayer21.pdf (Portable Document Format, 259 KB)



mayer@math.utah.edu
Last updated: Wed Oct 22 19:25:02 MDT 2008