Anna Little, Michigan State University
" Robust Statistical Procedures for Clustering in High Dimensions "
This talk addresses two topics related to robust statistical procedures for clustering in high dimensions: path-based spectral clustering (a new method) and classical multidimensional scaling (an old method). Path-based spectral clustering is a novel approach which combines a data-driven metric with graph-based clustering.
Using a data-driven metric allows for fast algorithms and strong theoretical guarantees when clusters concentrate around low-dimensional sets. Another approach to high-dimensional clustering is classical multidimensional scaling (CMDS), a dimension reduction technique widely popular across disciplines due to its simplicity and generality. CMDS followed by a simple clustering algorithm can exactly recover all cluster labels with high probability when the signal-to-noise ratio is high enough. However, the scaling conditions become increasingly restrictive as the ambient dimension increases, illustrating the need for robust unbiasing procedures in high dimensions.
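As background, here is a minimal sketch of the CMDS-plus-clustering pipeline the abstract refers to (illustrative only, not the speaker's code; the data, dimensions, and use of k-means are assumptions):

import numpy as np
from sklearn.cluster import KMeans

def cmds(D2, k):
    """Embed into R^k from a matrix of squared pairwise distances."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]         # top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Two well-separated Gaussian clusters in high ambient dimension.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 500)), rng.normal(3, 1, (50, 500))])
D2 = np.square(np.linalg.norm(X[:, None] - X[None, :], axis=-1))
labels = KMeans(n_clusters=2, n_init=10).fit_predict(cmds(D2, 2))

The double-centering step converts squared distances into a Gram matrix whose top eigenvectors give the low-dimensional embedding; the abstract's point is that noise biases this spectrum as the ambient dimension grows, necessitating unbiasing.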
Anna Little, Michigan State University
" Wavelet Invariants for Statistically Robust Multi-Reference Alignment "
Motivated by applications such as cryo-electron microscopy, this talk discusses a generalization of the 1-dimensional multi-reference alignment problem. The goal is to recover a hidden signal from many noisy observations, where each observation is corrupted by a random translation, a random dilation, and high additive noise. We propose using a translation-invariant, wavelet-based representation and analyze its statistical properties given a large number of independent corruptions of the hidden signal. The wavelet-based representation uniquely determines the power spectrum and is used to apply a data-driven, nonlinear unbiasing procedure, so that the estimate of the hidden signal is robust to high-frequency perturbations. After unbiasing the representation, the power spectrum is recovered by solving a convex optimization problem, and thus the target signal is obtained up to an unknown phase. Extensive numerical experiments demonstrate the statistical robustness of this approximation procedure.
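A quick numerical illustration of the key invariance behind this approach (this checks only translation invariance and the additive-noise bias of the power spectrum, not the dilation part or the speaker's wavelet construction):

import numpy as np

rng = np.random.default_rng(0)
n = 256
f = np.exp(-0.5 * ((np.arange(n) - n / 2) / 10.0) ** 2)  # hidden signal
g = np.roll(f, rng.integers(n))                          # random translation

# The power spectrum is exactly translation invariant.
assert np.allclose(np.abs(np.fft.fft(f)) ** 2, np.abs(np.fft.fft(g)) ** 2)

# Averaged power spectra of noisy copies converge to |fhat|^2 + n * sigma^2:
# a constant bias that an unbiasing step (here with known sigma) removes.
sigma, m = 1.0, 5000
noisy = f + sigma * rng.normal(size=(m, n))
avg_ps = np.mean(np.abs(np.fft.fft(noisy, axis=1)) ** 2, axis=0)
unbiased = avg_ps - n * sigma ** 2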
Tingran Gao, University of Chicago
" Manifold learning on fibre bundles "
Spectral geometry has played an important role in modern geometric data analysis, where the technique is widely known as Laplacian eigenmaps or diffusion maps. In this talk, we present a geometric framework that studies graph representations of complex datasets, where each edge of the graph is equipped with a non-scalar transformation or correspondence. This new framework models such a dataset as a fibre bundle with a connection, and interprets the collection of pairwise functional relations as defining a horizontal diffusion process on the bundle, driven by its projection onto the base. The eigenstates of this horizontal diffusion process encode the "consistency" among objects in the dataset, and provide a lens through which the geometry of the dataset can be revealed. We demonstrate an application of this geometric framework in evolutionary anthropology.
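As a concrete toy instance of such a structure (in the spirit of vector diffusion maps; the graph, the SO(2) fibres, and the consistency check below are illustrative assumptions, not the speaker's construction):

import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
n, d = 30, 2
O = [rot(rng.uniform(0, 2 * np.pi)) for _ in range(n)]     # a frame per vertex

edges = {(i, i + 1) for i in range(n - 1)} | {(0, n - 1)}  # ring: connected
while len(edges) < 2 * n:                                  # plus random chords
    i, j = sorted(rng.choice(n, 2, replace=False))
    edges.add((int(i), int(j)))

S = np.zeros((n * d, n * d))                # block "connection adjacency"
deg = np.zeros(n)
for i, j in edges:
    rho = O[i] @ O[j].T                     # perfectly consistent transport
    S[i*d:(i+1)*d, j*d:(j+1)*d] = rho
    S[j*d:(j+1)*d, i*d:(i+1)*d] = rho.T
    deg[i] += 1
    deg[j] += 1

Dinv = np.kron(np.diag(1.0 / deg), np.eye(d))
vals = np.sort(np.linalg.eigvals(Dinv @ S).real)
print(vals[-d:])   # top d eigenvalues equal 1 when the connection is
                   # consistent; noise on rho_ij pushes them down

Low eigenstates of the associated block Laplacian quantify exactly the "consistency" of the edge transformations that the abstract describes.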
Tingran Gao, University of Chicago
" Multi-Representation Manifold Learning on Fibre Bundles "
Fibre bundles serve as a natural geometric setting for many learning problems involving non-scalar pairwise interactions. Modeled on a fixed principal bundle, different irreducible representations of the structure group induce many associated vector bundles, encoding rich geometric information for the fibre bundle as well as the underlying base manifold. An intuitive example of such a learning paradigm is phase synchronization---the problem of recovering angles from noisy pairwise relative phase measurements---which is prototypical for a large class of imaging problems in cryogenic electron microscopy (cryo-EM) image analysis. We propose a novel nonconvex optimization formulation for this problem, and develop a simple yet efficient two-stage algorithm that, for the first time, guarantees strong recovery of the phases with high probability. We demonstrate applications of this multi-representation methodology that improve denoising and clustering results for cryo-EM images. This algorithmic framework also extends naturally to general synchronization problems over other compact Lie groups, with a wide spectrum of potential applications.
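The two-stage algorithm itself is not reproduced here; for orientation, the following is a minimal sketch of the standard spectral relaxation for the prototype phase synchronization problem (the noise model and sizes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(1)
n, sigma = 200, 1.0
z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))   # ground-truth phases

W = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
W = (W + W.conj().T) / np.sqrt(2)               # Hermitian noise
H = np.outer(z, z.conj()) + sigma * W           # noisy relative measurements

vals, vecs = np.linalg.eigh(H)                  # spectral estimate:
v = vecs[:, -1]                                 # top eigenvector of H,
z_hat = v / np.abs(v)                           # projected to the circle

print(np.abs(np.vdot(z_hat, z)) / n)   # near 1, up to a global phase ambiguity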
Xiao Shen, University of Wisconsin - Madison
" Coalescence estimates for the corner growth model with exponential weights "
We establish estimates for the coalescence time of semi-infinite directed geodesics in the planar corner growth model with i.i.d. exponential weights. There are four estimates: upper and lower bounds for both fast and slow coalescence on the correct scale with exponent 3/2. The lower bound for fast coalescence is new and has the optimal exponential order of magnitude. For the other three, we provide proofs that do not rely on integrable probability or on the connection with the totally asymmetric simple exclusion process, in order to provide a template for the extension to other models. We utilize a geodesic duality introduced by Pimentel and properties of the increment-stationary last-passage percolation process.
Joint work with Timo Seppäläinen.
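For readers unfamiliar with the model, a minimal simulation sketch (illustrative only): the corner growth model is last-passage percolation over up-right paths, computable by a simple dynamic-programming recursion.

import numpy as np

rng = np.random.default_rng(0)
n = 500
w = rng.exponential(1.0, size=(n, n))   # i.i.d. Exp(1) weights on sites

G = np.zeros((n, n))                    # G(i,j) = max over up-right paths
for i in range(n):
    for j in range(n):
        G[i, j] = w[i, j] + max(G[i - 1, j] if i else 0.0,
                                G[i, j - 1] if j else 0.0)

# Known first-order behavior for Exp(1) weights: G(n,n)/n -> 4. The talk's
# estimates concern the scale r^{3/2} on which semi-infinite geodesics
# started at distance r apart coalesce.
print(G[-1, -1] / n)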
Wenpin Tang, UCLA
" Discrete and continuous ranking models "
In this talk, I will discuss two different 'ranking' models:
Mallows' ranking model and rank-dependent diffusions. In the first part,
I will discuss rank-dependent diffusions, focusing on two models:
the 'Up the River' model, and N-player games with fuel constraints. These
problems require a careful treatment of the corresponding PDEs. The former
is joint work with Li-Cheng Tsai, and the latter with Xin Guo and Renyuan Xu.
In the second part, I will focus on the Mallows permutation and various
generalizations. In particular, I will talk about a general model called
regenerative permutations. I will also discuss statistical properties of, and
algorithms for, these Mallows-type ranking models. This is partly joint work with
Jim Pitman. If time permits, I will discuss recent progress on the random walk derived
from random permutations, which is motivated by applications in systems biology.
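As background for the Mallows model, here is a minimal sketch of exact sampling from P(sigma) proportional to q^{inv(sigma)} via the standard repeated insertion method (textbook material, not the regenerative construction of the talk):

import numpy as np

def sample_mallows(n, q, rng):
    perm = []
    for i in range(1, n + 1):
        w = q ** (i - 1 - np.arange(i))   # weight q^{i-j} for position j=1..i
        perm.insert(rng.choice(i, p=w / w.sum()), i)
    return perm

rng = np.random.default_rng(0)
sigma = sample_mallows(10, 0.5, rng)   # q < 1 concentrates near the identity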
Wenpin Tang, UCLA
Sampling Gibbs measures at low temperature is a very
important task, but computationally very challenging. Numerical evidence
suggests that the infinite-swapping algorithm (ISA) is a promising
method. The ISA can be seen as an improvement of replica methods, which
are very popular. We rigorously analyze the ergodic properties of the
ISA in the low temperature regime, deducing Eyring-Kramers formulas for
the spectral gap (or Poincaré constant) and the log-Sobolev constant.
Our main result shows that the effective energy barrier can be reduced
drastically using the ISA compared to the classical over-damped
Langevin dynamics. As a corollary, we derive a deviation inequality
showing that sampling is also improved by an exponential factor.
Furthermore, we analyze simulated annealing for the ISA and show that
the ISA is again superior to the over-damped Langevin dynamics. This is
joint work with Georg Menz, André Schlichting and Tianqi Wu.
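The ISA arises as the infinite-swap-rate limit of replica exchange; the following is a minimal sketch of plain replica exchange on a double-well potential, the kind of baseline the analysis compares against (potential, temperatures, and step size are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
V = lambda x: (x**2 - 1.0)**2                 # double well, barrier at x = 0
dV = lambda x: 4.0 * x * (x**2 - 1.0)

beta = np.array([8.0, 1.0])                   # cold and hot inverse temperatures
x = np.array([-1.0, -1.0])                    # one walker per temperature
h = 1e-3
for _ in range(200_000):
    # Overdamped Langevin step for each replica at its own temperature.
    x = x - h * dV(x) + np.sqrt(2 * h / beta) * rng.normal(size=2)
    # Metropolis swap attempt between the two replicas.
    log_acc = (beta[0] - beta[1]) * (V(x[0]) - V(x[1]))
    if np.log(rng.random()) < log_acc:
        x = x[::-1]
# The cold replica samples exp(-beta[0] V) but, thanks to swaps, crosses the
# energy barrier far more often than plain Langevin dynamics at beta = 8 would.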
Wuchen Li, UCLA
" Transport information geometric learning "
Nowadays, optimal transport, i.e., the Wasserstein metric, plays an essential role in data science. In this talk, we briefly review its development and applications in machine learning. In particular, we will focus on the optimal control problems it induces in density space and the associated differential structures. We introduce the Wasserstein natural gradient in parametric models: the Wasserstein metric tensor on probability density space is pulled back to a metric tensor on parameter space. We derive the Wasserstein gradient flows and proximal operators in parameter space. We demonstrate that the Wasserstein natural gradient works efficiently in learning, with examples in Boltzmann machines, generative adversarial networks (GANs), image classification, and adversarial robustness.
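In the notation commonly used in this line of work (a sketch of the update rule, not necessarily the speaker's exact formulation): for a parametric family \rho_\theta and loss functional F, the Wasserstein natural gradient step is

\theta_{k+1} = \theta_k - h \, G_W(\theta_k)^{-1} \nabla_\theta F(\rho_{\theta_k}),

where G_W(\theta) is the pullback of the Wasserstein-2 metric tensor to parameter space. As h \to 0, this discretizes the Wasserstein gradient flow \partial_t \rho = \nabla \cdot ( \rho \, \nabla \frac{\delta F}{\delta \rho} ).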
Wuchen Li, UCLA
" Accelerated Information Gradient flow "
We present a systematic framework for Nesterov's
accelerated gradient flows in probability spaces equipped
with information metrics. Two metrics are considered:
the Fisher-Rao metric and the Wasserstein-2 metric. For the
Wasserstein-2 metric, we prove convergence properties of the
accelerated gradient flows and introduce their formulations in
Gaussian families. Furthermore, we propose a practical discrete-time
particle algorithm with an adaptive restart
technique. We formulate a novel bandwidth selection method, which
learns the Wasserstein-2 gradient direction from Brownian-motion
samples. Experimental results, including Bayesian inference examples, show the
strength of the current method compared with the state of the art.
Further connections with inverse problems and data-related
optimization techniques will be discussed.
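For orientation, one standard way to write the Wasserstein-2 accelerated flow (stated as background; the talk's precise formulation may differ) is the damped Hamiltonian system

\partial_t \rho_t + \nabla \cdot ( \rho_t \nabla \Phi_t ) = 0,
\partial_t \Phi_t + \alpha_t \Phi_t + \tfrac{1}{2} |\nabla \Phi_t|^2 + \frac{\delta F}{\delta \rho}(\rho_t) = 0,

where \alpha_t is a damping schedule; the Euclidean analogue with \alpha_t = 3/t is the continuous-time limit of Nesterov's scheme.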
Sui Tang, Johns Hopkins University
" Analysis of dynamical data "
High-dimensional dynamical data arise in many fields of modern science and introduce new challenges in statistical learning and data recovery. In this talk, I will present two sets of problems. One is related to the data-driven discovery of dynamics in systems of interacting agents: given only observed trajectories of the system, we are interested in estimating the interaction laws between the agents using tools from statistical/machine learning. We show that, at least in particular circumstances where the interactions are governed by (unknown) functions of distances, the high dimensionality of the state space of the system does not affect the learning rates. We can achieve an optimal learning rate for the interaction kernel, equal to that of a one-dimensional regression problem. The other concerns dynamical sampling, a new area of sampling theory that deals with processing a linear time series of evolving signals and aims to recover the initial state and the forward operator from coarsely sampled evolved states. We provide mathematical theories showing how the dynamics can inform feasible space-time sampling locations, and the fundamental limits of the space-time trade-off.
Sui Tang, Johns Hopkins University
" Learning interaction laws in agent based systems "
Interacting agent systems are ubiquitous in science, from the
modeling of particles in physics to predator-prey models in biology, to
opinion dynamics in the social sciences. A fundamental yet challenging problem in
various areas of science is to infer the interaction rule that yields
collective motions of agents. We consider an inverse problem: given
abundantly observed trajectories of the system, we are interested in
estimating the interaction laws between the agents. We show that at least
in the particular setting where the interaction is governed by an
(unknown) function of pairwise distances, under a suitable coercivity
condition that guarantees the well-posedness of the problem of recovering
the interaction kernel, statistically and computationally efficient,
nonparametric, suitably-regularized least-squares estimators exist. Our
estimators achieve the optimal learning rate for one-dimensional (the
variable being pairwise distance) regression problems with noisy
observations. We demonstrate the performance of our estimators on various
simple examples. In particular, our estimators produce accurate
predictions of collective dynamics over relatively large time intervals, even
when they are learned from data collected over short time intervals.
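A minimal sketch of the kind of estimator described above (not the authors' code; the kernel, basis, and sizes are illustrative assumptions), for the first-order model dx_i/dt = (1/N) sum_j phi(|x_j - x_i|)(x_j - x_i) with phi expanded in piecewise-constant functions of the pairwise distance:

import numpy as np

rng = np.random.default_rng(0)
N, d, T, dt = 20, 2, 200, 0.01
phi_true = lambda r: np.exp(-r)              # unknown kernel to be recovered

# Simulate trajectories (Euler) and record the velocities; in practice the
# velocities would come from finite differences of observed positions.
X = rng.normal(size=(N, d))
states, vels = [], []
for _ in range(T):
    diff = X[None, :, :] - X[:, None, :]     # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, 1.0)                 # diagonal is harmless: diff_ii = 0
    V = (phi_true(r)[:, :, None] * diff).sum(axis=1) / N
    states.append(X.copy())
    vels.append(V)
    X = X + dt * V

# Least squares for coefficients c_k of phi(r) = sum_k c_k 1_{[r_k, r_{k+1})}.
K, rmax = 30, 4.0
grid = np.linspace(0.0, rmax, K + 1)
A_rows, b_rows = [], []
for X, V in zip(states, vels):
    diff = X[None, :, :] - X[:, None, :]
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, np.inf)              # exclude self-interaction
    bins = np.clip(np.searchsorted(grid, r) - 1, 0, K - 1)
    Phi = np.zeros((N, d, K))                # feature: per-basis contribution
    for k in range(K):
        mask = (bins == k) & np.isfinite(r)
        Phi[:, :, k] = (mask[:, :, None] * diff).sum(axis=1) / N
    A_rows.append(Phi.reshape(N * d, K))
    b_rows.append(V.reshape(N * d))

A, b = np.vstack(A_rows), np.concatenate(b_rows)
c = np.linalg.lstsq(A, b, rcond=None)[0]     # regularization omitted for brevity

Note that the regression is effectively one-dimensional in the distance variable, which is the source of the dimension-independent learning rate claimed in the abstract.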
Fei Pu, University of Utah
" The stochastic heat equations: hitting probabilities and spatial limit theorems via Malliavin calculus. "
In this talk, I will present various properties of the solution
to stochastic heat equations, including an optimal lower bound on
hitting probabilities, the probability density function of the supremum, spatial
ergodicity, a central limit theorem, and a Poisson limit theorem. The main tool
for studying these properties is Malliavin calculus.
Jessica Lin, McGill University
" An Invariance Principle for Barycentric Brownian Bees "
We establish an invariance principle for a Brunet-Derrida particle system in d dimensions with selection. The model consists of N particles undergoing dyadic branching Brownian motion at rate 1. At a branching event, the number of particles is kept equal to N by removing the particle located furthest from the barycenter. This talk is based on joint work with Louigi Addario-Berry and Thomas Tendron.
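A minimal simulation sketch of the model (time step and sizes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
N, d, steps, dt = 100, 2, 5000, 1e-3
X = rng.normal(size=(N, d))

for _ in range(steps):
    X = X + np.sqrt(dt) * rng.normal(size=(N, d))  # Brownian increments
    if rng.random() < N * dt:                      # a branching event occurs
        i = rng.integers(N)
        X = np.vstack([X, X[i]])                   # dyadic branching
        far = np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))
        X = np.delete(X, far, axis=0)              # drop furthest from barycenter

# The invariance principle concerns the barycenter process X.mean(axis=0),
# suitably rescaled, converging to a Brownian motion.
print(X.mean(axis=0))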
Bao Wang, UCLA
" PDE-Principled Trustworthy Deep Learning Meets Computational Biology "
Deep learning achieves tremendous success in image and speech recognition and machine translation. However, deep learning is not trustworthy. 1. How to improve the robustness of deep neural networks? Deep neural networks are well known to be vulnerable to adversarial attacks. For instance, malicious attacks can fool the Tesla self-driving system by making a tiny change to the scene acquired by the intelligence system. 2. How to compress high-capacity deep neural networks efficiently without loss of accuracy? The computational cost of inference by deep neural networks is one of the major bottlenecks for applying them to mobile devices. 3. How to protect the private information that is used to train a deep neural network? Deep learning-based artificial intelligence systems may leak the private training data. Fredrikson et al. recently showed that a simple model-inversion attack can recover the portraits of the victims whose face images were used to train a face recognition system. In this talk, I will present some recent work on developing PDE-principled robust neural architectures and optimization algorithms for robust, accurate, private, and efficient deep learning. I will also present some potential applications of this data-driven approach to bio-molecule simulation and computer-aided drug design.
Friday, February 21. Joint Stochastics and Applied Math Seminar. 3-4 PM. LCB 225
Bao Wang, UCLA
" Momentum and PDE Techniques for Deep Learning "
Accelerating stochastic gradient-based optimization and sampling algorithms is of crucial importance for
modern machine learning, in particular, deep learning. Developing mathematically principled architectures
for deep neural nets (DNNs) is another important direction for advancing deep learning. In this talk, I will
present some recent progress in leveraging Nesterov Accelerated Gradient (NAG) with scheduled restarting
to accelerate SGD and improve the training of DNNs. Moreover, I will talk about some recent results on
PDE-based techniques for accelerating stochastic gradient descent and Langevin dynamics. Finally, I will
discuss a momentum-based formulation for DNN design.
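A minimal sketch of NAG with scheduled restarting (shown on a deterministic quadratic for clarity; the restart period and step size are illustrative assumptions, and in the talk's setting the gradient would be stochastic):

import numpy as np

def nag_restart(grad, x0, lr=0.1, restart_every=50, steps=500):
    x, y, t = x0.copy(), x0.copy(), 1.0
    for k in range(steps):
        if k % restart_every == 0:        # scheduled restart of the momentum
            t, y = 1.0, x.copy()
        x_new = y - lr * grad(y)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # Nesterov extrapolation
        x, t = x_new, t_new
    return x

A = np.diag(np.linspace(0.01, 1.0, 50))   # ill-conditioned quadratic
x_min = nag_restart(lambda x: A @ x, np.ones(50))   # converges toward 0

Restarting resets the momentum before it overshoots, a known remedy for the oscillations NAG exhibits on ill-conditioned problems.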
Michael Damron, Georgia Tech
" Universality for the time constant in critical first-passage percolation "
In first-passage percolation, we place i.i.d. nonnegative weights (t_e) on the edges of a graph and consider the induced weighted graph metric T(x,y). When the underlying graph is the two-dimensional square lattice, there is a phase transition in the model depending on the probability p that an edge weight equals zero: for p<1/2, the metric T(0,x) grows linearly in x, whereas for p>1/2, it remains stochastically bounded. The critical case occurs for p=1/2, where there are large but finite clusters of zero-weight edges. In this talk, I will review work with Wai-Kit Lam and Xuan Wang in which we determine the rate of growth for T(0,x) up to a constant factor for all critical distributions. Then I will explain recent work with Jack Hanson and Wai-Kit Lam in which we determine the "time constant" (leading order constant in the rate of growth) in the special case where the graph is the triangular lattice, and the weights are placed on the vertices. This is the only class of distributions known where this time constant is computable: we find that it is an explicit function of the infimum of the support of t_e intersected with (0,\infty).
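A minimal simulation sketch of the model (square-lattice edge weights, with passage times computed by Dijkstra's algorithm; illustrative only, and note the talk's second part concerns vertex weights on the triangular lattice):

import heapq
import random

random.seed(0)
n, p = 60, 0.5                           # p = P(t_e = 0); critical at p = 1/2
w = {}
def edge_weight(a, b):
    key = (a, b) if a < b else (b, a)    # one fixed i.i.d. weight per edge
    if key not in w:
        w[key] = 0.0 if random.random() < p else random.expovariate(1.0)
    return w[key]

T = {(0, 0): 0.0}                        # passage times from the origin
heap, done = [(0.0, (0, 0))], set()
while heap:
    t, u = heapq.heappop(heap)
    if u in done:
        continue
    done.add(u)
    x, y = u
    for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= v[0] < n and 0 <= v[1] < n:
            nt = t + edge_weight(u, v)
            if nt < T.get(v, float('inf')):
                T[v] = nt
                heapq.heappush(heap, (nt, v))

print(T[(n - 1, n - 1)])   # at p = 1/2 this stays stochastically bounded in n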
Carsten Chong, EPFL, Switzerland
" Asymptotic behavior of the stochastic heat equation with Levy noise "
We discuss some recent results about the macroscopic behavior of the solution to the stochastic heat equation with Lévy noise.
For a fixed spatial point, we show that the solution develops unusually large peaks as time tends to infinity,
a phenomenon which we refer to as additive intermittency because it already occurs under additive Lévy noise.
When we consider a fixed time point and let the spatial variable become large,
we make a similar observation: the solution develops larger peaks than one would expect from moment considerations.
Perhaps most interestingly, the largest size of these intermittency islands is the same, no matter whether we have additive or multiplicative Lévy noise.
This talk is based on joint work with Peter Kevei (University of Szeged).
Friday, March 13. No seminar due to Spring break.
Friday, March 27. Stochastics Seminar. 3-4 PM. LCB 225