Anna Little, Michigan State University
" Robust Statistical Procedures for Clustering in High Dimensions "
This talk addresses two topics related to robust statistical procedures for clustering in high dimensions: path-based spectral clustering (a new method) and classical multidimensional scaling (an old method). Path-based spectral clustering is a novel approach which combines a data-driven metric with graph-based clustering.
Using a data-driven metric allows for fast algorithms and strong theoretical guarantees when clusters concentrate around low-dimensional sets. Another approach to high-dimensional clustering is classical multidimensional scaling (CMDS), a dimension reduction technique widely popular across disciplines due to its simplicity and generality. CMDS followed by a simple clustering algorithm can exactly recover all cluster labels with high probability when the signal-to-noise ratio is high enough. However, the scaling conditions become increasingly restrictive as the ambient dimension increases, illustrating the need for robust unbiasing procedures in high dimensions.
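As background, here is a minimal sketch of the CMDS-plus-clustering pipeline the abstract refers to (illustrative only, not the speaker's code; the data, dimensions, and use of k-means are assumptions):

import numpy as np
from sklearn.cluster import KMeans

def cmds(D2, k):
    """Embed into R^k from a matrix of squared pairwise distances."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]         # top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Two well-separated Gaussian clusters in high ambient dimension.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 500)), rng.normal(3, 1, (50, 500))])
D2 = np.square(np.linalg.norm(X[:, None] - X[None, :], axis=-1))
labels = KMeans(n_clusters=2, n_init=10).fit_predict(cmds(D2, 2))

The double-centering step converts squared distances into a Gram matrix whose top eigenvectors give the low-dimensional embedding; the abstract's point is that noise biases this spectrum as the ambient dimension grows, necessitating unbiasing.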
Anna Little, Michigan State University
" Wavelet Invariants for Statistically Robust Multi-Reference Alignment "
Motivated by applications such as cryo-electron microscopy, this talk discusses a generalization of the 1-dimensional multi-reference alignment problem. The goal is to recover a hidden signal from many noisy observations, where each observation is corrupted by a random translation, a random dilation, and high additive noise. We propose using a translation-invariant, wavelet-based representation and analyze its statistical properties given a large number of independent corruptions of the hidden signal. The wavelet-based representation uniquely determines the power spectrum and is used to apply a data-driven, nonlinear unbiasing procedure, so that the estimate of the hidden signal is robust to high-frequency perturbations. After unbiasing the representation, the power spectrum is recovered by solving a convex optimization problem, and thus the target signal is obtained up to an unknown phase. Extensive numerical experiments demonstrate the statistical robustness of this approximation procedure.
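A quick numerical illustration of the key invariance behind this approach (this checks only translation invariance and the additive-noise bias of the power spectrum, not the dilation part or the speaker's wavelet construction):

import numpy as np

rng = np.random.default_rng(0)
n = 256
f = np.exp(-0.5 * ((np.arange(n) - n / 2) / 10.0) ** 2)  # hidden signal
g = np.roll(f, rng.integers(n))                          # random translation

# The power spectrum is exactly translation invariant.
assert np.allclose(np.abs(np.fft.fft(f)) ** 2, np.abs(np.fft.fft(g)) ** 2)

# Averaged power spectra of noisy copies converge to |fhat|^2 + n * sigma^2:
# a constant bias that an unbiasing step (here with known sigma) removes.
sigma, m = 1.0, 5000
noisy = f + sigma * rng.normal(size=(m, n))
avg_ps = np.mean(np.abs(np.fft.fft(noisy, axis=1)) ** 2, axis=0)
unbiased = avg_ps - n * sigma ** 2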
Tingran Gao, University of Chicago
" Manifold learning on fibre bundles "
Spectral geometry has played an important role in modern geometric data analysis, where the technique is widely known as Laplacian eigenmaps or diffusion maps. In this talk, we present a geometric framework that studies graph representations of complex datasets, where each edge of the graph is equipped with a non-scalar transformation or correspondence. This new framework models such a dataset as a fibre bundle with a connection, and interprets the collection of pairwise functional relations as defining a horizontal diffusion process on the bundle, driven by its projection onto the base. The eigenstates of this horizontal diffusion process encode the "consistency" among objects in the dataset, and provide a lens through which the geometry of the dataset can be revealed. We demonstrate an application of this geometric framework in evolutionary anthropology.
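As a concrete toy instance of such a structure (in the spirit of vector diffusion maps; the graph, the SO(2) fibres, and the consistency check below are illustrative assumptions, not the speaker's construction):

import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
n, d = 30, 2
O = [rot(rng.uniform(0, 2 * np.pi)) for _ in range(n)]     # a frame per vertex

edges = {(i, i + 1) for i in range(n - 1)} | {(0, n - 1)}  # ring: connected
while len(edges) < 2 * n:                                  # plus random chords
    i, j = sorted(rng.choice(n, 2, replace=False))
    edges.add((int(i), int(j)))

S = np.zeros((n * d, n * d))                # block "connection adjacency"
deg = np.zeros(n)
for i, j in edges:
    rho = O[i] @ O[j].T                     # perfectly consistent transport
    S[i*d:(i+1)*d, j*d:(j+1)*d] = rho
    S[j*d:(j+1)*d, i*d:(i+1)*d] = rho.T
    deg[i] += 1
    deg[j] += 1

Dinv = np.kron(np.diag(1.0 / deg), np.eye(d))
vals = np.sort(np.linalg.eigvals(Dinv @ S).real)
print(vals[-d:])   # top d eigenvalues equal 1 when the connection is
                   # consistent; noise on rho_ij pushes them down

Low eigenstates of the associated block Laplacian quantify exactly the "consistency" of the edge transformations that the abstract describes.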
Tingran Gao, University of Chicago
" Multi-Representation Manifold Learning on Fibre Bundles "
Fibre bundles serve as a natural geometric setting for many learning problems involving non-scalar pairwise interactions. Modeled on a fixed principal bundle, different irreducible representations of the structure group induce many associated vector bundles, encoding rich geometric information for the fibre bundle as well as the underlying base manifold. An intuitive example of such a learning paradigm is phase synchronization---the problem of recovering angles from noisy pairwise relative phase measurements---which is prototypical for a large class of imaging problems in cryogenic electron microscopy (cryo-EM) image analysis. We propose a novel nonconvex optimization formulation for this problem, and develop a simple yet efficient two-stage algorithm that, for the first time, guarantees strong recovery of the phases with high probability. We demonstrate applications of this multi-representation methodology that improve denoising and clustering results for cryo-EM images. This algorithmic framework also extends naturally to general synchronization problems over other compact Lie groups, with a wide spectrum of potential applications.
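The two-stage algorithm itself is not reproduced here; for orientation, the following is a minimal sketch of the standard spectral relaxation for the prototype phase synchronization problem (the noise model and sizes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(1)
n, sigma = 200, 1.0
z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))   # ground-truth phases

W = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
W = (W + W.conj().T) / np.sqrt(2)               # Hermitian noise
H = np.outer(z, z.conj()) + sigma * W           # noisy relative measurements

vals, vecs = np.linalg.eigh(H)                  # spectral estimate:
v = vecs[:, -1]                                 # top eigenvector of H,
z_hat = v / np.abs(v)                           # projected to the circle

print(np.abs(np.vdot(z_hat, z)) / n)   # near 1, up to a global phase ambiguity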
Xiao Shen, University of Wisconsin - Madison
" Coalescence estimates for the corner growth model with exponential weights "
We establish estimates for the coalescence time of semi-infinite directed geodesics in the planar corner growth model with i.i.d. exponential weights. There are four estimates: upper and lower bounds for both fast and slow coalescence on the correct scale with exponent 3/2. The lower bound for fast coalescence is new and has the optimal exponential order of magnitude. For the other three, we provide proofs that do not rely on integrable probability or on the connection with the totally asymmetric simple exclusion process, in order to provide a template for the extension to other models. We utilize a geodesic duality introduced by Pimentel and properties of the increment-stationary last-passage percolation process.
Joint work with Timo Seppäläinen.
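For readers unfamiliar with the model, a minimal simulation sketch (illustrative only): the corner growth model is last-passage percolation over up-right paths, computable by a simple dynamic-programming recursion.

import numpy as np

rng = np.random.default_rng(0)
n = 500
w = rng.exponential(1.0, size=(n, n))   # i.i.d. Exp(1) weights on sites

G = np.zeros((n, n))                    # G(i,j) = max over up-right paths
for i in range(n):
    for j in range(n):
        G[i, j] = w[i, j] + max(G[i - 1, j] if i else 0.0,
                                G[i, j - 1] if j else 0.0)

# Known first-order behavior for Exp(1) weights: G(n,n)/n -> 4. The talk's
# estimates concern the scale r^{3/2} on which semi-infinite geodesics
# started at distance r apart coalesce.
print(G[-1, -1] / n)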
Wenpin Tang, UCLA
" Discrete and continuous ranking models "
In this talk, I will discuss two different 'ranking' models:
Mallows' ranking model and rank-dependent diffusions. In the first part,
I will discuss rank-dependent diffusions, focusing on two models:
the 'Up the River' model, and N-player games with fuel constraints. These
problems require a careful treatment of the corresponding PDEs. The former
is joint work with Li-Cheng Tsai, and the latter with Xin Guo and Renyuan Xu.
In the second part, I will focus on the Mallows permutation and various
generalizations. In particular, I will talk about a general model called
regenerative permutations. I will also discuss statistical properties of, and
algorithms for, these Mallows-type ranking models. This is partly joint work with
Jim Pitman. If time permits, I will discuss recent progress on the random walk derived
from random permutations, which is motivated by applications in systems biology.
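As background for the Mallows model, here is a minimal sketch of exact sampling from P(sigma) proportional to q^{inv(sigma)} via the standard repeated insertion method (textbook material, not the regenerative construction of the talk):

import numpy as np

def sample_mallows(n, q, rng):
    perm = []
    for i in range(1, n + 1):
        w = q ** (i - 1 - np.arange(i))   # weight q^{i-j} for position j=1..i
        perm.insert(rng.choice(i, p=w / w.sum()), i)
    return perm

rng = np.random.default_rng(0)
sigma = sample_mallows(10, 0.5, rng)   # q < 1 concentrates near the identity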
Wenpin Tang, UCLA
Sampling Gibbs measures at low temperature is a very
important task, but computationally very challenging. Numerical evidence
suggests that the infinite-swapping algorithm (ISA) is a promising
method. The ISA can be seen as an improvement of replica methods, which
are very popular. We rigorously analyze the ergodic properties of the
ISA in the low temperature regime, deducing Eyring-Kramers formulas for
the spectral gap (or Poincaré constant) and the log-Sobolev constant.
Our main result shows that the effective energy barrier can be reduced
drastically using the ISA compared to the classical over-damped
Langevin dynamics. As a corollary, we derive a deviation inequality
showing that sampling is also improved by an exponential factor.
Furthermore, we analyze simulated annealing for the ISA and show that
the ISA is again superior to the over-damped Langevin dynamics. This is
joint work with Georg Menz, André Schlichting and Tianqi Wu.
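The ISA arises as the infinite-swap-rate limit of replica exchange; the following is a minimal sketch of plain replica exchange on a double-well potential, the kind of baseline the analysis compares against (potential, temperatures, and step size are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
V = lambda x: (x**2 - 1.0)**2                 # double well, barrier at x = 0
dV = lambda x: 4.0 * x * (x**2 - 1.0)

beta = np.array([8.0, 1.0])                   # cold and hot inverse temperatures
x = np.array([-1.0, -1.0])                    # one walker per temperature
h = 1e-3
for _ in range(200_000):
    # Overdamped Langevin step for each replica at its own temperature.
    x = x - h * dV(x) + np.sqrt(2 * h / beta) * rng.normal(size=2)
    # Metropolis swap attempt between the two replicas.
    log_acc = (beta[0] - beta[1]) * (V(x[0]) - V(x[1]))
    if np.log(rng.random()) < log_acc:
        x = x[::-1]
# The cold replica samples exp(-beta[0] V) but, thanks to swaps, crosses the
# energy barrier far more often than plain Langevin dynamics at beta = 8 would.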
Wuchen Li, UCLA
" Transport information geometric learning "
Nowadays, optimal transport, i.e., the Wasserstein metric, plays an essential role in data science. In this talk, we briefly review its development and applications in machine learning. In particular, we will focus on the optimal control problems it induces in density space and the associated differential structures. We introduce the Wasserstein natural gradient in parametric models: the Wasserstein metric tensor on probability density space is pulled back to a metric tensor on parameter space. We derive the Wasserstein gradient flows and proximal operators in parameter space. We demonstrate that the Wasserstein natural gradient works efficiently in learning, with examples in Boltzmann machines, generative adversarial networks (GANs), image classification, and adversarial robustness.
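In the notation commonly used in this line of work (a sketch of the update rule, not necessarily the speaker's exact formulation): for a parametric family \rho_\theta and loss functional F, the Wasserstein natural gradient step is

\theta_{k+1} = \theta_k - h \, G_W(\theta_k)^{-1} \nabla_\theta F(\rho_{\theta_k}),

where G_W(\theta) is the pullback of the Wasserstein-2 metric tensor to parameter space. As h \to 0, this discretizes the Wasserstein gradient flow \partial_t \rho = \nabla \cdot ( \rho \, \nabla \frac{\delta F}{\delta \rho} ).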
Wuchen Li, UCLA
" Accelerated Information Gradient flow "
We present a systematic framework for Nesterov's
accelerated gradient flows in probability spaces equipped
with information metrics. Two metrics are considered:
the Fisher-Rao metric and the Wasserstein-2 metric. For the
Wasserstein-2 metric, we prove convergence properties of the
accelerated gradient flows and introduce their formulations in
Gaussian families. Furthermore, we propose a practical discrete-time
particle algorithm with an adaptive restart
technique. We formulate a novel bandwidth selection method, which
learns the Wasserstein-2 gradient direction from Brownian-motion
samples. Experimental results, including Bayesian inference examples, show the
strength of the current method compared with the state of the art.
Further connections with inverse problems and data-related
optimization techniques will be discussed.
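For orientation, one standard way to write the Wasserstein-2 accelerated flow (stated as background; the talk's precise formulation may differ) is the damped Hamiltonian system

\partial_t \rho_t + \nabla \cdot ( \rho_t \nabla \Phi_t ) = 0,
\partial_t \Phi_t + \alpha_t \Phi_t + \tfrac{1}{2} |\nabla \Phi_t|^2 + \frac{\delta F}{\delta \rho}(\rho_t) = 0,

where \alpha_t is a damping schedule; the Euclidean analogue with \alpha_t = 3/t is the continuous-time limit of Nesterov's scheme.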
Sui Tang, Johns Hopkins University
" Analysis of dynamical data "
High-dimensional dynamical data arise in many fields of modern science and introduce new challenges in statistical learning and data recovery. In this talk, I will present two sets of problems. One is related to the data-driven discovery of dynamics in systems of interacting agents: given only observed trajectories of the system, we are interested in estimating the interaction laws between the agents using tools from statistical/machine learning. We show that, at least in particular circumstances where the interactions are governed by (unknown) functions of distances, the high dimensionality of the state space of the system does not affect the learning rates. We can achieve an optimal learning rate for the interaction kernel, equal to that of a one-dimensional regression problem. The other concerns dynamical sampling, a new area of sampling theory that deals with processing a linear time series of evolving signals and aims to recover the initial state and the forward operator from coarsely sampled evolved states. We provide mathematical theories showing how the dynamics can inform feasible space-time sampling locations, and the fundamental limits of the space-time trade-off.
Sui Tang, Johns Hopkins University
" Learning interaction laws in agent based systems "
Interacting agent systems are ubiquitous in science, from the
modeling of particles in physics to predator-prey models in biology, to
opinion dynamics in the social sciences. A fundamental yet challenging problem in
various areas of science is to infer the interaction rule that yields
collective motions of agents. We consider an inverse problem: given
abundantly observed trajectories of the system, we are interested in
estimating the interaction laws between the agents. We show that at least
in the particular setting where the interaction is governed by an
(unknown) function of pairwise distances, under a suitable coercivity
condition that guarantees the well-posedness of the problem of recovering
the interaction kernel, statistically and computationally efficient,
nonparametric, suitably-regularized least-squares estimators exist. Our
estimators achieve the optimal learning rate for one-dimensional (the
variable being pairwise distance) regression problems with noisy
observations. We demonstrate the performance of our estimators on various
simple examples. In particular, our estimators produce accurate
predictions of collective dynamics over relatively large time intervals, even
when they are learned from data collected over short time intervals.
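A minimal sketch of the kind of estimator described above (not the authors' code; the kernel, basis, and sizes are illustrative assumptions), for the first-order model dx_i/dt = (1/N) sum_j phi(|x_j - x_i|)(x_j - x_i) with phi expanded in piecewise-constant functions of the pairwise distance:

import numpy as np

rng = np.random.default_rng(0)
N, d, T, dt = 20, 2, 200, 0.01
phi_true = lambda r: np.exp(-r)              # unknown kernel to be recovered

# Simulate trajectories (Euler) and record the velocities; in practice the
# velocities would come from finite differences of observed positions.
X = rng.normal(size=(N, d))
states, vels = [], []
for _ in range(T):
    diff = X[None, :, :] - X[:, None, :]     # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, 1.0)                 # diagonal is harmless: diff_ii = 0
    V = (phi_true(r)[:, :, None] * diff).sum(axis=1) / N
    states.append(X.copy())
    vels.append(V)
    X = X + dt * V

# Least squares for coefficients c_k of phi(r) = sum_k c_k 1_{[r_k, r_{k+1})}.
K, rmax = 30, 4.0
grid = np.linspace(0.0, rmax, K + 1)
A_rows, b_rows = [], []
for X, V in zip(states, vels):
    diff = X[None, :, :] - X[:, None, :]
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, np.inf)              # exclude self-interaction
    bins = np.clip(np.searchsorted(grid, r) - 1, 0, K - 1)
    Phi = np.zeros((N, d, K))                # feature: per-basis contribution
    for k in range(K):
        mask = (bins == k) & np.isfinite(r)
        Phi[:, :, k] = (mask[:, :, None] * diff).sum(axis=1) / N
    A_rows.append(Phi.reshape(N * d, K))
    b_rows.append(V.reshape(N * d))

A, b = np.vstack(A_rows), np.concatenate(b_rows)
c = np.linalg.lstsq(A, b, rcond=None)[0]     # regularization omitted for brevity

Note that the regression is effectively one-dimensional in the distance variable, which is the source of the dimension-independent learning rate claimed in the abstract.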
Fei Pu, University of Utah
" The stochastic heat equations: hitting probabilities and spatial limit theorems via Malliavin calculus. "
In this talk, I will present various properties of the solution
to stochastic heat equations, including an optimal lower bound on
hitting probabilities, the probability density function of the supremum, spatial
ergodicity, a central limit theorem, and a Poisson limit theorem. The main tool
for studying these properties is Malliavin calculus.
Jessica Lin, McGill University
" An Invariance Principle for Barycentric Brownian Bees "
We establish an invariance principle for a Brunet-Derrida particle system in d dimensions with selection. The model consists of N particles undergoing dyadic branching Brownian motion at rate 1. At a branching event, the number of particles is kept equal to N by removing the particle located furthest from the barycenter. This talk is based on joint work with Louigi Addario-Berry and Thomas Tendron.
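A minimal simulation sketch of the model (time step and sizes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
N, d, steps, dt = 100, 2, 5000, 1e-3
X = rng.normal(size=(N, d))

for _ in range(steps):
    X = X + np.sqrt(dt) * rng.normal(size=(N, d))  # Brownian increments
    if rng.random() < N * dt:                      # a branching event occurs
        i = rng.integers(N)
        X = np.vstack([X, X[i]])                   # dyadic branching
        far = np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))
        X = np.delete(X, far, axis=0)              # drop furthest from barycenter

# The invariance principle concerns the barycenter process X.mean(axis=0),
# suitably rescaled, converging to a Brownian motion.
print(X.mean(axis=0))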
Bao Wang, UCLA
" PDE-Principled Trustworthy Deep Learning Meets Computational Biology "
Deep learning achieves tremendous success in image and speech recognition and machine translation. However, deep learning is not trustworthy. 1. How to improve the robustness of deep neural networks? Deep neural networks are well known to be vulnerable to adversarial attacks. For instance, malicious attacks can fool the Tesla self-driving system by making a tiny change to the scene acquired by the intelligence system. 2. How to compress high-capacity deep neural networks efficiently without loss of accuracy? The computational cost of inference by deep neural networks is one of the major bottlenecks for applying them to mobile devices. 3. How to protect the private information that is used to train a deep neural network? Deep learning-based artificial intelligence systems may leak the private training data. Fredrikson et al. recently showed that a simple model-inversion attack can recover the portraits of the victims whose face images were used to train a face recognition system. In this talk, I will present some recent work on developing PDE-principled robust neural architectures and optimization algorithms for robust, accurate, private, and efficient deep learning. I will also present some potential applications of this data-driven approach to bio-molecule simulation and computer-aided drug design.
Friday, February 21. Joint Stochastics and Applied Math Seminar. 3-4 PM. LCB 225
Bao Wang, UCLA
" Momentum and PDE Techniques for Deep Learning "
Accelerating stochastic gradient-based optimization and sampling algorithms is of crucial importance for
modern machine learning, in particular, deep learning. Developing mathematically principled architectures
for deep neural nets (DNNs) is another important direction for advancing deep learning. In this talk, I will
present some recent progress in leveraging Nesterov Accelerated Gradient (NAG) with scheduled restarting
to accelerate SGD and improve the training of DNNs. Moreover, I will talk about some recent results on
PDE-based techniques for accelerating stochastic gradient descent and Langevin dynamics. Finally, I will
discuss a momentum-based formulation for DNN design.
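A minimal sketch of NAG with scheduled restarting (shown on a deterministic quadratic for clarity; the restart period and step size are illustrative assumptions, and in the talk's setting the gradient would be stochastic):

import numpy as np

def nag_restart(grad, x0, lr=0.1, restart_every=50, steps=500):
    x, y, t = x0.copy(), x0.copy(), 1.0
    for k in range(steps):
        if k % restart_every == 0:        # scheduled restart of the momentum
            t, y = 1.0, x.copy()
        x_new = y - lr * grad(y)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # Nesterov extrapolation
        x, t = x_new, t_new
    return x

A = np.diag(np.linspace(0.01, 1.0, 50))   # ill-conditioned quadratic
x_min = nag_restart(lambda x: A @ x, np.ones(50))   # converges toward 0

Restarting resets the momentum before it overshoots, a known remedy for the oscillations NAG exhibits on ill-conditioned problems.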
Michael Damron, Georgia Tech
" Universality for the time constant in critical first-passage percolation "
In first-passage percolation, we place i.i.d. nonnegative weights (t_e) on the edges of a graph and consider the induced weighted graph metric T(x,y). When the underlying graph is the two-dimensional square lattice, there is a phase transition in the model depending on the probability p that an edge weight equals zero: for p<1/2, the metric T(0,x) grows linearly in x, whereas for p>1/2, it remains stochastically bounded. The critical case occurs for p=1/2, where there are large but finite clusters of zero-weight edges. In this talk, I will review work with Wai-Kit Lam and Xuan Wang in which we determine the rate of growth for T(0,x) up to a constant factor for all critical distributions. Then I will explain recent work with Jack Hanson and Wai-Kit Lam in which we determine the "time constant" (leading order constant in the rate of growth) in the special case where the graph is the triangular lattice, and the weights are placed on the vertices. This is the only class of distributions known where this time constant is computable: we find that it is an explicit function of the infimum of the support of t_e intersected with (0,\infty).
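A minimal simulation sketch of the model (square-lattice edge weights, with passage times computed by Dijkstra's algorithm; illustrative only, and note the talk's second part concerns vertex weights on the triangular lattice):

import heapq
import random

random.seed(0)
n, p = 60, 0.5                           # p = P(t_e = 0); critical at p = 1/2
w = {}
def edge_weight(a, b):
    key = (a, b) if a < b else (b, a)    # one fixed i.i.d. weight per edge
    if key not in w:
        w[key] = 0.0 if random.random() < p else random.expovariate(1.0)
    return w[key]

T = {(0, 0): 0.0}                        # passage times from the origin
heap, done = [(0.0, (0, 0))], set()
while heap:
    t, u = heapq.heappop(heap)
    if u in done:
        continue
    done.add(u)
    x, y = u
    for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= v[0] < n and 0 <= v[1] < n:
            nt = t + edge_weight(u, v)
            if nt < T.get(v, float('inf')):
                T[v] = nt
                heapq.heappush(heap, (nt, v))

print(T[(n - 1, n - 1)])   # at p = 1/2 this stays stochastically bounded in n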
Carsten Chong, EPFL, Switzerland
" Asymptotic behavior of the stochastic heat equation with Levy noise "
We discuss some recent results about the macroscopic behavior of the solution to the stochastic heat equation with Lévy noise.
For a fixed spatial point, we show that the solution develops unusually large peaks as time tends to infinity,
a phenomenon which we refer to as additive intermittency because it already occurs under additive Lévy noise.
When we consider a fixed time point and let the spatial variable become large,
we make a similar observation: the solution develops larger peaks than one would expect from moment considerations.
Perhaps most interestingly, the largest size of these intermittency islands is the same, no matter whether we have additive or multiplicative Lévy noise.
This talk is based on joint work with Peter Kevei (University of Szeged).
Friday, March 13. No seminar due to Spring break.
Friday, March 27. Stochastics Seminar. 3-4 PM. LCB 225