Modeling Uncertainty in Deep Learning with Stochastic Gradient Markov Chain Monte Carlo



Friday, December 30, 2016, 6:30 PM







Changyou Chen

Duke University

Changyou Chen is a Research Assistant Professor in the Department of Electrical and Computer Engineering at Duke University. He obtained his PhD degree in 2014 from the College of Engineering and Computer Science, the Australian National University; he received his master's and bachelor's degrees in 2010 and 2007, respectively, both from the School of Computer Science, Fudan University, Shanghai, China.

His current research focuses on developing scalable Bayesian methods for deep learning. In particular, he develops theory for stochastic gradient Markov chain Monte Carlo algorithms and applies them to Bayesian interpretations and extensions of optimization-based deep learning. He is also developing distributed Bayesian learning theory and algorithms for industrial-scale data. His previous research focused on Bayesian nonparametric methods such as Dirichlet processes and dependent normalized random measures.


Modeling uncertainty in deep learning is an effective way to alleviate the common issue of overfitting in training, and Bayesian modeling is a principled way of achieving this goal. Recent advances in Bayesian learning with large-scale data have witnessed the emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic gradient thermostat. This family of algorithms enables scalable Bayesian sampling by adopting ideas from stochastic optimization: a minibatch of data is used in each iteration of the algorithm to obtain approximate samples from the desired posterior. In this talk, I will cover the basic concepts of SG-MCMC, introduce several representative algorithms, and present general convergence properties of SG-MCMC algorithms. Finally, I will discuss how to apply them in deep learning via some recurrent neural network examples, showing the advantage of modeling uncertainty in deep models.
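To make the minibatch idea concrete, below is a minimal sketch of SGLD for a toy posterior (the mean of Gaussian data under a Gaussian prior). The model, step size, and all variable names are illustrative assumptions, not taken from the talk: each iteration takes a gradient step on the log posterior estimated from a minibatch, rescaled by N/n, and injects Gaussian noise with variance equal to the step size, so the iterates become approximate posterior samples rather than a point estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed for illustration): x_i ~ N(mu_true, 1),
# with a weak prior mu ~ N(0, prior_var).
N = 1000
mu_true = 2.0
data = rng.normal(mu_true, 1.0, size=N)

def sgld_sample(data, n_iters=5000, batch_size=100, step=1e-4,
                prior_var=100.0, rng=rng):
    """Run one SGLD chain for the posterior over the Gaussian mean mu."""
    N = len(data)
    mu = 0.0
    samples = []
    for _ in range(n_iters):
        batch = rng.choice(data, size=batch_size, replace=False)
        # Stochastic gradient of the log posterior:
        # grad log prior + (N / n) * sum of minibatch log-likelihood grads.
        grad = -mu / prior_var + (N / batch_size) * np.sum(batch - mu)
        # Langevin update: half-step on the gradient plus N(0, step) noise.
        mu += 0.5 * step * grad + rng.normal(0.0, np.sqrt(step))
        samples.append(mu)
    return np.array(samples)

samples = sgld_sample(data)
```

After discarding a burn-in, the empirical mean of `samples` approximates the posterior mean, which for this weak prior sits close to the data average; dropping the injected noise term would reduce the update to plain stochastic gradient ascent on the log posterior, i.e., MAP optimization.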