AI4OPT Seminar Series: Chi Jin, Princeton University
October 6, 2022 @ 9:00 am – 10:00 am
Talk Title: When Is Partially Observable Reinforcement Learning Not Scary?
Abstract: Partial observability is ubiquitous in applications of Reinforcement Learning (RL), in which agents learn to make a sequence of decisions despite lacking complete information about the latent states of the controlled system. Partially observable RL is notoriously difficult in theory: well-known information-theoretic results show that learning partially observable Markov decision processes (POMDPs) requires an exponential number of samples in the worst case. Yet this does not rule out the existence of interesting subclasses of POMDPs that include a large set of partially observable applications encountered in practice while remaining tractable. In this talk, we identify a rich family of tractable POMDPs, which we call weakly revealing POMDPs. This family rules out the pathological instances of POMDPs whose observations are so uninformative that learning becomes hard. We prove that for weakly revealing POMDPs, a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to guarantee polynomial sample complexity. To the best of our knowledge, this gives the first line of provably sample-efficient results for learning from interactions in POMDPs. This is based on joint work with Qinghua Liu, Alan Chung, Akshay Krishnamurthy, Sham Kakade, and Csaba Szepesvari.
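To make the "optimism plus MLE" idea from the abstract concrete, below is a minimal sketch of an optimistic maximum-likelihood loop on a toy tabular POMDP. It is not the algorithm analyzed in the talk: the candidate-model grid, the confidence-set slack BETA, the restriction to memoryless policies, and the brute-force planner are all simplifying assumptions made purely for illustration.

```python
"""Minimal sketch of optimistic MLE for a toy POMDP (illustrative only)."""

import itertools
import numpy as np

H = 3              # horizon
S, A, O = 2, 2, 2  # numbers of latent states, actions, observations
BETA = 2.0         # log-likelihood slack defining the confidence set (assumed)


def make_model(noise):
    """Candidate POMDP: transitions, emissions, rewards; smaller emission
    noise means more 'revealing' observations."""
    T = np.zeros((A, S, S))          # T[a, s, s'] = P(s' | s, a)
    T[0] = [[0.9, 0.1], [0.1, 0.9]]  # action 0: mostly stay
    T[1] = [[0.1, 0.9], [0.9, 0.1]]  # action 1: mostly switch
    Omat = np.array([[1 - noise, noise],
                     [noise, 1 - noise]])   # Omat[s, o] = P(o | s)
    R = np.array([[1.0, 0.0], [0.0, 1.0]])  # R[s, a]: reward for matching a to s
    mu = np.array([0.5, 0.5])                # initial state distribution
    return T, Omat, R, mu


def log_likelihood(model, trajectory):
    """log P(observations | actions) under the model, via the belief recursion."""
    T, Omat, _, mu = model
    belief, ll = mu.copy(), 0.0
    for (a, o, _) in trajectory:
        ll += np.log(max(belief @ Omat[:, o], 1e-12))  # P(o | history)
        belief = belief * Omat[:, o]                   # condition on o
        belief = belief / belief.sum()
        belief = belief @ T[a]                         # propagate through a
    return ll


def policy_value(model, policy):
    """Exact expected return of a memoryless policy (obs -> action),
    by enumerating all latent-state / observation sequences."""
    T, Omat, R, mu = model
    total = 0.0
    for states in itertools.product(range(S), repeat=H):
        for obs in itertools.product(range(O), repeat=H):
            prob, ret, s_prev = 1.0, 0.0, None
            for h in range(H):
                s, o = states[h], obs[h]
                prob *= mu[s] if h == 0 else T[policy[obs[h - 1]], s_prev, s]
                prob *= Omat[s, o]
                ret += R[s, policy[o]]
                s_prev = s
            total += prob * ret
    return total


def best_policy(model):
    policies = list(itertools.product(range(A), repeat=O))  # obs -> action
    return max(policies, key=lambda pi: policy_value(model, pi))


def run_episode(true_model, policy, rng):
    """Roll out one episode in the true POMDP, returning (a, o, r) triples."""
    T, Omat, R, mu = true_model
    s, traj = rng.choice(S, p=mu), []
    for _ in range(H):
        o = rng.choice(O, p=Omat[s])
        a = policy[o]
        traj.append((a, o, R[s, a]))
        s = rng.choice(S, p=T[a, s])
    return traj


def optimistic_mle(num_episodes=50, seed=0):
    rng = np.random.default_rng(seed)
    true_model = make_model(noise=0.1)
    candidates = [make_model(noise=n) for n in (0.1, 0.2, 0.3, 0.4)]
    data = []
    for _ in range(num_episodes):
        # Confidence set: models whose log-likelihood on past data is
        # within BETA of the maximum-likelihood model.
        lls = [sum(log_likelihood(m, traj) for traj in data) for m in candidates]
        plausible = [m for m, ll in zip(candidates, lls) if ll >= max(lls) - BETA]
        # Optimism: among plausible models, plan with the most favorable one.
        model = max(plausible, key=lambda m: policy_value(m, best_policy(m)))
        policy = best_policy(model)
        data.append(run_episode(true_model, policy, rng))
    return policy


if __name__ == "__main__":
    print("final policy (obs -> action):", optimistic_mle())
```

The sketch keeps only the structure the abstract describes: fit models by maximum likelihood on the collected trajectories, retain every model that explains the data nearly as well as the best fit, and act according to the plausible model that promises the highest value; weak revealing is what makes the likelihood informative enough for this loop to converge.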
Bio: Chi Jin is an assistant professor in the Electrical and Computer Engineering department at Princeton University. He obtained his Ph.D. in Computer Science at the University of California, Berkeley, advised by Michael I. Jordan. His research mainly focuses on theoretical machine learning, with a special emphasis on nonconvex optimization and reinforcement learning. His representative work includes proving that noisy gradient descent escapes saddle points efficiently and proving the efficiency of Q-learning and least-squares value iteration when combined with optimism in reinforcement learning.