Deep Exploration via Randomized Value Functions

Ben Van Roy - Stanford University

Jan. 26, 2018, 2:30 p.m. - Jan. 26, 2018, 3:30 p.m.


Hosted by: Doina Precup

An important challenge in reinforcement learning concerns how an agent can simultaneously explore and generalize in a reliably efficient manner. It is difficult to claim that one can produce a robust artificial intelligence without tackling this fundamental issue. This talk will present a systematic approach to exploration that induces judicious probing through randomization of value function estimates and operates effectively in tandem with common reinforcement learning algorithms, such as least-squares value iteration and temporal-difference learning, that generalize via parameterized representations of the value function. Theoretical results offer assurances with tabular representations of the value function, and computational results suggest that the approach remains effective with generalizing representations.
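The core idea — sampling a randomized value function by fitting against noise-perturbed regression targets, then acting greedily with respect to the sample — can be sketched as follows. This is a minimal illustration of randomized least-squares value iteration under assumed one-hot features and illustrative hyperparameters (`sigma`, `lam`, `n_iters`); it is not the specific formulation from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def rlsvi_sample_q(transitions, n_states, n_actions,
                   gamma=0.9, sigma=1.0, lam=1.0, n_iters=50):
    """Sample one randomized Q-function estimate.

    transitions: list of (state, action, reward, next_state) tuples.
    Each iteration fits Q by regularized least squares against Bellman
    targets perturbed with Gaussian noise; this injected randomness is
    what drives deep exploration when the agent acts greedily with
    respect to the sampled Q.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        A, y = [], []
        for (s, a, r, s2) in transitions:
            # One-hot feature over state-action pairs (tabular case).
            phi = np.zeros(n_states * n_actions)
            phi[s * n_actions + a] = 1.0
            A.append(phi)
            # Perturbed target: reward + discounted value + noise.
            y.append(r + gamma * Q[s2].max() + sigma * rng.normal())
        A, y = np.array(A), np.array(y)
        # Ridge-regularized least squares (zero prior mean on weights).
        w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
        Q = w.reshape(n_states, n_actions)
    return Q
```

In an episodic setting, an agent would draw a fresh sample at the start of each episode and follow its greedy policy, so state-action pairs with little data (large residual perturbation) are occasionally tried. With parameterized representations, the one-hot `phi` would be replaced by a generalizing feature map.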

Benjamin Van Roy is a Professor of Electrical Engineering, Management Science and Engineering, and, by courtesy, Computer Science, at Stanford University. His research focuses on understanding how an agent interacting with a poorly understood environment can learn over time to make effective decisions. He is an INFORMS Fellow, serves as editor for the Learning Theory area of Mathematics of Operations Research, and has served as editor of the Financial Engineering area of Operations Research. He has also led research programs at Unica (acquired by IBM), Enuvis (acquired by SiRF), and Morgan Stanley.