Prof. David Leslie, Lancaster University
Reinforcement learning is a popular model of machine, human, and animal learning in which the value of taking actions in different states is estimated online from observations. We will observe that convergence of the learning process relies only on the fact that averages converge to expected values. However, these simple convergence results hold only when an individual receives unambiguous information about which state the world is in. In any natural environment the state information is noisy, so the learner cannot be certain about the current state of nature, and extensions to the basic model are needed. We will discuss how to address this problem using simple tools from probability, statistics, and dynamical systems theory.
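The averaging principle the abstract alludes to can be sketched in a minimal, hypothetical example (not from the talk itself): action values are updated online with step size 1/n, which makes each estimate a running average that, by the law of large numbers, converges to the expected reward.

```python
import random

random.seed(0)

# Hypothetical two-action problem; the true expected rewards below are
# unknown to the learner, which only sees noisy reward observations.
true_means = {"a": 1.0, "b": 2.0}

def pull(action):
    # Noisy observation of the reward for the chosen action.
    return true_means[action] + random.gauss(0.0, 1.0)

estimates = {"a": 0.0, "b": 0.0}
counts = {"a": 0, "b": 0}

for _ in range(20000):
    action = random.choice(["a", "b"])   # explore both actions uniformly
    counts[action] += 1
    step = 1.0 / counts[action]          # step size 1/n gives a running average
    estimates[action] += step * (pull(action) - estimates[action])

# Each estimate is now the sample average of its observed rewards,
# so it is close to the corresponding expected value.
print(estimates)
```

The same incremental update, with an extra maximisation over next-state values and a discount factor, is the tabular Q-learning rule; its convergence argument reduces to exactly this averaging behaviour.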