Reinforcement Learning (part 2)

This session takes a deeper dive into reinforcement learning (RL). In particular, it introduces the formal framework used in RL, Markov decision processes (MDPs). The session gives a brief overview of value functions, which can be used to solve MDPs, and then focuses on motivating and introducing policy-gradient methods. These methods are commonly used when fine-tuning language models with RL.

The goal of the session is to provide a more formal introduction to core concepts in RL and to introduce the technical underpinnings of methods used for fine-tuning LLMs with RL.

Slides for the session can be found here.

Further materials (optional)

Further materials on reinforcement learning are listed below.