SOTA systems
In the previous sessions, we covered foundational RL concepts that are relevant to LM fine-tuning. Equipped with these conceptual and theoretical tools, the learning goals for the sessions from now on are:
to familiarize ourselves with state-of-the-art (SOTA) systems which apply these tools in practice
to critically think about motivation, methods and results produced with these systems
to understand the variety of ways in which RL can be applied in combination with LMs
to critically think about the systems and how to evaluate them
to get inspiration and practical insights for your own projects!
We will start working towards these goals by looking at the nuts and bolts of SOTA LLMs which were fine-tuned with RL in various interesting ways (and aren’t ChatGPT). In particular, in this session we will familiarize ourselves with the models InstructGPT, Sparrow and a Constitutional AI LLM. Further models which also had interesting variations during training are listed below.
The slides for the session can be found here.
Student presentations
Sparrow and the Constitutional AI LM are covered by student presentations. If possible, slides from the student presentations will be made available here.