Safe Value Functions

Pierre-François Massiani (a), Steve Heim (b), Friedrich Solowjow (a), Sebastian Trimpe (a)

(a) Institute for Data Science in Mechanical Engineering, RWTH Aachen University, 52068 Aachen,
Germany
(b) Biomimetic Robotics Lab, Department of Mechanical Engineering, Massachusetts Institute of
Technology, Cambridge, MA 02139, USA

Safety constraints and optimality are important but sometimes conflicting criteria for controllers. Although these criteria are often addressed separately, with different tools, to maintain formal guarantees, it is also common practice in reinforcement learning to simply modify reward functions by penalizing failures, with the penalty treated as a mere heuristic. In this talk, I will introduce Safe Value Functions (SVFs) [1]: value functions that are both optimal for a given task and enforce safety constraints. I will present their relationship with penalties and show that failure penalization naturally gives rise to SVFs. There is an interval of penalties, unbounded from above, that achieve an SVF; in particular, high penalties do not harm optimality. The analysis relies on understanding when optimal control stabilizes the viability kernel, i.e., the largest set of states from which the safety constraints can be satisfied forever. Although it is often intractable to compute the minimum required penalty, SVFs reveal a clear structure in how the penalty, the rewards, the discount factor, and the dynamics interact. This insight suggests practical, theory-guided heuristics for designing reward functions for optimal control problems where safety is important.
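To make this interplay concrete, below is a minimal sketch in Python. It is an illustrative toy of my own construction, not the setting from the paper: a deterministic chain MDP in which every state offers a "dive" action that fails immediately with a one-time penalty -p, while reaching the goal costs -1 per step. All names and parameters (GAMMA, N, value_iteration) are assumptions made for illustration. With discount gamma, diving from state x beats finishing the task iff p < (1 - gamma^(N-x)) / (1 - gamma), so every penalty above p* = (1 - gamma^N) / (1 - gamma) makes the greedy policy safe from every state, an upper-unbounded interval of penalties as described above.

```python
# Illustrative toy (not the paper's construction): a 1-D chain MDP showing
# how the penalty, reward, and discount factor interact to make the
# optimal (greedy) policy safe once the penalty crosses a threshold.
#
# States 0..N lie on a line; the task is to reach the absorbing goal at N.
# Each step right costs -1. From every state, a "dive" action ends the
# episode in an absorbing failure state with one-time penalty -p.

GAMMA = 0.9
N = 10  # goal state index


def value_iteration(penalty: float, iters: int = 500) -> list[str]:
    """Return the greedy action ('right' or 'dive') for each state 0..N-1."""
    v = [0.0] * (N + 1)  # v[N] is the absorbing goal, value 0
    for _ in range(iters):
        for x in range(N):
            q_right = -1.0 + GAMMA * v[x + 1]  # one step toward the goal
            q_dive = -penalty                  # terminal failure, then 0 forever
            v[x] = max(q_right, q_dive)
    policy = []
    for x in range(N):
        q_right = -1.0 + GAMMA * v[x + 1]
        policy.append("right" if q_right >= -penalty else "dive")
    return policy


# Analytic threshold: cost-to-go from state 0 is (1 - GAMMA**N) / (1 - GAMMA).
p_star = (1 - GAMMA**N) / (1 - GAMMA)  # ~6.513 for GAMMA = 0.9, N = 10
for p in [1.0, 5.0, p_star - 1e-6, p_star + 1e-6, 100.0]:
    policy = value_iteration(p)
    safe = all(a == "right" for a in policy)
    print(f"p = {p:8.3f}  greedy policy safe everywhere: {safe}")
```

Running the sweep shows the greedy policy diving somewhere for any p below roughly 6.51 (with gamma = 0.9 and N = 10) and staying safe everywhere for all larger penalties, with no loss of task optimality above the threshold.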

[1] P.-F. Massiani, S. Heim, F. Solowjow, S. Trimpe, Safe Value Functions, IEEE Transactions on Automatic Control, 2023.