Compositional Reinforcement Learning from Logical Specifications

Kishor Jothimurugan University of Pennsylvania

Osbert Bastani University of Pennsylvania

Suguman Bansal University of Pennsylvania

Rajeev Alur University of Pennsylvania

Abstract

We study the problem of learning control policies for complex tasks given by logical specifications. Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy that maximizes the expected reward. These approaches, however, scale poorly to complex tasks that require high-level planning. In this work, we develop a compositional learning approach, called DIRL, that interleaves high-level planning and reinforcement learning. First, DIRL encodes the specification as an abstract graph; intuitively, vertices and edges of the graph correspond to regions of the state space and simpler sub-tasks, respectively. Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph. An evaluation of the proposed approach on a set of challenging control benchmarks with continuous state and action spaces demonstrates that it outperforms state-of-the-art baselines.
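
The interleaving of planning and learning described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `plan_over_abstract_graph`, the callback `learn_edge`, and the edge cost of negative log success probability are assumptions made for illustration. Here `learn_edge` stands in for training a policy for one edge with any reinforcement learning algorithm and returning its estimated success probability; the sketch replaces it with a table lookup.

```python
import heapq
import math
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Vertex = Hashable


def plan_over_abstract_graph(
    edges: Dict[Vertex, List[Vertex]],
    source: Vertex,
    target: Vertex,
    learn_edge: Callable[[Vertex, Vertex], float],
) -> Tuple[Optional[List[Vertex]], float]:
    """Dijkstra-style search in which edge costs are discovered by learning.

    learn_edge(u, v) is a hypothetical stand-in for training a policy for the
    sub-task u -> v; it returns an estimated success probability in [0, 1].
    The assumed edge cost is -log(p), so the cheapest path is the one with the
    highest product of per-edge success probabilities.
    """
    dist: Dict[Vertex, float] = {source: 0.0}
    parent: Dict[Vertex, Vertex] = {}
    done = set()
    heap = [(0.0, 0, source)]  # (cost, tie-breaker, vertex)
    counter = 1
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v in edges.get(u, []):
            if v in done:
                continue
            # Learning happens only for edges leaving a finalized vertex.
            p = learn_edge(u, v)
            if p <= 0.0:
                continue  # sub-task was never solved; treat the edge as absent
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, counter, v))
                counter += 1
    if target not in done:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])


# Toy abstract graph with made-up success probabilities in place of training.
toy_edges = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
toy_probs = {("s", "a"): 0.9, ("a", "t"): 0.5, ("s", "b"): 0.6, ("b", "t"): 0.9}
path, prob = plan_over_abstract_graph(
    toy_edges, "s", "t", lambda u, v: toy_probs[(u, v)]
)
```

In this toy graph the search prefers the route through `b`, whose overall estimated success probability (0.6 × 0.9) exceeds that of the route through `a` (0.9 × 0.5). The sketch treats edges as independent, which is a simplification: in practice, the states from which a sub-task starts depend on the policies chosen for the earlier edges on the path.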
