Reward potentials for planning with learned neural network transition models

Buser Say, Scott Sanner, Sylvie Thiébaux

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Optimal planning with respect to learned neural network (NN) models in continuous action and state spaces using mixed-integer linear programming (MILP) is a challenging task for branch-and-bound solvers due to the poor linear relaxation of the underlying MILP model. For a given set of features, potential heuristics provide an efficient framework for computing bounds on cost (reward) functions. In this paper, we model the problem of finding optimal potential bounds for learned NN models as a bilevel program, and solve it using a novel finite-time constraint generation algorithm. We then strengthen the linear relaxation of the underlying MILP model by introducing constraints to bound the reward function based on the precomputed reward potentials. Experimentally, we show that our algorithm efficiently computes reward potentials for learned NN models, and that the overhead of computing reward potentials is justified by the overall strengthening of the underlying MILP model for the task of planning over long horizons.
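The constraint-generation idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the paper solves MILP master and separation subproblems over learned NN transition models, whereas the sketch below uses a hypothetical reward over three binary features, enumerates the (tiny) state space as its separation "subproblem", and updates the potentials with a crude subgradient step until no constraint of the form Σ_f φ_f · f(s) ≥ R(s) is violated — i.e., until the potentials give a valid upper bound on the reward.

```python
# Illustrative constraint-generation sketch (hypothetical reward and
# features; NOT the paper's bilevel MILP formulation).
from itertools import product

FEATURES = 3

def reward(state):
    # Hypothetical reward over binary features, to be bounded from above.
    return state[0] + 2 * state[1] - state[2]

def violation(phi, state):
    # Constraint we want to satisfy: sum_f phi[f]*state[f] >= reward(state).
    return reward(state) - sum(p * s for p, s in zip(phi, state))

def most_violated(phi):
    # Separation "subproblem": enumerate all states, return the worst violator.
    return max(product([0, 1], repeat=FEATURES), key=lambda s: violation(phi, s))

def compute_potentials(step=0.5, tol=1e-9, max_iters=1000):
    phi = [0.0] * FEATURES
    for _ in range(max_iters):
        s = most_violated(phi)
        v = violation(phi, s)
        if v <= tol:              # no violated constraint remains: done
            return phi
        for f in range(FEATURES):  # crude "master" update: raise the
            if s[f]:               # potentials of the violator's active features
                phi[f] += step * v
    return phi

phi = compute_potentials()
# The potentials now upper-bound the reward on every state; in the paper this
# bound is added as a constraint to strengthen the MILP's linear relaxation.
assert all(violation(phi, s) <= 1e-6 for s in product([0, 1], repeat=FEATURES))
```

Raising a potential never increases any violation, so the loop terminates; the paper instead solves the master step exactly as an optimization problem, which is what yields its finite-time guarantee and the tightest potential bounds.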
Original language: English
Title of host publication: Principles and Practice of Constraint Programming
Subtitle of host publication: 25th International Conference, CP 2019, Stamford, CT, USA, September 30 – October 4, 2019, Proceedings
Editors: Thomas Schiex, Simon de Givry
Place of publication: Cham, Switzerland
Publisher: Springer
Pages: 674-689
Number of pages: 16
ISBN (electronic): 9783030300487
ISBN (print): 9783030300470
Publication status: Published - 2019
Externally published: Yes

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 11802
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Keywords

  • Neural networks
  • Potential heuristics
  • Planning
  • Constraint generation
