Learning options from demonstrations: a Pac-Man case study

Marco Tamassia, Fabio Zambetta, William L. Raffe, Florian 'Floyd' Mueller, Xiaodong Li

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Reinforcement learning (RL) is widely used in games and control applications. RL agents improve through trial and error, and therefore undergo a learning phase during which they perform suboptimally. Considerable research effort has gone into optimizing behavior during this period, both to shorten it and to maximize after-learning performance. We introduce a novel algorithm that extracts useful information from expert demonstrations (traces of interactions with the target environment) and uses it to improve performance. The algorithm detects unexpected decisions made by the expert and infers what goal the expert was pursuing; these goals are then used to bias decisions while learning. Our experiments in the video game Pac-Man provide statistically significant evidence that our method improves final performance compared to a state-of-the-art approach.
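The abstract describes the pipeline only at a high level; the Python sketch below illustrates one plausible reading of it, assuming a tabular Q-learner on a toy grid. It flags expert decisions that disagree with the learner's current value estimates, infers the goal the expert was apparently heading toward, and biases epsilon-greedy action selection toward the inferred goals. The grid, the distance-based bonus, and all function names are illustrative assumptions, not the paper's actual algorithm (which uses the options framework rather than a simple distance bonus).

# Hypothetical sketch of the abstract's idea: flag "unexpected" expert
# decisions, infer goals, and bias action selection during Q-learning.
# Not the paper's implementation; all names and the toy grid are illustrative.
import random
from collections import defaultdict

ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def flag_unexpected(trace, q):
    # Indices where the expert's action is not greedy w.r.t. the current Q.
    return [i for i, (s, a) in enumerate(trace)
            if a != max(ACTIONS, key=lambda x: q[(s, x)])]

def infer_goals(trace, flagged, horizon=5):
    # Crude inference: assume the expert was heading toward the state it
    # actually reached `horizon` steps after each flagged decision.
    return {trace[min(i + horizon, len(trace) - 1)][0] for i in flagged}

def biased_action(state, q, goals, epsilon=0.1, bonus=0.5):
    # Epsilon-greedy over Q-values, plus a bonus for moving toward a goal.
    if random.random() < epsilon or not goals:
        return random.choice(list(ACTIONS))
    def score(a):
        nearest = min(manhattan(step(state, a), g) for g in goals)
        return q[(state, a)] - bonus * nearest
    return max(ACTIONS, key=score)

def td_update(q, s, a, r, s2, alpha=0.1, gamma=0.95):
    # Standard one-step Q-learning (temporal-difference) backup.
    best_next = max(q[(s2, a2)] for a2 in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

if __name__ == "__main__":
    q = defaultdict(float)
    expert_trace = [((x, 0), "right") for x in range(8)]  # toy demonstration
    goals = infer_goals(expert_trace, flag_unexpected(expert_trace, q))
    print("inferred goals:", goals)
    print("biased choice from (0, 0):", biased_action((0, 0), q, goals))

The learning step itself stays standard temporal-difference learning; only action selection is biased, which matches the abstract's claim that goals "bias decisions while learning" rather than replace the learner.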
Original language: English
Pages (from-to): 91-96
Number of pages: 6
Journal: IEEE Transactions on Games
Volume: 10
Issue number: 1
DOIs
Publication status: Published - Mar 2018
Externally published: Yes

Keywords

  • Learning from demonstration
  • options framework
  • reinforcement learning (RL)
  • temporal difference learning
