Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization

Rutgers University
ICML 2024

While MCTS has achieved significant success in domains like game-playing, it sees little use in robotics applications due to its poor exploration behavior over long horizons. We show that regularizing MCTS's state occupancy distribution makes it exponentially faster at long-horizon exploration and reveals a connection to state-of-the-art exploration strategies such as count-based exploration and Voronoi bias.


Abstract

Monte Carlo tree search (MCTS) has been successful in a variety of domains, but faces challenges with long-horizon exploration when compared to sampling-based motion planning algorithms like Rapidly-Exploring Random Trees. To address these limitations of MCTS, we derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call Volume-MCTS. We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective. We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
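For readers unfamiliar with this family of objectives, a state-occupancy-regularized objective can be sketched in the following generic form (this is an illustrative sketch, not necessarily the paper's exact formulation; the symbols $\lambda$ and $d_\pi$ are assumed notation):

```latex
\max_{\pi} \;
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\;+\; \lambda\, H\!\left(d_{\pi}\right),
\qquad
d_{\pi}(s) \;=\; (1-\gamma) \sum_{t=0}^{\infty} \gamma^{t}\, \Pr(s_t = s \mid \pi)
```

Here $d_{\pi}$ is the discounted state occupancy measure of policy $\pi$, $H(\cdot)$ is an entropy-like regularizer that rewards spreading probability mass over the state space, and $\lambda$ trades off return against exploration. Intuitively, maximizing $H(d_{\pi})$ pushes the search toward states it has rarely visited, which is the mechanism connecting this objective to count-based exploration and Voronoi bias.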

BibTeX

@misc{schramm2024provablyefficientlonghorizonexploration,
      title={Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization}, 
      author={Liam Schramm and Abdeslam Boularias},
      year={2024},
      eprint={2407.05511},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2407.05511}, 
}