We propose the Belief State Transformer, a next-token predictor that takes both a prefix and a suffix as inputs, with a novel objective of predicting both the next token for the prefix and the previous token for the suffix. The Belief State Transformer effectively learns to solve challenging problems that conventional forward-only transformers struggle with, in a domain-independent fashion. Key to this success is learning a compact belief state that captures all the information relevant for accurate prediction. On story-writing tasks with known prefixes and suffixes, it outperforms the Fill-in-the-Middle approach at reaching known goals and remains robust even when the goals are unknown. Altogether, the approach enables efficient goal-conditioned decoding, improved test-time inference, and high-quality text representations on small-scale problems.
ICLR 2025
@inproceedings{
  hu2025the,
  title={The Belief State Transformer},
  author={Edward S. Hu and Kwangjun Ahn and Qinghua Liu and Haoran Xu and Manan Tomar and Ada Langford and Dinesh Jayaraman and Alex Lamb and John Langford},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=ThRMTCgpvo}
}
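The abstract describes the core training signal: encode the prefix forward, encode the suffix backward, and predict the prefix's next token together with the suffix's previous token from the combined state. Below is a minimal PyTorch sketch of that objective under our own assumptions; the module names (CausalEncoder, BeliefStateSketch), the model sizes, and the single-split toy loss are illustrative rather than the authors' implementation, and a full setup would presumably train over many prefix/suffix splits per sequence instead of the one split used here.

```python
# Illustrative sketch only: hypothetical modules and shapes, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 1000, 128

class CausalEncoder(nn.Module):
    """GPT-style causal encoder; returns one hidden state per position."""
    def __init__(self, vocab=VOCAB, dim=DIM, layers=2, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        block = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)

    def forward(self, tokens):  # tokens: (batch, length)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        return self.encoder(self.embed(tokens), mask=mask)

class BeliefStateSketch(nn.Module):
    """Forward encoder reads the prefix; backward encoder reads the reversed suffix;
    a shared head predicts (next token of the prefix, previous token of the suffix)."""
    def __init__(self):
        super().__init__()
        self.fwd = CausalEncoder()   # its final state plays the role of the belief state
        self.bwd = CausalEncoder()   # consumes the suffix right-to-left
        self.head = nn.Linear(2 * DIM, 2 * VOCAB)  # two token distributions at once

    def forward(self, prefix, suffix):
        f = self.fwd(prefix)[:, -1]                 # state summarizing the prefix
        b = self.bwd(suffix.flip(dims=[1]))[:, -1]  # state summarizing the suffix
        logits = self.head(torch.cat([f, b], dim=-1))
        return logits.split(VOCAB, dim=-1)          # (next-token logits, previous-token logits)

# Toy training step: for a sequence x and split points t < t2, the targets are
# x[:, t] (the token right after the prefix x[:, :t]) and x[:, t2 - 1] (the token
# right before the suffix x[:, t2:]).
model = BeliefStateSketch()
x = torch.randint(0, VOCAB, (4, 16))
t, t2 = 5, 11
next_logits, prev_logits = model(x[:, :t], x[:, t2:])
loss = F.cross_entropy(next_logits, x[:, t]) + F.cross_entropy(prev_logits, x[:, t2 - 1])
loss.backward()
```

In this reading, the forward state f is the compact summary the abstract calls the belief state, and pairing it with the backward state b is what lets one head score predictions in both directions.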