Paper Title
Thompson Sampling with Unrestricted Delays
Paper Authors
Abstract
We investigate properties of Thompson Sampling in the stochastic multi-armed bandit problem with delayed feedback. In a setting with i.i.d. delays, we establish, to our knowledge, the first regret bounds for Thompson Sampling with arbitrary delay distributions, including ones with unbounded expectation. Our bounds are qualitatively comparable to the best available bounds derived via ad-hoc algorithms, and depend on delays only via selected quantiles of the delay distributions. Furthermore, in extensive simulation experiments, we find that Thompson Sampling outperforms a number of alternative proposals, including methods specifically designed for settings with delayed feedback.
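To make the setting concrete, here is a minimal sketch of Thompson Sampling when each reward is observed only after a random delay. It is an illustration under stated assumptions, not the paper's algorithm or experimental setup: rewards are assumed Bernoulli, each arm gets a uniform Beta(1, 1) prior, delays are drawn i.i.d. and are at least one round, and the names `thompson_sampling_delayed`, `sample_delay`, and `pending` are hypothetical.

```python
import random


def thompson_sampling_delayed(means, horizon, sample_delay, seed=0):
    """Beta-Bernoulli Thompson Sampling with delayed feedback (illustrative sketch).

    means        -- true Bernoulli mean of each arm (unknown to the learner)
    horizon      -- number of rounds to play
    sample_delay -- hypothetical callable rng -> int >= 1, the i.i.d. delay
    Returns the cumulative pseudo-regret over the horizon.
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # observed rewards equal to 1, per arm
    failures = [0] * k   # observed rewards equal to 0, per arm
    pending = {}         # arrival round -> list of (arm, reward) not yet observed
    best = max(means)
    regret = 0.0

    for t in range(horizon):
        # Reveal feedback that arrives at round t and update the posteriors.
        for arm, reward in pending.pop(t, []):
            if reward:
                successes[arm] += 1
            else:
                failures[arm] += 1

        # Draw one sample per arm from its Beta posterior and play the argmax.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)
        ]
        arm = max(range(k), key=samples.__getitem__)

        reward = 1 if rng.random() < means[arm] else 0
        regret += best - means[arm]

        # Queue the feedback; anything arriving after the horizon is never seen.
        arrival = t + sample_delay(rng)
        if arrival < horizon:
            pending.setdefault(arrival, []).append((arm, reward))

    return regret


def heavy_tailed_delay(rng):
    # Assumption for illustration: a Pareto delay with shape 0.5, whose
    # expectation is infinite, rounded down to an integer >= 1.
    return int(rng.paretovariate(0.5))
```

Calling `thompson_sampling_delayed([0.3, 0.7], 2000, heavy_tailed_delay)` runs the sketch with a delay distribution of unbounded expectation. The learner never changes its rule in response to delays; it simply samples from posteriors built on whatever feedback has arrived so far.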
