Posts Tagged ‘multi-armed bandit’

Ruihao Zhu — Hedging the Drift: Learning to Optimize under Non-Stationarity

Abstract: We introduce general data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary bandit settings. These settings capture applications such as advertisement allocation and dynamic pricing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) non-stationarity can be overcome by an unconventional marriage between stochastic and…
