Imagine you’re a gambler and you’re standing in front of several slot machines. Your goal is to maximize your winnings, but you don’t actually know anything about the potential rewards offered by each ...
How does a gambler maximize winnings from a row of slot machines? This is the inspiration for the "multi-armed bandit problem," a common task in reinforcement learning in which "agents" make choices ...