[Apologies for multiple posting.]

Dear colleagues,

I’m happy to announce the release of a new benchmark dataset (“Yahoo!
Front Page Today Module User Click Log Dataset”) for *unbiased*
evaluation of multi-armed bandit algorithms, through the Yahoo!
Webscope Program:
    http://webscope.sandbox.yahoo.com/catalog.php?datatype=r

Because bandit problems are inherently interactive, creating a
benchmark dataset for reliable algorithm evaluation is not as
straightforward as in most other areas of machine learning, where the
objective is usually prediction.  This challenge is known in
reinforcement learning as the off-policy evaluation problem.

Our dataset contains the click log of over 45M user visits to the
Featured Tab of the Today Module on Yahoo! Front Page. The articles
shown to users were chosen uniformly at random, which enables the use
of an unbiased offline evaluation method recently shown to be highly
effective [http://portal.acm.org/citation.cfm?doid=1935826.1935878].
To the best of our knowledge, this is the first benchmark for
evaluating bandit algorithms reliably in real-world applications.
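
For readers who have not seen this kind of evaluation before, here is
a minimal sketch of a replay-style evaluator that uniform random
logging makes possible. It is only an illustration under stated
assumptions: the event layout (candidate article ids, the article
actually shown, a 0/1 click), the names replay_ctr and EpsilonGreedy,
and the synthetic log are all hypothetical and do not reflect the
dataset's file format or any released code. The idea is to step
through the log, keep only the events where the policy under test
picks the same article that was actually shown, and average the clicks
over those matched events.

```python
import random
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical event layout (NOT the dataset's actual file format):
# (candidate article ids, id of the article actually shown, click 0/1)
Event = Tuple[Sequence[str], str, int]


def replay_ctr(
    events: Iterable[Event],
    choose: Callable[[Sequence[str]], str],
    update: Callable[[str, int], None],
) -> Tuple[float, int]:
    """Estimate a policy's click-through rate from uniformly logged events.

    Returns (estimated CTR, number of matched events).
    """
    matched = 0
    clicks = 0
    for candidates, shown, click in events:
        if choose(candidates) != shown:
            continue  # policy disagrees with the log: skip this event
        matched += 1
        clicks += click
        update(shown, click)  # the policy learns only from matched events
    return (clicks / matched if matched else 0.0), matched


class EpsilonGreedy:
    """Toy context-free bandit used only to exercise the evaluator."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {}
        self.clicks = {}

    def choose(self, candidates: Sequence[str]) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(candidates))  # explore
        # Exploit: highest empirical click rate; unseen articles score 0.
        return max(
            candidates,
            key=lambda a: self.clicks.get(a, 0) / max(self.shows.get(a, 0), 1),
        )

    def update(self, article: str, click: int) -> None:
        self.shows[article] = self.shows.get(article, 0) + 1
        self.clicks[article] = self.clicks.get(article, 0) + click


if __name__ == "__main__":
    # Synthetic stand-in for a uniformly logged click stream.
    rng = random.Random(1)
    pool = ["a", "b", "c"]
    true_ctr = {"a": 0.02, "b": 0.05, "c": 0.10}  # made-up click rates
    log = []
    for _ in range(30000):
        shown = rng.choice(pool)  # uniform random logging policy
        log.append((pool, shown, int(rng.random() < true_ctr[shown])))
    policy = EpsilonGreedy(epsilon=0.1)
    ctr, used = replay_ctr(log, policy.choose, policy.update)
    print(f"estimated CTR {ctr:.4f} from {used} matched events")
```

With K candidate articles, roughly one logged event in K is expected
to match, so the usable sample is much smaller than the raw log. This
is one reason a large, uniformly randomized log is valuable for this
kind of evaluation.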

--
Lihong Li

Yahoo! Research
4401 Great America Parkway
Santa Clara, CA, 95054
http://www.cs.rutgers.edu/~lihong