Welcome to Dueling Bandit Toolkit’s Documentation!
The Dueling Bandit Toolkit is a Python package for preference-based online learning using dueling bandit algorithms. It implements algorithms like PARWiS, Contextual PARWiS, RL PARWiS, Double Thompson Sampling, and a Random Pair baseline, with support for synthetic and real-world datasets (Jester, MovieLens).
This documentation is based on the research paper “PARWiS: Winner determination under shoestring budgets using active pairwise comparisons” by Shailendra Bhandari. It provides an overview, the methodology, experimental results, and API references.
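To make the dueling bandit setting concrete, the sketch below illustrates the idea behind a random-pair baseline: spend a fixed budget of pairwise comparisons on uniformly random pairs, then report the item with the most wins. It is a minimal, self-contained illustration and does not use the toolkit's API. The function names (`btl_duel`, `random_pair_baseline`), the Bradley-Terry-Luce comparison model, and the example scores are all assumptions made for this sketch.

```python
import random


def btl_duel(i, j, scores, rng):
    """Simulate one noisy comparison between items i and j.

    Assumes a Bradley-Terry-Luce model: item i beats item j with
    probability scores[i] / (scores[i] + scores[j]).
    """
    p_i = scores[i] / (scores[i] + scores[j])
    return i if rng.random() < p_i else j


def random_pair_baseline(scores, budget, seed=0):
    """Hypothetical random-pair baseline (not the toolkit's implementation).

    Compares `budget` uniformly random pairs of distinct items and returns
    the index with the most wins together with the full win counts.
    """
    rng = random.Random(seed)
    n = len(scores)
    wins = [0] * n
    for _ in range(budget):
        i, j = rng.sample(range(n), 2)  # two distinct items
        wins[btl_duel(i, j, scores, rng)] += 1
    # Ties are broken in favour of the lowest index.
    winner = max(range(n), key=wins.__getitem__)
    return winner, wins


# Illustrative scores: item 0 is the strongest under the assumed model.
scores = [5.0, 1.0, 1.0, 1.0]
winner, wins = random_pair_baseline(scores, budget=200)
print("estimated winner:", winner, "win counts:", wins)
```

Adaptive algorithms differ from this baseline mainly in how they choose the next pair: instead of sampling uniformly, they use the outcomes observed so far to focus the remaining budget on the most informative comparisons.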