Experiments
Setup
The toolkit evaluates algorithms with \(k=20\) items, budgets of \(B \in \{40, 60, 80\}\) pairwise comparisons, and 30 runs per dataset. Synthetic data includes random item features. Jester and MovieLens have no features, so Contextual PARWiS falls back to non-contextual behavior on them. RL PARWiS is trained for 5000 episodes.
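
The protocol above can be sketched as a small simulation loop. This is a sketch under stated assumptions, not the toolkit's code. It assumes a Bradley-Terry-Luce (BTL) comparison model with random item scores, an agent that picks pairs uniformly at random (as a Random baseline would), and a reported winner chosen by win count. The names `run_once` and `evaluate` are hypothetical. Cumulative regret is left out because its definition is not given in this section.

```python
"""Sketch of the evaluation protocol: k items, budget B, 30 runs per setting.

Assumptions (not from the source): comparisons follow a Bradley-Terry-Luce
(BTL) model with random positive scores, the agent picks pairs uniformly at
random, and the reported winner is the item with the most wins (ties broken
by lowest index).
"""
import random


def run_once(k: int, budget: int, rng: random.Random) -> int:
    """Run one episode and return the true rank (1 = best) of the reported winner."""
    scores = [rng.uniform(0.1, 1.0) for _ in range(k)]  # hypothetical BTL scores
    wins = [0] * k
    for _ in range(budget):
        i, j = rng.sample(range(k), 2)  # random pair of distinct items
        # BTL: i beats j with probability s_i / (s_i + s_j)
        if rng.random() < scores[i] / (scores[i] + scores[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # max() returns the first maximal index, so ties go to the lowest index
    reported = max(range(k), key=lambda idx: wins[idx])
    # true rank = 1 + number of items with a strictly higher score
    return 1 + sum(s > scores[reported] for s in scores)


def evaluate(k: int = 20, budget: int = 40, runs: int = 30, seed: int = 0):
    """Return (recovery fraction, mean true rank) over `runs` episodes."""
    rng = random.Random(seed)
    ranks = [run_once(k, budget, rng) for _ in range(runs)]
    recovery = sum(r == 1 for r in ranks) / runs
    return recovery, sum(ranks) / runs


for b in (40, 60, 80):
    rec, rank = evaluate(budget=b)
    print(f"B={b}: recovery={rec:.3f}, mean true rank={rank:.3f}")
```

Recovery fraction and true rank are computed from the same per-run ranks: a run counts as a recovery exactly when the reported winner has true rank 1.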
Results
The following tables summarize recovery fraction, true rank of reported winner, and cumulative regret across datasets and budgets, as reported in [17].
Recovery fraction of the true winner (higher is better):

| Agent | Synthetic (B=40) | Synthetic (B=60) | Synthetic (B=80) | Jester (B=40) | Jester (B=60) | Jester (B=80) | MovieLens (B=40) | MovieLens (B=60) | MovieLens (B=80) |
|---|---|---|---|---|---|---|---|---|---|
| Double TS | 0.200 | 0.067 | 0.267 | 0.167 | 0.233 | 0.467 | 0.133 | 0.067 | 0.067 |
| Random | 0.033 | 0.067 | 0.000 | 0.033 | 0.000 | 0.067 | 0.033 | 0.000 | 0.067 |
| PARWiS | 0.467 | 0.467 | 0.467 | 0.467 | 0.467 | 0.467 | 0.167 | 0.167 | 0.167 |
| Contextual PARWiS | 0.367 | 0.367 | 0.367 | 0.433 | 0.433 | 0.433 | 0.167 | 0.167 | 0.167 |
| RL PARWiS | 0.367 | 0.367 | 0.367 | 0.467 | 0.467 | 0.467 | 0.100 | 0.100 | 0.100 |

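
Because each setting uses 30 runs, every recovery fraction in this table is a count of successful runs divided by 30, rounded to three decimals. The short check below, using a hypothetical helper `to_count`, recovers those counts for the B=40 columns.

```python
# Recovery fractions from the table above, converted back to run counts.
# With 30 runs per setting, each fraction should be (approximately) n / 30.
RUNS = 30

recovery_b40 = {  # B=40 columns: Synthetic, Jester, MovieLens
    "Double TS": (0.200, 0.167, 0.133),
    "Random": (0.033, 0.033, 0.033),
    "PARWiS": (0.467, 0.467, 0.167),
    "Contextual PARWiS": (0.367, 0.433, 0.167),
    "RL PARWiS": (0.367, 0.467, 0.100),
}


def to_count(fraction: float, runs: int = RUNS) -> int:
    """Number of successful runs implied by a fraction rounded to 3 decimals."""
    count = round(fraction * runs)
    # the rounded table value must lie within 0.0005 of count / runs
    assert abs(count / runs - fraction) <= 0.0005 + 1e-12, fraction
    return count


for agent, fracs in recovery_b40.items():
    print(agent, [to_count(f) for f in fracs])
```

For example, PARWiS's 0.467 versus Contextual PARWiS's 0.367 on Synthetic corresponds to 14 versus 11 successful runs out of 30, a difference of three runs.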

True rank of the reported winner (lower is better):

| Agent | Synthetic (B=40) | Synthetic (B=60) | Synthetic (B=80) | Jester (B=40) | Jester (B=60) | Jester (B=80) | MovieLens (B=40) | MovieLens (B=60) | MovieLens (B=80) |
|---|---|---|---|---|---|---|---|---|---|
| Double TS | 8.233 | 6.933 | 4.767 | 6.700 | 4.700 | 3.133 | 9.233 | 10.300 | 11.500 |
| Random | 10.767 | 10.367 | 10.733 | 10.733 | 9.367 | 10.733 | 9.233 | 11.033 | 10.767 |
| PARWiS | 3.233 | 3.233 | 3.233 | 2.067 | 2.067 | 2.067 | 6.633 | 6.633 | 6.633 |
| Contextual PARWiS | 3.900 | 4.067 | 4.067 | 2.233 | 2.233 | 2.233 | 6.633 | 6.633 | 6.633 |
| RL PARWiS | 3.533 | 3.533 | 3.533 | 2.067 | 2.067 | 2.067 | 6.667 | 6.667 | 6.667 |


Cumulative regret (lower is better):

| Agent | Synthetic (B=40) | Synthetic (B=60) | Synthetic (B=80) | Jester (B=40) | Jester (B=60) | Jester (B=80) | MovieLens (B=40) | MovieLens (B=60) | MovieLens (B=80) |
|---|---|---|---|---|---|---|---|---|---|
| Double TS | 35.300 | 52.933 | 67.267 | 34.067 | 51.167 | 67.667 | 36.733 | 55.767 | 74.800 |
| Random | 36.633 | 54.833 | 73.200 | 36.167 | 54.233 | 72.600 | 37.733 | 56.767 | 75.800 |
| PARWiS | 11.733 | 22.000 | 33.133 | 9.567 | 17.600 | 25.633 | 18.067 | 35.100 | 52.333 |
| Contextual PARWiS | 13.067 | 24.333 | 35.467 | 10.167 | 18.533 | 27.200 | 18.100 | 35.133 | 52.367 |
| RL PARWiS | 14.367 | 26.300 | 42.300 | 11.100 | 21.667 | 32.400 | 19.567 | 38.633 | 56.967 |

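
To put the regret gap in proportion, the snippet below divides PARWiS's cumulative regret by Double TS's for each dataset and budget, using only the values in the cumulative regret table. The dictionary layout is just for illustration.

```python
# Cumulative regret at B=40/60/80, copied from the cumulative regret table.
regret = {
    "Synthetic": {"PARWiS": (11.733, 22.000, 33.133), "Double TS": (35.300, 52.933, 67.267)},
    "Jester": {"PARWiS": (9.567, 17.600, 25.633), "Double TS": (34.067, 51.167, 67.667)},
    "MovieLens": {"PARWiS": (18.067, 35.100, 52.333), "Double TS": (36.733, 55.767, 74.800)},
}
budgets = (40, 60, 80)

for dataset, rows in regret.items():
    for b, ours, base in zip(budgets, rows["PARWiS"], rows["Double TS"]):
        # a ratio below 1 means PARWiS accumulates less regret than Double TS
        print(f"{dataset} B={b}: PARWiS / Double TS = {ours / base:.2f}")
```

The ratio lies between about 0.28 and 0.49 on Synthetic and Jester, but climbs from 0.49 to 0.70 on MovieLens as B grows, so PARWiS's regret advantage is smallest on the dataset with the smallest \(\Delta_{1,2}\).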
Discussion:
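Synthetic and Jester: PARWiS reaches a recovery fraction of 0.467 at every budget on both datasets; RL PARWiS matches it on Jester but reaches 0.367 on Synthetic. Both keep the true rank of the reported winner low (2.067–3.533). Both datasets have a moderate to large \(\Delta_{1,2}\) (0.0152 for Synthetic, 0.0946 for Jester). PARWiS also has markedly lower cumulative regret than the baselines, e.g., 11.733 at \(B=40\) on Synthetic versus 35.300 for Double TS and 36.633 for Random.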
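MovieLens: The small \(\Delta_{1,2}\) (0.0008) makes this dataset hard for all algorithms. Recovery fractions of the PARWiS variants drop to 0.100–0.167, and Double TS reaches at most 0.133. PARWiS still has the lowest cumulative regret at every budget (e.g., 52.333 at \(B=80\)), with Contextual PARWiS nearly identical (52.367).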
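Contextual PARWiS: On Jester and MovieLens, where missing features force it back to non-contextual behavior, it performs close to PARWiS: it matches PARWiS's recovery fraction and true rank on MovieLens and is slightly lower on Jester (0.433 vs. 0.467 recovery). On Synthetic it underperforms PARWiS (0.367 vs. 0.467 recovery; true rank 3.900–4.067 vs. 3.233), suggesting that its use of features needs further tuning.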
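RL PARWiS: Matches PARWiS's recovery fraction and true rank on Jester but trails it on MovieLens (0.100 vs. 0.167 recovery; 56.967 vs. 52.333 regret at \(B=80\)) and has higher cumulative regret on every dataset, indicating room for improved training or state representation.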
Statistical t-tests (see Appendix Tables) confirm that PARWiS improves significantly over Double TS on Synthetic and Jester (\(p < 0.05\)), but not on MovieLens, reflecting that dataset's difficulty.
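
As an illustration of how such a test can be computed, the sketch below implements Welch's two-sample t statistic with the Python standard library. It assumes the per-run outcome is a 0/1 recovery indicator, so a fraction of n/30 corresponds to n ones. The appendix may test a different metric or use a different t-test variant, so this is not the authors' procedure, and `welch_t` and `indicators` are hypothetical helper names.

```python
"""Welch's t statistic on per-run recovery indicators (illustration only).

Assumption: each run's outcome is 1 if the true winner was recovered, else 0,
so a recovery fraction of n/30 corresponds to n ones and 30 - n zeros. This
is not necessarily the metric or test used in the appendix.
"""
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def indicators(successes: int, runs: int = 30) -> list[float]:
    """Per-run 0/1 outcomes reconstructed from a success count."""
    return [1.0] * successes + [0.0] * (runs - successes)


# Synthetic, B=40: PARWiS 0.467 (14/30) vs. Double TS 0.200 (6/30)
t, df = welch_t(indicators(14), indicators(6))
print(f"t = {t:.2f}, df = {df:.1f}")
```

For Synthetic at \(B=40\) this gives t ≈ 2.25 with about 55 degrees of freedom, which exceeds the two-sided 5% critical value of roughly 2.00 for that many degrees of freedom.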