Experiments

Setup

The toolkit evaluates algorithms with \(k=20\) items, budgets \(B \in \{40, 60, 80\}\), and 30 runs per dataset. Synthetic data includes random features, while Jester and MovieLens lack features, causing Contextual PARWiS to fall back to non-contextual behavior. RL PARWiS is trained for 5000 episodes.
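
The evaluation grid described above can be sketched as a loop over datasets, budgets, and runs. This is a minimal illustration rather than the toolkit's actual code: the names `run_grid` and `random_agent_run` are hypothetical, the agent shown is a stand-in that picks random pairs, and no dataset is actually loaded.

```python
import itertools
import random

K_ITEMS = 20                      # k = 20 items
BUDGETS = (40, 60, 80)            # comparison budgets B
N_RUNS = 30                       # runs per dataset and budget
DATASETS = ("Synthetic", "Jester", "MovieLens")

def random_agent_run(k, budget, rng):
    """Hypothetical stand-in agent: compares `budget` random pairs
    and reports a random item as the winner."""
    pairs = [tuple(rng.sample(range(k), 2)) for _ in range(budget)]
    return pairs, rng.randrange(k)

def run_grid(agent_run, seed=0):
    """Run one agent over every (dataset, budget) cell, N_RUNS times each.

    Returns a dict mapping (dataset, budget) to the list of reported winners.
    """
    results = {}
    for dataset, budget in itertools.product(DATASETS, BUDGETS):
        winners = []
        for run in range(N_RUNS):
            # A string seed keeps each run reproducible across processes.
            rng = random.Random(f"{seed}-{dataset}-{budget}-{run}")
            pairs, winner = agent_run(K_ITEMS, budget, rng)
            assert len(pairs) == budget  # the agent must respect the budget
            winners.append(winner)
        results[(dataset, budget)] = winners
    return results

results = run_grid(random_agent_run)
```

Seeding each run separately means a single cell of the grid can be re-run in isolation without replaying the runs before it.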

Results

The following tables summarize recovery fraction, true rank of reported winner, and cumulative regret across datasets and budgets, as reported in [17].

Recovery Fraction
Agent	Synthetic (B=40)	Synthetic (B=60)	Synthetic (B=80)	Jester (B=40)	Jester (B=60)	Jester (B=80)	MovieLens (B=40)	MovieLens (B=60)	MovieLens (B=80)
Double TS	0.200	0.067	0.267	0.167	0.233	0.467	0.133	0.067	0.067
Random	0.033	0.067	0.000	0.033	0.000	0.067	0.033	0.000	0.067
PARWiS	0.467	0.467	0.467	0.467	0.467	0.467	0.167	0.167	0.167
Contextual PARWiS	0.367	0.367	0.367	0.433	0.433	0.433	0.167	0.167	0.167
RL PARWiS	0.367	0.367	0.367	0.467	0.467	0.467	0.100	0.100	0.100

True Rank of Reported Winner
Agent	Synthetic (B=40)	Synthetic (B=60)	Synthetic (B=80)	Jester (B=40)	Jester (B=60)	Jester (B=80)	MovieLens (B=40)	MovieLens (B=60)	MovieLens (B=80)
Double TS	8.233	6.933	4.767	6.700	4.700	3.133	9.233	10.300	11.500
Random	10.767	10.367	10.733	10.733	9.367	10.733	9.233	11.033	10.767
PARWiS	3.233	3.233	3.233	2.067	2.067	2.067	6.633	6.633	6.633
Contextual PARWiS	3.900	4.067	4.067	2.233	2.233	2.233	6.633	6.633	6.633
RL PARWiS	3.533	3.533	3.533	2.067	2.067	2.067	6.667	6.667	6.667

Cumulative Regret
Agent	Synthetic (B=40)	Synthetic (B=60)	Synthetic (B=80)	Jester (B=40)	Jester (B=60)	Jester (B=80)	MovieLens (B=40)	MovieLens (B=60)	MovieLens (B=80)
Double TS	35.300	52.933	67.267	34.067	51.167	67.667	36.733	55.767	74.800
Random	36.633	54.833	73.200	36.167	54.233	72.600	37.733	56.767	75.800
PARWiS	11.733	22.000	33.133	9.567	17.600	25.633	18.067	35.100	52.333
Contextual PARWiS	13.067	24.333	35.467	10.167	18.533	27.200	18.100	35.133	52.367
RL PARWiS	14.367	26.300	42.300	11.100	21.667	32.400	19.567	38.633	56.967
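
The three metrics tabulated above can be aggregated from per-run outcomes roughly as follows. The definitions are assumptions inferred from the metric names, not taken from the toolkit: recovery fraction is read as the share of runs whose reported winner is the true top item, and true rank as the mean 1-based rank of the reported winner. Cumulative regret is treated as a precomputed per-run number, because its per-round definition is not given in this section. The function name `summarize_runs` is hypothetical.

```python
from statistics import mean

def summarize_runs(reported_winners, true_rankings, regrets):
    """Aggregate per-run outcomes into the three reported metrics.

    reported_winners: item id reported by the agent in each run
    true_rankings:    per run, item ids ordered from best to worst
    regrets:          per run, regret accumulated over the whole budget
    """
    # 1-based true rank of the reported winner in each run
    ranks = [ranking.index(winner) + 1
             for winner, ranking in zip(reported_winners, true_rankings)]
    return {
        "recovery_fraction": mean(1.0 if r == 1 else 0.0 for r in ranks),
        "true_rank": mean(ranks),
        "cumulative_regret": mean(regrets),
    }

# Toy example with three runs over three items (illustrative data only).
summary = summarize_runs(
    reported_winners=[0, 2, 0],
    true_rankings=[[0, 1, 2], [0, 1, 2], [1, 0, 2]],
    regrets=[1.0, 2.0, 3.0],
)
```

Under this reading, with 30 runs every recovery fraction is a multiple of 1/30, so a value of 0.467 would correspond to 14 of 30 runs.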

Discussion

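Synthetic and Jester: PARWiS reaches a recovery fraction of 0.467 on both datasets at every budget, which no other agent exceeds, with low true ranks (3.233 on Synthetic, 2.067 on Jester). RL PARWiS matches it on Jester but trails on Synthetic (recovery fraction 0.367, true rank 3.533). Both datasets have moderate to large \(\Delta_{1,2}\) (0.0152 for Synthetic, 0.0946 for Jester). PARWiS also has the lowest cumulative regret on both datasets (e.g., 11.733 at \(B=40\) on Synthetic, versus 35.300 for Double TS).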
MovieLens: The small \(\Delta_{1,2}\) (0.0008) challenges all algorithms; no agent exceeds a recovery fraction of 0.167, and the PARWiS variants fall to 0.100–0.167. PARWiS still has the lowest regret (52.333 at \(B=80\)).
Contextual PARWiS: Performs close to PARWiS on the real-world datasets, where no features are available, and slightly worse on Synthetic data (recovery fraction 0.367 versus 0.467), suggesting that its use of features needs further optimization.
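RL PARWiS: Matches PARWiS on Jester in recovery fraction and true rank but has the lowest recovery fraction of the PARWiS variants on MovieLens (0.100) and the highest regret of the three on every dataset, indicating room for improved training or state representation.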

The t-tests (see Appendix Tables) show that PARWiS’s improvements over Double TS are statistically significant on Synthetic and Jester (\(p < 0.05\)) but not on MovieLens, owing to the dataset’s difficulty.
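
A comparison of this kind between two agents' per-run scores can be sketched with a two-sample t statistic. The text does not state which t-test variant was used, so the sketch below assumes Welch's unequal-variance test. The function name `welch_t` and the sample values are hypothetical and are not results from the experiments.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)
    for two independent samples with possibly unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean (unbiased variances)
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Illustrative per-run regrets for two hypothetical agents.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The p-value then follows from the t distribution with `dof` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.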