Full description
This dataset contains the results of experiments comparing the performance of the standard Q-learning-based distributional deep reinforcement learning algorithm (QL-C51) with a novel variant that uses Expected Sarsa temporal-difference updates (ES-C51). Each algorithm was executed for 10 separate runs with independent seeds on 22 environments: Acrobot, CartPole, and the ten Atari-10 environments, each of the latter evaluated with and without stochasticity. For each run, results are reported as the mean episodic reward over the last 10% of learning episodes. Full details are given in the corresponding paper.
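As an illustration of how the reported metric could be computed from a single run's per-episode rewards, a minimal sketch follows. The function name `mean_final_reward`, the assumed data layout (one flat sequence of episodic rewards per run), and the example values are hypothetical and not part of the dataset.

```python
import numpy as np

def mean_final_reward(episode_rewards, fraction=0.10):
    """Mean episodic reward over the last `fraction` of learning episodes."""
    rewards = np.asarray(episode_rewards, dtype=float)
    # Always average at least one episode, even for very short runs.
    n_last = max(1, int(np.ceil(fraction * len(rewards))))
    return rewards[-n_last:].mean()

# Hypothetical per-episode rewards for one run (values for illustration only).
run_rewards = [12.0, 15.5, 30.2, 41.0, 55.3, 60.1, 58.7, 62.4, 61.0, 63.2]
# With 10 episodes, the last 10% is the final episode, so this prints 63.2.
print(mean_final_reward(run_rewards))
```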
Issued: 15 October 2025
Created: 15 October 2025
Modified: 15 October 2025
DOI: 10.25955/30359872.v1
