Data

Dataset from ES-C51: Expected Sarsa Based C51 Distributional Reinforcement Learning Algorithm

Federation University Australia
Tandon, Rijul ; Vamplew, Peter ; Foale, Cameron
DOI: 10.25955/30359872.v1
Year: 2025
Edition: 1
Type: dataset
Language: English
Licence: https://creativecommons.org/licenses/by/4.0/
Subjects: Reinforcement learning; Expected sarsa; C51; deep reinforcement learning; distributional reinforcement learning

Full description

This dataset contains the results of experiments comparing the standard Q-learning-based distributional deep reinforcement learning algorithm, QL-C51, with a novel variant, ES-C51, which uses Expected Sarsa temporal-difference updates. Each algorithm was executed for 10 separate runs with independent seeds on each of 22 environments (Acrobot, Cartpole, and the Atari-10 environments, the latter with and without stochasticity). Results are reported for each run as the mean episodic reward over the last 10% of learning episodes. Full details are in the corresponding paper.
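The reported metric, the mean episodic reward over the last 10% of learning episodes, can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `final_window_mean`, the integer rounding of the window size, and the example rewards are all assumptions made for illustration, and the dataset's actual file layout is not described here.

```python
from statistics import mean


def final_window_mean(episode_rewards, percent=10):
    """Mean episodic reward over the last `percent`% of episodes.

    Hypothetical helper: the window size is rounded down and is
    never smaller than one episode (an assumption of this sketch).
    """
    n = len(episode_rewards)
    if n == 0:
        raise ValueError("episode_rewards must not be empty")
    # Integer arithmetic avoids floating-point surprises in the window size.
    window = max(1, (n * percent) // 100)
    return mean(episode_rewards[-window:])


# Made-up run of 20 episodes with rewards 1.0 .. 20.0:
# the last 10% is the final 2 episodes (19.0 and 20.0).
rewards = [float(r) for r in range(1, 21)]
print(final_window_mean(rewards))
```

Applied once per run, a helper like this would yield one summary value per run, which is the granularity at which the results are reported.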

Issued: 15 October 2025

Created: 15 October 2025

Modified: 15 October 2025

This dataset is part of a larger collection



Identifiers