Full description

This project involved collecting a child reading dataset for Xhosa, a South African Bantu language. The collected data were processed with the help of native speakers and used to train state-of-the-art machine learning models that assess whether a child has spoken a word correctly. The dataset contains 14,972 recordings averaging 4 seconds each. Each recording captures a child speaking a particular Xhosa word or letter in a classroom setting and is annotated by three independent markers. Please note that the attached zip file contains ~14,000 files. If you download this file to a OneDrive or SharePoint location, you may be affected by the 10,000-file download limit. When downloading or unzipping, take care to ensure that all files are transferred completely.
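One way to confirm that a download and extraction are complete is to compare the archive's own file listing with what is on disk. The following is a minimal sketch using only the Python standard library; the function names and any paths passed to them are illustrative assumptions, not part of the dataset or its documentation.

```python
import zipfile
from pathlib import Path


def check_archive(zip_path):
    """Return (file_count, first_corrupt_member_or_None) for the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Count regular files only; directory entries carry no data.
        count = sum(1 for info in archive.infolist() if not info.is_dir())
        # testzip() reads every member and returns the first bad name, if any.
        return count, archive.testzip()


def missing_after_extract(zip_path, extract_dir):
    """Return archive members that are absent or the wrong size on disk."""
    extract_dir = Path(extract_dir)
    missing = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            target = extract_dir / info.filename
            # A member counts as missing if it does not exist as a file
            # or its size differs from the uncompressed size in the archive.
            if not target.is_file() or target.stat().st_size != info.file_size:
                missing.append(info.filename)
    return missing
```

Calling `check_archive` on the downloaded zip reports how many files it holds and whether any member fails its integrity check, which would suggest a truncated download. Calling `missing_after_extract` with the zip and the extraction folder lists any files that did not arrive, for example after a sync to a location with a file-count limit.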

Created: 2025-05-21

Data time period: 02 2024 to 30 11 2024

Spatial Coverage And Location

text: South Africa

Identifiers

DOI : 10.26183/93X0-QY45
Local : research-data.westernsydney.edu.au/published/7dfe822035f011f096a41d0408cdc7bb

EGRA-Xhosa-14.9k: Annotated Child Reading Audio Dataset
