EGRA-Xhosa-14.9k: Annotated Child Reading Audio Dataset

Western Sydney University
Chevtchenko, Sergio; Navas, Nikhil; Vale, Rafaella; Ubaudi, Franco; Lucwaba, Sipumelele; Ardington, Cally; Afshar, Soheil; Antoniou, Mark; Afshar, Saeed

Licence & Rights:

Non-Commercial Licence
CC BY-NC-SA: Attribution-Noncommercial-Share Alike 3.0 AU
http://creativecommons.org/licenses/by-nc-sa/3.0/au

Copyright Western Sydney University

Access:

Open

Full description

The project collected a child reading dataset for isiXhosa (Xhosa), a South African Bantu language. The dataset was then processed with the help of native speakers and used to train state-of-the-art machine learning models focussed on assessing whether a child has spoken a word correctly. It contains 14,972 recordings averaging 4 seconds each. Each recording captures a child speaking a particular Xhosa word or letter in a classroom setting and was annotated by three independent markers.

Please note that the attached zip file contains ~14,000 files. If you download or extract it to a OneDrive or SharePoint location, you may be affected by the 10,000-file download limit. When downloading and unzipping, check that all files have been transferred completely.
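One way to act on the completeness check described in the download note is to compare the extracted folder against the archive's own file listing. The sketch below is a minimal illustration, assuming Python 3 and a local copy of the zip; the function names and any paths you pass in are hypothetical and not part of the dataset. It treats a file as incomplete if it is missing or its size differs from the uncompressed size recorded in the zip.

```python
import zipfile
from pathlib import Path


def count_archive_files(zip_path):
    """Count the file (non-directory) entries listed in the zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(1 for info in archive.infolist() if not info.is_dir())


def find_incomplete_files(zip_path, extract_dir):
    """List archive members that are missing or truncated after extraction.

    A member counts as incomplete if the extracted file does not exist or
    its size differs from the uncompressed size stored in the zip directory.
    """
    extract_root = Path(extract_dir)
    incomplete = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no file data
            target = extract_root / info.filename
            if not target.is_file() or target.stat().st_size != info.file_size:
                incomplete.append(info.filename)
    return sorted(incomplete)
```

An empty list from `find_incomplete_files` suggests extraction finished; a non-empty list names the files to re-extract. To detect a corrupted or partially downloaded zip before extracting, the standard library's `zipfile.ZipFile.testzip()` returns the name of the first member with a bad CRC, or `None` if all members check out.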

Created: 2025-05-21

Data time period: February 2024 to 30 November 2024

This dataset is part of a larger collection

Spatial Coverage And Location

South Africa

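Subjects

  • Keywords: EGRA-AI; EGRA; Children; Early Grade; Assessment; isiXhosa; Classroom; Annotated
  • INFORMATION AND COMPUTING SCIENCES: Machine learning not elsewhere classified; Data management and data science not elsewhere classified; Artificial intelligence not elsewhere classified; Applied computing not elsewhere classified
  • LANGUAGE, COMMUNICATION AND CULTURE: Language studies not elsewhere classified
  • MATHEMATICAL SCIENCES: Applied mathematics not elsewhere classified
  • EDUCATION: Education systems not elsewhere classified
  • CULTURE AND SOCIETY: Communication not elsewhere classified
  • EDUCATION AND TRAINING: Learner and learning not elsewhere classified; Teaching and curriculum not elsewhere classified; Other education and training not elsewhere classified
  • INFORMATION AND COMMUNICATION SERVICES: Communication technologies, systems and services not elsewhere classified; Information systems, technologies and services not elsewhere classified
  • EXPANDING KNOWLEDGE: Expanding knowledge in language, communication and culture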

Identifiers
  • DOI : 10.26183/93X0-QY45
  • Local : research-data.westernsydney.edu.au/published/7dfe822035f011f096a41d0408cdc7bb