Data

EGRA-Xhosa-14.9k: Annotated Child Reading Audio Dataset

Western Sydney University
Chevtchenko, Sergio ; Navas, Nikhil ; Vale, Rafaella ; Ubaudi, Franco ; Lucwaba, Sipumelele ; Ardington, Cally ; Afshar, Soheil ; Antoniou, Mark ; Afshar, Saeed

Licence & Rights:

Non-Commercial Licence (CC-BY-NC-SA)

CC BY-NC-SA: Attribution-Noncommercial-Share Alike 3.0 AU
http://creativecommons.org/licenses/by-nc-sa/3.0/au

Copyright Western Sydney University

Access:

Open

Full description

The project collected a dataset of child reading audio in Xhosa, a South African Bantu language. The recordings were processed with the help of native speakers and used to train state-of-the-art machine learning models that assess whether a child has spoken a word correctly. The dataset contains 14,972 recordings averaging about 4 seconds each. Each recording captures a child speaking a particular Xhosa word or letter in a classroom setting and is annotated by three independent markers.
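
As an illustration of how three independent annotations per recording might be combined into one label, the following is a minimal sketch using majority voting. The record does not describe the dataset's file layout or label vocabulary, so the recording IDs, the "correct"/"incorrect" labels, and the function names here are all hypothetical, and majority voting is an assumed aggregation rule rather than the authors' documented method. The second helper works out the approximate total audio duration implied by the stated figures (14,972 recordings at about 4 seconds each, roughly 16.6 hours).

```python
from collections import Counter


def majority_label(labels):
    """Return the label given by most annotators, or None if the top two tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no single most common label
    return counts[0][0]


def approx_total_hours(n_recordings, mean_seconds):
    """Approximate total audio duration in hours."""
    return n_recordings * mean_seconds / 3600


# Hypothetical annotations: three marker labels per recording (assumed format).
annotations = {
    "rec_0001": ["correct", "correct", "incorrect"],
    "rec_0002": ["incorrect", "incorrect", "incorrect"],
}

consensus = {rec: majority_label(marks) for rec, marks in annotations.items()}
print(consensus)
print(round(approx_total_hours(14972, 4), 1), "hours (approximate)")
```

With exactly three markers and two possible labels a majority always exists; the tie case only matters if the actual annotation scheme has more than two label values.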

Created: 2025-05-21

Data time period: 02 2024 to 30 11 2024

This dataset is part of a larger collection

Spatial Coverage And Location

text: South Africa

Subjects

EGRA-AI; EGRA; Children; Early Grade; Assessment; isiXhosa; Classroom; Annotated; Machine learning; Data management and data science; Artificial intelligence; Applied computing; Language studies; Applied mathematics; Education systems; Communication; Learner and learning; Teaching and curriculum

Identifiers
  • DOI : 10.26183/93X0-QY45
  • Local : research-data.westernsydney.edu.au/published/7dfe822035f011f096a41d0408cdc7bb