MFCCs Feature Scaling Images for Multi-class Human Action Analysis : A Benchmark Dataset

The University of Western Australia
Shaikh, Muhammad Bilal ; Chai, Douglas ; Islam, Syed Mohammed Shamsul ; Akhtar, Naveed
DOI: 10.17632/6d8v9jmvgm.1
Publisher: Mendeley Data
Published: 2023
Related publication: http://research-repository.uwa.edu.au/en/publications/e9d69696-ad1a-444f-ac33-5cf4014c7f2e
Subjects: Multimodality; Computer Vision Representation; Benchmarking; Action Recognition
Type: Dataset
Language: English

Access: Open

Full description

This dataset comprises Mel Frequency Cepstral Coefficients (MFCCs) that have undergone feature scaling, representing a variety of human actions. Feature scaling, or data normalization, is a preprocessing technique that standardizes the range of features in a dataset. For MFCCs, it ensures that all coefficients contribute equally to the learning process, preventing features with larger ranges from overshadowing those with smaller ones.

The audio signals in this dataset correspond to diverse human actions such as walking, running, jumping, and dancing. The MFCCs are computed through a series of signal-processing stages that capture key characteristics of the audio signal in a manner closely aligned with human auditory perception. The coefficients are then normalized using methods such as min-max scaling or standardization, and each normalized MFCC vector corresponds to a segment of the audio signal.

The dataset is designed for tasks including human action recognition, classification, segmentation, and detection based on auditory cues. It serves as a resource for training and evaluating machine learning models that interpret human actions from audio signals, and should be particularly useful to researchers and practitioners in signal processing, computer vision, and machine learning who develop algorithms for audio-based human action analysis.
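The two normalization schemes named above can be sketched as follows. This is a hypothetical illustration of per-coefficient min-max scaling and standardization on an MFCC matrix laid out as frames × coefficients; the dataset's actual preprocessing parameters and extraction pipeline are not specified here, so treat this as a sketch of the technique, not the authors' implementation.

```python
# Illustrative sketch (not the dataset's actual pipeline): normalize an
# MFCC matrix per coefficient, with rows = frames, columns = coefficients.
import statistics


def min_max_scale(mfccs):
    """Scale each coefficient (column) to the range [0, 1]."""
    cols = range(len(mfccs[0]))
    mins = [min(row[j] for row in mfccs) for j in cols]
    maxs = [max(row[j] for row in mfccs) for j in cols]
    return [
        [(x - mn) / (mx - mn) if mx > mn else 0.0
         for x, mn, mx in zip(row, mins, maxs)]
        for row in mfccs
    ]


def standardize(mfccs):
    """Shift each coefficient (column) to zero mean and unit variance."""
    cols = range(len(mfccs[0]))
    means = [statistics.mean(row[j] for row in mfccs) for j in cols]
    stds = [statistics.stdev(row[j] for row in mfccs) for j in cols]
    return [
        [(x - m) / s if s > 0 else 0.0
         for x, m, s in zip(row, means, stds)]
        for row in mfccs
    ]


# Toy matrix: 3 frames x 2 MFCC coefficients (values are illustrative only).
frames = [[-100.0, 10.0], [-50.0, 20.0], [0.0, 30.0]]
print(min_max_scale(frames))  # each column now spans [0, 1]
print(standardize(frames))    # each column now has mean 0, unit variance
```

After min-max scaling, every coefficient lies in [0, 1]; after standardization, every coefficient has zero mean and unit variance. Either way, no single coefficient dominates a distance- or gradient-based learner purely because of its raw scale, which is the motivation given in the description above.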

Notes

External Organisations
Edith Cowan University
Associated Persons
Muhammad Bilal Shaikh (Creator); Douglas Chai (Contributor); Syed Mohammed Shamsul Islam (Contributor)

Issued: 2023-07-26

This dataset is part of a larger collection


Other Information
MAiVAR: Multimodal Audio-Image and Video Action Recognizer

URL: http://research-repository.uwa.edu.au/en/publications/320adc19-a686-4612-ac40-2216742198bc

Conference paper

Identifiers

DOI: 10.17632/6d8v9jmvgm.1