Noisy speech sequences for evaluation of voice activity detection (VAD) algorithms

Name: Noisy speech sequences for evaluation of voice activity detection (VAD) algorithms
Published: 2014
Keywords: Speech databases, Engineering, Voice activity detection, Evaluation protocols

Queensland University of Technology

Dean, David; Mason, Robert; Vogt, Robert; Sridharan, Sridha

Access the data

http://researchdatafin... https://www.qut.edu.au...

Licence & Rights:

Non-Commercial Licence

Creative Commons Attribution-NonCommercial-Share Alike 3.0
http://creativecommons.org/licenses/by-nc-sa/3.0/au/

Access:

Other

Copyright (c) 2011, Queensland University of Technology. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the Queensland University of Technology nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL QUEENSLAND UNIVERSITY OF TECHNOLOGY BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Contact Information

Postal Address:
David Dean

[email protected]

Full description

This dataset has been collected to provide a simulation of noisy speech in a wide variety of typical background noise conditions. This distribution contains the QUT-NOISE database and the code required to create the QUT-NOISE-TIMIT database from the QUT-NOISE database and a locally installed copy of the TIMIT database.
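As a rough illustration of what simulating noisy speech involves, the sketch below adds a noise excerpt to a clean utterance at a chosen signal-to-noise ratio (SNR). It is not the code shipped with this distribution, which may define speech level and SNR differently. The function names, the plain-list sample representation, and the mean-power definition of SNR are all assumptions made for the example, and both signals are assumed to share one sampling rate.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the speech-to-noise power ratio is snr_db.

    Hypothetical helper: SNR is taken over the whole utterance using mean
    power, which is only one of several possible definitions.
    Returns (mixed, scaled_noise).
    """
    if len(noise) < len(speech):
        raise ValueError("noise excerpt must be at least as long as the speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both be non-silent")
    # Noise power needed for the requested SNR, then the matching amplitude gain.
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


# Synthetic stand-ins for a clean utterance and a noise excerpt.
rng = random.Random(0)
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(1200)]
mixed, scaled = mix_at_snr(speech, noise, snr_db=10.0)
achieved_snr_db = 10 * math.log10(mean_power(speech) / mean_power(scaled))
```

Scaling the noise rather than the speech keeps the speech level fixed across SNR conditions, which is one common choice. The distributed code should be consulted for the convention it actually uses.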

The recordings, as described by Dean, Sridharan, Vogt & Mason in "The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms", comprise 20 noise sessions, each at least 30 minutes long. Two noise recordings, separated by at least one day in all but the CAR scenario, were made at each of 10 locations covering 5 common background noise scenarios: cafe, home, street, car and reverb (closed indoor pool and carpark).
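The session counts above can be checked with a short enumeration. The scenario names come from the description. The even split of two locations per scenario is an assumption (the description only states 10 locations over 5 scenarios), and the location labels "loc1" and "loc2" are hypothetical placeholders, not names used in the database.

```python
from itertools import product

SCENARIOS = ["cafe", "home", "street", "car", "reverb"]
LOCATIONS_PER_SCENARIO = 2   # assumed: 10 locations spread evenly over 5 scenarios
SESSIONS_PER_LOCATION = 2    # two recordings at each location
MIN_SESSION_MINUTES = 30     # each session lasts at least 30 minutes

# One (scenario, location, session number) tuple per noise session.
sessions = [
    (scenario, f"loc{loc}", session)
    for scenario, loc, session in product(
        SCENARIOS,
        range(1, LOCATIONS_PER_SCENARIO + 1),
        range(1, SESSIONS_PER_LOCATION + 1),
    )
]

num_sessions = len(sessions)
num_locations = len({(scenario, loc) for scenario, loc, _ in sessions})
min_total_minutes = num_sessions * MIN_SESSION_MINUTES

print(num_sessions, num_locations, min_total_minutes)  # 20 10 600
```

Under these assumptions the database holds at least 600 minutes (10 hours) of background noise.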

Recordings were collected with a prosumer-quality Zoom H2 handheld stereo microphone recorder. This device was chosen so that the background noise recordings would be of higher quality than typical recording scenarios require, allowing any expected recording quality to be synthesised easily.

Each of the 20 noise sessions was recorded with the Zoom H2 set to record raw stereo WAV output at a sampling rate of 48 kHz and 16 bits per sample. The recordings used the rear microphone pair of the Zoom H2, as its greater angular separation (compared with the front microphone pair) could allow more useful comparisons between the two channels in future research.

To calculate the room response in the reverberant CAR and REVERB scenarios, 10-second frequency sweeps were played with the studio monitor positioned several metres away from the microphone. Each reverberant session contained 12 frequency sweeps: 6 before the main 30+ minute recording session and 6 after.

Each noise session was manually labeled with the boundaries of the main 30+ minute recording session and, in the reverberant sessions, the rough location of each frequency sweep. In addition, any bad portions of data (such as microphone failure) were labeled so that they can be avoided.
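One way the session labels might be used is to derive the usable stretches of a recording: the main session with any bad portions cut out. The sketch below assumes the labels have already been parsed into (start, end) pairs in seconds. The on-disk label format is not described here, so the function names and data layout are hypothetical. The 48 kHz rate used to convert seconds to frame indices is the one stated for the recordings.

```python
def usable_intervals(main, bad):
    """Return the parts of `main` not covered by any interval in `bad`.

    main: (start, end) of the main recording session, in seconds.
    bad:  iterable of (start, end) bad-data intervals, in seconds.
    """
    start, end = main
    result = []
    cursor = start  # earliest time not yet assigned to an output or bad interval
    for b_start, b_end in sorted(bad):
        # Clip each bad interval to the main session boundaries.
        b_start = max(b_start, start)
        b_end = min(b_end, end)
        if b_end <= b_start:
            continue  # lies entirely outside the main session
        if b_start > cursor:
            result.append((cursor, b_start))
        cursor = max(cursor, b_end)
    if cursor < end:
        result.append((cursor, end))
    return result


def to_frames(intervals, sample_rate=48000):
    """Convert (start, end) times in seconds to sample-frame indices."""
    return [(round(a * sample_rate), round(b * sample_rate)) for a, b in intervals]


# Made-up labels for illustration only.
main_session = (60.0, 1900.0)
bad_portions = [(500.0, 510.0), (1895.0, 2000.0), (10.0, 20.0)]
good = usable_intervals(main_session, bad_portions)
print(good)             # [(60.0, 500.0), (510.0, 1895.0)]
print(to_frames(good))  # [(2880000, 24000000), (24480000, 90960000)]
```

Sorting the bad intervals first lets overlapping or out-of-order labels be handled in a single pass.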

This dataset is part of a larger collection

Subjects

Engineering | Evaluation protocols | Speech databases | Voice activity detection

Identifiers

DOI: 10.4225/09/586F2A3FAFF49
Local: 10378.3/8085/1018.15649