Synthetic nursing handover training and development data set - text files

Commonwealth Scientific and Industrial Research Organisation
Angel, Maricel ; Suominen, Hanna ; Zhou, Liyuan ; Hanlen, Leif
Identifier: https://doi.org/10.4225/08/58d097ee92e95
Edition: v1
Related publication: https://doi.org/10.2196/medinform.4321
Subjects: nursing records; patient handoff; records as topic; speech recognition software; Natural language processing; Artificial intelligence; INFORMATION AND COMPUTING SCIENCES

Licence & Rights:

Non-Derivative Licence
CC-BY-NC-ND

Creative Commons Attribution-Noncommercial-No Derivatives 4.0 International Licence
https://creativecommons.org/licenses/by-nc-nd/4.0/

Data is accessible online and may be reused in accordance with licence conditions

All Rights (including copyright) NICTA 2014.

Access:

Open

Accessible for free

Brief description

This is one of two collection records. Please see the link below for the other collection of associated audio files.

Both collections together comprise an open clinical dataset of three sets of 101 nursing handover records, very similar to real documents in Australian English. Each record consists of a patient profile, spoken free-form text document, written free-form text document, and written structured document.

This collection contains 3 sets of text documents.

Data Set 1 for Training and Development

The data set, released in June 2014, includes the following documents:

Folder initialisation: Initialisation details for speech recognition using Dragon Medical 11.0, i.e., (i) a DOCX file for the written, free-form text document that originates from the Dragon software release and (ii) a WMA file for the spoken, free-form text document by the RN
Folder 100profiles: 100 patient profiles (DOCX)
Folder 101writtenfreetextreports: 101 written, free-form text documents (TXT)
Folder 100x6speechrecognised: 100 speech-recognized, written, free-form text documents for six Dragon vocabularies (TXT)
Folder 101informationextraction: 101 written, structured documents for information extraction that include i) the reference standard text, ii) features used by our best system, iii) form categories with respect to the reference standard and iv) form categories with respect to our best information extraction system (TXT in CRF++ format).
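
Because the information extraction documents are stored as TXT in CRF++ format, a short reader may help when getting started. The sketch below is a minimal, hedged example and not the authors' tooling: it relies only on the general CRF++ convention of one token per line, whitespace-separated columns, and a blank line between sequences. Treating the last column as the label is an assumption about these particular files, and the sample tokens, feature values, and category names (PATIENT_NAME, BED, NA) are hypothetical placeholders rather than content from the data set.

```python
from typing import Iterable, List, Tuple

# One token: (feature columns, label). Taking the label from the last
# column is an assumption; check it against the actual files.
Token = Tuple[List[str], str]


def parse_crfpp(lines: Iterable[str]) -> List[List[Token]]:
    """Group CRF++-style lines into sequences of (features, label) tokens."""
    sequences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sequence, if any.
            if current:
                sequences.append(current)
                current = []
            continue
        columns = line.split()  # columns are separated by spaces or tabs
        current.append((columns[:-1], columns[-1]))
    if current:
        # Keep the final sequence when the input lacks a trailing blank line.
        sequences.append(current)
    return sequences


# Hypothetical sample: the words, features, and labels are made up.
sample = """\
Ken NNP PATIENT_NAME
is VBZ NA

Bed NN BED
5 CD BED
"""

records = parse_crfpp(sample.splitlines())
print(len(records), "sequences parsed")
```

To read a real file, pass an open file handle instead of the sample lines, for example `parse_crfpp(open(path, encoding="utf-8"))`, where `path` points at one of the TXT files in the folder. If the files place the reference category in a different column, adjust the slicing accordingly.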

An Independent Data Set 2

The aforementioned data set was supplemented in April 2015 with an independent set that was used as a test set in the CLEFeHealth 2015 Task 1a on clinical speech recognition and can be used as a validation set in the CLEFeHealth 2016 Task 1 on handover information extraction. Hence, when using this set, please avoid its repeated use in evaluation – we do not wish to overfit to these data sets.

The set released in April 2015 consists of 100 patient profiles (DOCX), 100 written, and 100 speech-recognized, written, free-form text documents for the Dragon vocabulary of Nursing (TXT). The set released in November 2015 consists of the respective 100 written free-form text documents (TXT) and 100 written, structured documents for information extraction.

An Independent Data Set 3

For evaluation purposes, the aforementioned data sets were supplemented in April 2016 with an independent set of another 100 synthetic cases.

Lineage: Data creation included the following steps: generation of patient profiles; creation of written, free-form text documents; development of a structured handover form; using this form and the written, free-form text documents to create written, structured documents; creation of spoken, free-form text documents; using a speech recognition engine with different vocabularies to convert the spoken documents to written, free-form text; and using an information extraction system to fill out the handover form from the written, free-form text documents.

See Suominen et al. (2015) in the links below for a detailed description and examples.

Available: 2017-03-21
