Name: MedRedQA
Published: 2024-05-01

Full description

A large non-factoid English consumer Question Answering (QA) dataset containing 51,000 pairs of consumer questions and their corresponding expert answers. The dataset is useful for benchmarking or training systems on difficult real-world questions and responses, which may contain spelling or formatting errors, or lexical gaps between consumer and expert vocabularies.

By downloading this dataset, you agree to have obtained ethics approval from your institution.
Lineage: We collected posts and comments from the subreddit /r/askdocs, published between July 10, 2013, and April 2, 2022, totalling 600,000 submissions (original posts) and 1,700,000 comments (replies). We generated question-answer pairs by taking the highest-scoring answer from a verified medical expert to each Reddit question. Questions containing only images were removed, as were all links and author names.
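The pairing rule described in the lineage can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the comment fields (`text`, `score`, `verified`) and the sample comments are hypothetical, and the sketch assumes each comment already carries a flag saying whether its author is a verified medical expert.

```python
from typing import Optional


def select_answer(comments: list[dict]) -> Optional[dict]:
    """Return the highest-scoring comment from a verified expert, or None.

    Assumes (hypothetically) that each comment is a dict with the keys
    'text', 'score' and 'verified'.
    """
    verified = [c for c in comments if c.get("verified")]
    if not verified:
        # No verified expert replied, so no question-answer pair is produced.
        return None
    return max(verified, key=lambda c: c["score"])


# Hypothetical replies to a single post.
comments = [
    {"text": "Sounds like a cold.", "score": 40, "verified": False},
    {"text": "See a doctor if the fever lasts more than three days.", "score": 12, "verified": True},
    {"text": "Rest and fluids.", "score": 5, "verified": True},
]

best = select_answer(comments)
```

Note that the unverified reply is ignored even though it has the highest score overall; only verified replies compete.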

We provide two separate datasets in this collection, with the following schemas.
MedRedQA - Reddit Medical Question and Answer pairs from /r/askdocs. CSV format.
i. The poster's question (Body)
ii. Title of the post
iii. The filtered answer from a verified physician comment (Response)
iv. Occupation indicated for verification status
v. Any PMCIDs found in the post
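As a rough sketch of how the CSV schema above might be consumed, the example below parses a small in-memory sample with Python's standard `csv` module. Only `Body` and `Response` are named as fields in the description; the column names `Title`, `Occupation` and `PMCIDs`, the column order, and the sample row itself are assumptions made for illustration and should be checked against the actual file header.

```python
import csv
import io

# Hypothetical one-row sample; the real file's header may differ.
sample = io.StringIO(
    "Title,Body,Response,Occupation,PMCIDs\n"
    '"Persistent cough","Ive had a cough for 3 weeks, shoud I worry?",'
    '"A cough lasting over three weeks warrants a visit to your doctor.",'
    "Physician,\n"
)

rows = list(csv.DictReader(sample))

# Build (question, answer) pairs, joining title and body into one question.
pairs = [(row["Title"] + "\n" + row["Body"], row["Response"]) for row in rows]
```

To read the downloaded file instead, the `io.StringIO` object can be replaced with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.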

MedRedQA+PubMed - PubMed Enriched subset of MedRedQA. JSON format.
i. Question: The user's original question. This is equivalent to the Body field in MedRedQA.
ii. Document: The abstract of the PubMed document associated with that particular post, if the document exists and has an abstract. Note: the answer itself does not necessarily reference this document, but at least one other verified physician in the responses has mentioned it.
iii. The filtered response. This is equivalent to the Response field in MedRedQA.
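The sketch below shows one way the PubMed-enriched records might be filtered to those that actually carry an abstract. It assumes, without confirmation from the description, that the JSON file is a list of objects keyed `Question`, `Document` and `Response`, and that a missing abstract is stored as `null` or an empty string; the sample records are invented.

```python
import json

# Hypothetical sample in the assumed layout (a list of objects).
raw = """[
  {"Question": "Is ibuprofen safe with asthma?",
   "Document": "Abstract text of a PubMed article.",
   "Response": "It can trigger symptoms in some patients."},
  {"Question": "Why does my knee click?",
   "Document": null,
   "Response": "Painless clicking is usually harmless."}
]"""

records = json.loads(raw)

# Keep only the records whose Document field holds a non-empty abstract.
with_abstract = [r for r in records if r.get("Document")]
```

Because the document is not necessarily referenced by the paired answer, it is safer to treat it as related context for the post than as the evidence behind the response.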

Available: 2024-05-01

Data time period: 2013-07-10 to 2022-04-02


Identifiers

DOI : 10.25919/YN7X-9148
Handle : 102.100.100/634999
URL : data.csiro.au/collection/csiro:62454
