Brief description
A large non-factoid English consumer Question Answering (QA) dataset containing 51,000 pairs of consumer questions and their corresponding expert answers. This dataset is useful for benchmarking or training systems on more difficult real-world questions and responses, which may contain spelling or formatting errors, or lexical gaps between consumer and expert vocabularies. By downloading this dataset, you agree that you have obtained ethics approval from your institution.
Lineage: We collected data from posts and comments on the subreddit /r/askdocs published between July 10, 2013, and April 2, 2022, totalling 600,000 submissions (original posts) and 1,700,000 comments (replies). We generated question-answer pairs by taking, for each Reddit question, the highest-scoring answer from a verified medical expert. Questions containing only images were removed, and all links and author information were stripped.
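The pairing step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: the raw Reddit field names (`id`, `title`, `selftext`, `parent_id`, `score`, `body`, `is_verified_expert`) are hypothetical, and treating a post whose text is empty after link removal as image-only is an assumed approximation.

```python
import re

# Hypothetical raw-record fields: "id", "title", "selftext" (submissions) and
# "parent_id", "score", "body", "is_verified_expert" (comments). These are
# illustrative names, not the schema used to build MedRedQA.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def strip_links(text: str) -> str:
    """Remove URLs and collapse the whitespace they leave behind."""
    return " ".join(URL_PATTERN.sub("", text).split())


def build_pairs(submissions: list[dict], comments: list[dict]) -> list[dict]:
    """Pair each post with its highest-scoring verified-expert comment."""
    best: dict[str, dict] = {}
    for c in comments:
        if not c["is_verified_expert"]:
            continue  # only verified experts can supply the answer
        current = best.get(c["parent_id"])
        # Strict ">" keeps the first-seen comment when scores tie.
        if current is None or c["score"] > current["score"]:
            best[c["parent_id"]] = c

    pairs = []
    for s in submissions:
        body = strip_links(s["selftext"])
        answer = best.get(s["id"])
        # Assumption: a post with no text left after link removal is image-only.
        if not body or answer is None:
            continue
        # Author fields are never copied into the output record.
        pairs.append({
            "Title": strip_links(s["title"]),
            "Body": body,
            "Response": strip_links(answer["body"]),
        })
    return pairs
```

Keeping a single best comment per post in a dictionary means the comments are scanned once, which matters at the scale of 1,700,000 replies.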
This collection contains two separate datasets, with the following schemas.
MedRedQA - Reddit Medical Question and Answer pairs from /r/askdocs. CSV format.
i. The poster's question (Body)
ii. Title of the post
iii. The filtered answer from a verified physician comment (Response)
iv. The occupation indicated by the responder's verification status
v. Any PMCIDs found in the post
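A minimal sketch for reading the MedRedQA CSV with Python's standard `csv` module. Only the `Body` and `Response` column names are given in the schema above; the file name mentioned in the comment is hypothetical, and the headers for the title, occupation and PMCID columns are not assumed, so check them against the actual file. Opening the file with `newline=""` matters because question and answer text can span several lines inside quoted fields.

```python
import csv
import io
from typing import Iterator, TextIO

# Column names confirmed by the schema; the headers for title, occupation and
# PMCIDs are not stated in the description, so they are not checked here.
REQUIRED_COLUMNS = ("Body", "Response")


def read_medredqa(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per CSV row after checking the required columns exist."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    yield from reader


if __name__ == "__main__":
    # Tiny in-memory sample; a real run would use something like
    # open("medredqa.csv", newline="", encoding="utf-8")  (hypothetical name).
    sample = io.StringIO('Title,Body,Response\nRash,"Itchy rash\nfor a week",See a GP\n')
    for row in read_medredqa(sample):
        print(row["Body"], "->", row["Response"])
```

Returning the full row dictionary, rather than a fixed tuple, leaves the remaining columns available once their exact names are confirmed.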
MedRedQA+PubMed - PubMed Enriched subset of MedRedQA. JSON format.
i. Question: The user's original question. This is equivalent to the Body field in MedRedQA.
ii. Document: The abstract of the PubMed document associated with that post (where one exists and has an abstract). Note: this does not necessarily mean the selected answer references this document, but at least one verified physician among the responses (not necessarily the one whose answer was selected) has mentioned it.
iii. The filtered response. This is equivalent to the Response field in MedRedQA.
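The JSON subset can be loaded with Python's standard `json` module. This sketch assumes the keys are literally `Question`, `Document` and `Response` (taken from the schema labels above) and, because the description does not say whether the file is a single JSON array or JSON Lines, it accepts either; both points are assumptions to verify against the downloaded file.

```python
import json

# Assumed key names, taken from the field labels in the schema. The Document
# abstract may not be cited by the Response itself (see the note above).
KEYS = ("Question", "Document", "Response")


def load_pubmed_subset(text: str) -> list[dict]:
    """Parse the subset as one JSON array, falling back to JSON Lines."""
    try:
        data = json.loads(text)
        records = data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Several concatenated objects raise "Extra data"; parse line by line.
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    for record in records:
        missing = [k for k in KEYS if k not in record]
        if missing:
            raise KeyError(f"record missing keys: {missing}")
    return records
```

Validating the keys up front surfaces a schema mismatch immediately instead of partway through training or evaluation.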
Available: 2024-05-01
Data time period: 2013-07-10 to 2022-04-02
Subjects
Applications in Health
Applied Computing
Artificial Intelligence
Data Management and Data Science
Information and Computing Sciences
Information Retrieval and Web Search
Natural Language Processing
aacl
consumer
consumer question answering
dataset
medredqa
question answering
reddit