
SEQUENTIAL STORYTELLING IMAGE DATASET (SSID)

The University of Western Australia
Aljawy, Zainy M. Malakan ; Anwar, Saeed ; Hassan, Ghulam Mubashar ; Mian, Ajmal

Access:

Open

Full description

Visual storytelling is the task of describing a set of images rather than a single image, also known as multi-image captioning. The Visual Storytelling Task (VST) takes a set of images as input and aims to generate a coherent story relevant to those images. With this dataset, we bridge this gap by presenting a new resource for expressive and coherent story creation: the Sequential Storytelling Image Dataset (SSID), consisting of open-source video frames accompanied by story-like annotations. We provide four annotations (i.e., stories) for each set of five images. The image sets were collected manually from publicly available videos in three domains: documentaries, lifestyle, and movies, and then annotated manually using Amazon Mechanical Turk. In summary, the SSID dataset comprises 17,365 images, forming 3,473 unique sets of five images. Each set of images is associated with four ground truths, for a total of 13,892 unique ground truths (i.e., written stories). Each ground truth is composed of five connected sentences written in the form of a story.
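The description above fixes the dataset's shape: every sample is a set of five ordered images paired with four five-sentence stories. A minimal sketch of how such a sample might be modeled, assuming hypothetical field names (the actual SSID file schema is not specified here), also checks that the reported totals are internally consistent:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorySet:
    """One SSID sample: five ordered frames and four ground-truth stories.

    Field names are illustrative assumptions, not the dataset's real schema.
    """
    images: List[str]   # paths to the five ordered video frames
    stories: List[str]  # four annotator-written, five-sentence stories

    def __post_init__(self) -> None:
        assert len(self.images) == 5, "each set contains exactly five images"
        assert len(self.stories) == 4, "each set has four ground-truth stories"


# The counts stated in the description are mutually consistent:
NUM_SETS = 3_473
assert NUM_SETS * 5 == 17_365   # total images
assert NUM_SETS * 4 == 13_892   # total ground-truth stories
```

This only verifies the arithmetic relationship between the published figures; loading the actual files would depend on the archive layout distributed via IEEE DataPort.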

Notes

External Organisations
King Fahd University of Petroleum and Minerals; Umm Al-Qura University
Associated Persons
Zainy M. Malakan Aljawy (Creator)
Saeed Anwar (Contributor)

Issued: 2023-07-10

This dataset is part of a larger collection



Identifiers

DOI: 10.21227/dbr9-dq51