This multi-camera surveillance dataset, the SAIVT-SoftBio database, was captured from an existing surveillance network to enable the evaluation of person recognition and re-identification models in a real-life multi-camera surveillance environment.
The dataset consists of 150 people moving through a building environment, recorded by eight surveillance cameras. Each camera captures data at 25 frames per second, at a resolution of 704 x 576 pixels, and is calibrated using Tsai’s method. The placement of cameras is a real-life surveillance setup, and cameras have been placed to provide maximal coverage of the space (with some overlap) and observation of the entrances to the building. The dataset was collected in an uncontrolled manner, so subjects can travel any route through the building. Thus, the vast majority of subjects will only pass through a subset of the camera network and that subset varies from person to person. This provides a highly unconstrained environment in which to test person re-identification models.
Frames are recorded from the moment the subject enters the building through one of the three main doorways (visible in Camera 4, Camera 7 and Camera 5/8) until they leave observation, either by exiting the building or by entering a lecture theatre. Any frames in which the subject is significantly occluded have been omitted.
XML files are used to store information about the database to enable different evaluations to be easily performed based on which subset of the dataset fits the desired criteria. For each subject, an XML file is used to summarise the camera views and frame information, which can be used to select subjects which fit the desired evaluation conditions (e.g. only subjects that exist in specific cameras or locations can be selected).
The overall dataset is also summarised in an XML file, which provides information on the camera calibration data for each subject.
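
As an illustration of how the per-subject XML files could drive subject selection, the following is a minimal Python sketch that keeps only subjects observed in every camera of a requested set. The XML layout shown here (a `subject` element containing `camera` elements with `name`, `start_frame` and `end_frame` attributes), the subject identifiers and the helper names are all assumptions made for the example; the schema actually shipped with the database is not described above and may differ, so the element and attribute names would need to be adapted.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-subject XML documents. The real SAIVT-SoftBio schema
# may use different element and attribute names; adapt before use.
SUBJECT_XML = {
    "Subject001": """<subject id="Subject001">
  <camera name="C4" start_frame="120" end_frame="310"/>
  <camera name="C7" start_frame="355" end_frame="502"/>
</subject>""",
    "Subject002": """<subject id="Subject002">
  <camera name="C5" start_frame="40" end_frame="198"/>
  <camera name="C8" start_frame="60" end_frame="240"/>
</subject>""",
    "Subject003": """<subject id="Subject003">
  <camera name="C4" start_frame="15" end_frame="170"/>
  <camera name="C5" start_frame="210" end_frame="388"/>
  <camera name="C7" start_frame="402" end_frame="540"/>
</subject>""",
}


def cameras_for_subject(xml_text):
    """Return the set of camera names listed in one subject's XML."""
    root = ET.fromstring(xml_text)
    return {cam.get("name") for cam in root.iter("camera")}


def select_subjects(subject_xml, required_cameras):
    """Return sorted IDs of subjects seen in all of the required cameras."""
    required = set(required_cameras)
    return sorted(
        subject_id
        for subject_id, xml_text in subject_xml.items()
        if required <= cameras_for_subject(xml_text)
    )


if __name__ == "__main__":
    # Subjects that appear in both (hypothetical) cameras C4 and C7.
    print(select_subjects(SUBJECT_XML, ["C4", "C7"]))
```

With real data, the dictionary of inline strings would be replaced by reading each subject's XML file from disk (for example with `ET.parse(path).getroot()`); the subset test itself stays the same.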