Data

Handwritten synthetic dataset from the IAM

RMIT University, Australia
Hiqmat Nisa (Aggregated by)
DOI: https://doi.org/10.25439/rmt.24309730.v1
Subjects: Handwriting Recognition, HTR, cross-out text, struck-outs, Synthetic-IAM, Data engineering and data science, Machine learning not elsewhere classified
Language: English

Licence & Rights:

ODC-By

Access:

Other

Full description

This dataset was generated by randomly crossing out words in the IAM database using several types of strokes. The ratio of crossed-out words to regular words in handwritten documents varies greatly with the document and context, but crossed-out words are typically few compared with regular words. To keep a realistic ratio of regular to crossed-out words in the synthetic database, 30% of the samples from the IAM training set were selected. First, the bounding box of each word in a line was detected; the bounding box covers the core area of the word. Then a word chosen at random was crossed out within its core area, so each line contains a struck-out word at a position that differs from line to line. In the annotation, each struck-out word is replaced with the symbol #.
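The generation procedure described above can be illustrated with a minimal sketch. This is not the authors' code. It assumes the word bounding boxes are already known, represents the line image as a plain 2D list of grayscale values, and draws only a single horizontal stroke, whereas the dataset uses several stroke types. The function name `strike_out_random_word` and its parameters are hypothetical.

```python
import random


def strike_out_random_word(image, boxes, words, rng=None, ink=0):
    """Cross out one randomly chosen word and mark it with '#' in the label.

    image: 2D list of grayscale pixel rows (not modified; a copy is returned).
    boxes: one (x0, y0, x1, y1) core-area box per word, end-exclusive (assumed layout).
    words: transcription tokens in the same order as boxes.
    """
    rng = rng or random.Random()
    idx = rng.randrange(len(words))      # pick the word to strike out
    x0, y0, x1, y1 = boxes[idx]

    struck = [row[:] for row in image]   # copy so the input stays intact
    y_mid = (y0 + y1) // 2               # stroke runs through the middle of the core area
    for x in range(x0, x1):
        struck[y_mid][x] = ink           # single horizontal stroke (an assumption)

    # Replace the annotation of the struck-out word with '#'.
    label = words[:idx] + ["#"] + words[idx + 1:]
    return struck, " ".join(label), idx
```

Passing a seeded `random.Random` instance as `rng` makes the choice of word reproducible, which helps when regenerating the same synthetic split.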

The folder contains:
s-s0 images
Syn-trainset
Syn-validset
Syn_IAM_testset

The transcription files use the format:
Filename, threshold label of handwritten line

Example:
s-s0-0,157 A # to stop Mr. Gaitskell from
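A transcription line in this format could be parsed as in the sketch below. It assumes, based only on the single example line, that the filename ends at the first comma, that the threshold is the integer immediately after it, and that the label is everything after the first following space. The function names are hypothetical.

```python
def parse_transcription_line(line):
    """Split 'filename,threshold label' into its three parts (assumed layout)."""
    filename, rest = line.rstrip("\n").split(",", 1)
    threshold_text, _, label = rest.partition(" ")
    return filename, int(threshold_text), label


def struck_out_positions(label):
    """Return the token indices where the label holds the '#' placeholder."""
    return [i for i, token in enumerate(label.split()) if token == "#"]


filename, threshold, label = parse_transcription_line(
    "s-s0-0,157 A # to stop Mr. Gaitskell from"
)
print(filename, threshold, label)    # s-s0-0 157 A # to stop Mr. Gaitskell from
print(struck_out_positions(label))   # [1]
```

Splitting on the first comma only keeps any commas inside the label text intact.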

If you use this dataset, cite the following work:
"A deep learning approach to handwritten text recognition in the presence of struck-out text"
https://ieeexplore.ieee.org/document/8961024


Issued: 2023-10-14

Created: 2023-10-14

This dataset is part of a larger collection
