Data
Title: UQ Single Column Format Inconsistency Datasets
DOI: 10.48610/0ab54e7
RDM ID: 19f7bf50-a75c-11ed-94e0-a959ac1c5ac5
Publisher: The University of Queensland
Creators: Associate Professor Gianluca Demartini; Dr Jiechen Xu; Dr Lei Han; Mr Shaochen Yu; Mr Shaoyang Fan; Ms Tianwa Chen; Professor Shazia Sadiq
Rights: https://guides.library.uq.edu.au/deposit-your-data/license-reuse-data-agreement
Subjects: Database Management; Information and Computing Sciences; Information Systems
Type: Dataset
Language: English

Contact Information

[email protected]
School of Information Technology and Electrical Engineering

Full description

There are three datasets: address (dataset_address.csv), contact number (dataset_contact.csv), and date (dataset_date.csv). Our system, Data-Scanner-4C, generates RegEx for each of the three datasets: address (RegEx_address.txt), contact number (RegEx_contact_number.txt), and date (RegEx_date.txt). The performance of these RegEx is reported in Tables 2, 3, and 4 of our paper. Please refer to the readme file for more information. The datasets and the analysis results relate to our paper: Shaochen Yu, Lei Han, Marta Indulska, Shazia Sadiq, and Gianluca Demartini. Human-in-the-loop Regular Expression Extraction for Single Column Format Inconsistency. In: 32nd ACM International World Wide Web Conference (TheWebConf 2023), Austin, Texas, USA, April 2023. https://doi.org/10.1145/3543507.3583515
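The readme in the deposit is the authoritative description of the file layout. The sketch below only illustrates how generated RegEx files could be applied to a single column to separate conforming values from format-inconsistent ones. It assumes (not confirmed by this record) that each RegEx_*.txt file holds one Python-compatible pattern per line and that each dataset_*.csv stores the values in its first column under a header row. The helper names load_patterns, load_column, and split_by_format are hypothetical; this is not Data-Scanner-4C itself.

```python
import csv
import re
from pathlib import Path


def load_patterns(regex_path):
    """Read patterns from a RegEx_*.txt file.

    ASSUMPTION: one pattern per non-empty line; surrounding whitespace is ignored.
    """
    lines = Path(regex_path).read_text(encoding="utf-8").splitlines()
    return [re.compile(line.strip()) for line in lines if line.strip()]


def load_column(csv_path, column=0, has_header=True):
    """Read one column from a dataset_*.csv file.

    ASSUMPTION: the values sit in `column` (default: first) below a header row.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if has_header:
        rows = rows[1:]
    return [row[column] for row in rows if len(row) > column]


def split_by_format(values, patterns):
    """Partition values into those fully matched by at least one pattern
    and those matched by none (candidate format inconsistencies)."""
    matched, unmatched = [], []
    for value in values:
        if any(p.fullmatch(value) for p in patterns):
            matched.append(value)
        else:
            unmatched.append(value)
    return matched, unmatched


if __name__ == "__main__":
    # Made-up date patterns and values for illustration only (not from the dataset).
    patterns = [re.compile(r"\d{4}-\d{2}-\d{2}"), re.compile(r"\d{2}/\d{2}/\d{4}")]
    values = ["2023-04-30", "30/04/2023", "April 30, 2023"]
    ok, flagged = split_by_format(values, patterns)
    print(ok)       # ['2023-04-30', '30/04/2023']
    print(flagged)  # ['April 30, 2023']
```

With the real files, one would call `split_by_format(load_column("dataset_date.csv"), load_patterns("RegEx_date.txt"))`, provided the layout assumptions hold. `fullmatch` is used so that a pattern must cover the whole cell value; if the generated expressions use a RegEx dialect other than Python's `re`, they may need translating first.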

Issued: 2023

This dataset is part of a larger collection

Subjects


Other Information
Identifiers