Data
Title: UQ Single Column Format Inconsistency Datasets
DOI: 10.48610/0ab54e7
RDM ID: 19f7bf50-a75c-11ed-94e0-a959ac1c5ac5
Publisher: The University of Queensland
Creators: Associate Professor Gianluca Demartini; Dr Jiechen Xu; Dr Lei Han; Mr Shaochen Yu; Mr Shaoyang Fan; Ms Tianwa Chen; Professor Shazia Sadiq
Date: 2023
Rights: https://guides.library.uq.edu.au/deposit-your-data/license-reuse-data-agreement
Subjects: Database Management; Information and Computing Sciences; Information Systems
Type: dataset
Language: English

Contact Information

g.demartini@uq.edu.au
School of Information Technology and Electrical Engineering

Full description

There are three datasets: address (dataset_address.csv), contact number (dataset_contact.csv), and date (dataset_date.csv). Our system, Data-Scanner-4C, generates regular expressions (RegEx) for each of the three datasets: address (RegEx_address.txt), contact number (RegEx_contact_number.txt), and date (RegEx_date.txt). The performance of the RegEx is reported in Tables 2, 3, and 4 of our paper. Please refer to the readme file for more information.

The datasets and the analysis results relate to our paper: Shaochen Yu, Lei Han, Marta Indulska, Shazia Sadiq, and Gianluca Demartini. Human-in-the-loop Regular Expression Extraction for Single Column Format Inconsistency. In: 32nd ACM International World Wide Web Conference (TheWebConf 2023). Austin, Texas, USA, April 2023. https://doi.org/10.1145/3543507.3583515
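As a rough illustration of how a RegEx file might be checked against its dataset, the following Python sketch computes the share of column values that are fully matched by at least one expression. The file layouts are assumptions, not something this record states: it assumes one expression per line in the .txt file and a headerless CSV whose first column holds the values. The function names load_patterns, load_column, and match_rate are hypothetical, and the expressions may use a syntax that Python's re module does not accept. The readme file is the authority on the actual format.

```python
import csv
import re
from pathlib import Path


def load_patterns(path):
    """Read one regular expression per non-empty line (assumed file layout)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [re.compile(line.strip()) for line in lines if line.strip()]


def load_column(path, column=0):
    """Read one column of a CSV file as strings (assumes no header row)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row[column] for row in csv.reader(handle) if len(row) > column]


def match_rate(values, patterns):
    """Return the fraction of values fully matched by at least one pattern."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if any(p.fullmatch(v) for p in patterns))
    return hits / len(values)
```

Under those assumptions, match_rate(load_column("dataset_date.csv"), load_patterns("RegEx_date.txt")) would give the matched share for the date dataset. This is a plain coverage measure and is not necessarily the metric reported in the paper's tables.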

Issued: 2023

This dataset is part of a larger collection

Subjects


Other Information
Human-in-the-loop regular expression extraction for single column format inconsistency

local : UQ:1514d61

Yu, Shaochen, Han, Lei, Indulska, Marta, Sadiq, Shazia and Demartini, Gianluca (2023). Human-in-the-loop regular expression extraction for single column format inconsistency. WWW '23: ACM Web Conference 2023, Austin, TX, United States, 30 April - 4 May 2023. New York, NY, United States: ACM. doi: 10.1145/3543507.3583515

Research Data Collections

local : UQ:289097

Identifiers