Full description
There are three datasets: address (dataset_address.csv), contact number (dataset_contact.csv), and date (dataset_date.csv). Our system, "Data-Scanner-4C", generates RegEx for each of the three datasets: address (RegEx_address.txt), contact number (RegEx_contact_number.txt), and date (RegEx_date.txt). The performance of the RegEx is presented in Tables 2, 3, and 4 of our paper. Please refer to the readme file for more information. The datasets and the analysis results relate to our paper: Shaochen Yu, Lei Han, Marta Indulska, Shazia Sadiq, and Gianluca Demartini. Human-in-the-loop Regular Expression Extraction for Single Column Format Inconsistency. In: 32nd ACM International World Wide Web Conference (TheWebConf 2023). Austin, Texas, USA, April 2023. https://doi.org/10.1145/3543507.3583515
Issued: 2023
Subjects
Other Information
Human-in-the-loop regular expression extraction for single column format inconsistency
local : UQ:1514d61
Yu, Shaochen, Han, Lei, Indulska, Marta, Sadiq, Shazia and Demartini, Gianluca (2023). Human-in-the-loop regular expression extraction for single column format inconsistency. WWW '23: ACM Web Conference 2023, Austin, TX, United States, 30 April - 4 May 2023. New York, NY, United States: ACM. doi: 10.1145/3543507.3583515
Research Data Collections
local : UQ:289097
Identifiers
- Local : RDM ID: 19f7bf50-a75c-11ed-94e0-a959ac1c5ac5
- DOI : 10.48610/0AB54E7