Data

Learning sparse log-ratios for high-throughput sequencing data

La Trobe University
Elliott Rodriguez (Aggregated by) Fazel Almasi (Aggregated by)
Viewed: [[ro.stat.viewed]] Cited: [[ro.stat.cited]] Accessed: [[ro.stat.accessed]]
ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&rfr_id=info%3Asid%2FANDS&rft_id=info:doi10.26181/17197265.v1&rft.title=Learning sparse log-ratios for high-throughput sequencing data&rft.identifier=https://doi.org/10.26181/17197265.v1&rft.publisher=La Trobe University&rft.description=In the context of high-throughput sequencing (HTS) data, and compositional data (CoDa) more generally, an important class of biomarkers are the log-ratios between the input variables. However, identifying predictive log-ratio biomarkers from HTS data is a combinatorial optimization problem, which is computationally challenging. Existing methods are slow to run and scale poorly with the dimension of the input, which has limited their application to low- and moderate-dimensional metagenomic datasets. Building on recent advances from the field of deep learning, we develop CoDaCoRe, a novel learning algorithm that identifies sparse, interpretable, and predictive log-ratio biomarkers. Our algorithm exploits a continuous relaxation to approximate the underlying combinatorial optimization problem. This relaxation can then be optimized efficiently using the modern ML toolbox, in particular, gradient descent. As a result, CoDaCoRe runs several orders of magnitude faster than competing methods, all while achieving state-of-the-art performance in terms of predictive accuracy and sparsity. survey: https://www.surveymonkey.com/r/5H6CYPW&rft.creator=Elliott Rodriguez&rft.creator=Fazel Almasi&rft.date=2024&rft_rights=CC-BY-4.0&rft_subject=Statistics&rft_subject=Predictive log-ratio biomarkers&rft_subject=Combinatorial optimization&rft.type=dataset&rft.language=English Access the data

Licence & Rights:

Open Licence view details
CC-BY

CC-BY-4.0

Full description

In the context of high-throughput sequencing (HTS) data, and compositional data (CoDa) more generally, an important class of biomarkers are the log-ratios between the input variables. However, identifying predictive log-ratio biomarkers from HTS data is a combinatorial optimization problem, which is computationally challenging. Existing methods are slow to run and scale poorly with the dimension of the input, which has limited their application to low- and moderate-dimensional metagenomic datasets. Building on recent advances from the field of deep learning, we develop CoDaCoRe, a novel learning algorithm that identifies sparse, interpretable, and predictive log-ratio biomarkers. Our algorithm exploits a continuous relaxation to approximate the underlying combinatorial optimization problem. This relaxation can then be optimized efficiently using the modern ML toolbox, in particular, gradient descent. As a result, CoDaCoRe runs several orders of magnitude faster than competing methods, all while achieving state-of-the-art performance in terms of predictive accuracy and sparsity.


survey: https://www.surveymonkey.com/r/5H6CYPW


Issued: 2021-12-16

Created: 2021-12-15

This dataset is part of a larger collection

Click to explore relationships graph
Subjects

User Contributed Tags    

Login to tag this record with meaningful keywords to make it easier to discover

Identifiers