Data
Title: Bacillus Unannotated Protvec Model
DOI: https://doi.org/10.25451/flinders.19770742.v3
Publisher: Flinders University
Creators: James G Mitchell, Jody C. McKerral, Robert Edwards, Susie Grigson
Date: 2022
Type: dataset
Language: English
Subjects: Protvec; Sequence embedding; Bioinformatics; Bioinformatics and computational biology not elsewhere classified

Licence & Rights:

Non-commercial reuse only (CC BY-NC)

Full description

ProtVec model trained on 425,000 sequences from the Genome Taxonomy Database (GTDB). Sequences were dereplicated at 70% sequence identity using CD-HIT and filtered to remove sequences containing 'X', sequences shorter than 30 amino acids, and sequences longer than 1024 amino acids.
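The filtering rules described above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the authors' pipeline: the function names and the choice to uppercase the sequence before checking for 'X' are assumptions, and the CD-HIT dereplication step is not reproduced here.

```python
def passes_filters(seq, min_len=30, max_len=1024):
    """Return True if a protein sequence would survive the described filters.

    Assumption: sequences are compared case-insensitively, so a lowercase
    'x' is treated the same as 'X'.
    """
    seq = seq.strip().upper()
    if "X" in seq:
        return False  # ambiguous residue present
    # Sequences shorter than 30 or longer than 1024 residues are dropped,
    # so both boundary lengths are kept.
    return min_len <= len(seq) <= max_len


def filter_sequences(records):
    """Filter a {name: sequence} mapping (hypothetical input format)."""
    return {name: seq for name, seq in records.items() if passes_filters(seq)}
```

For example, `filter_sequences({"a": "M" * 30, "b": "M" * 29})` keeps only record `"a"`.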


Training used a vector size of 100 and a context size of 25, producing a dictionary object that contains a 100-dimensional vector for each 3-mer present in the training data.
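To show how a dictionary of 3-mer vectors might be used, the sketch below splits a sequence into 3-mers and sums their vectors into one sequence-level embedding. Several things here are assumptions rather than details from this record: the use of overlapping 3-mers with a stride of 1, summation as the pooling method, skipping 3-mers that are missing from the dictionary, and the function names themselves.

```python
def kmers(seq, k=3):
    """Return all overlapping k-mers of a sequence (stride of 1, an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed_sequence(seq, model, dim=100):
    """Sum the vectors of every 3-mer found in `model`.

    `model` is assumed to map 3-mer strings to sequences of `dim` numbers.
    3-mers absent from the dictionary are skipped (an assumption).
    """
    total = [0.0] * dim
    for kmer in kmers(seq):
        vec = model.get(kmer)
        if vec is None:
            continue
        total = [t + float(v) for t, v in zip(total, vec)]
    return total
```

With a toy two-dimensional dictionary, `embed_sequence("MKTA", {"MKT": [1.0, 2.0], "KTA": [0.5, 0.5]}, dim=2)` adds the two vectors element-wise.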


The model is stored as a .pkl file, which can be loaded with the Python pickle module.
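Loading the file might look like the sketch below. The file name in the usage comment is hypothetical, since the record does not give one, and the loaded object is assumed to be the 3-mer dictionary described above. Unpickling can execute arbitrary code, so only load files from a source you trust; if the vectors were saved as NumPy arrays, NumPy must also be installed for the load to succeed.

```python
import pickle


def load_model(path):
    """Load a pickled object from `path` and return it."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Usage (hypothetical file name):
# model = load_model("bacillus_protvec.pkl")
# vector = model["MKT"]  # assumed: a 100-dimensional vector for that 3-mer
```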

Issued: 2022-05-16

Created: 2022-05-21

This dataset is part of a larger collection

Subjects


Identifiers