Science Cast

Contrastive pre-training for sequence based genomics models

librarian, June 12, 2024 5:41pm

This paper is a preprint and has not been certified by peer review.

bioRxiv, June 12, 2024 12:00am

Authors

Sokolova, K.; Chen, K. M.; Troyanskaya, O.

Abstract

Motivation: In recent years, deep learning has become one of the central approaches in a number of applications, including many tasks in genomics. However, as models grow in depth and complexity, they require either more data or a strategic initialization technique to improve performance.

Results: In this project, we introduce cGen, a novel unsupervised, model-agnostic contrastive pre-training method for sequence-based models. cGen can be used before training to initialize weights, reducing the size of the dataset needed. It works by learning the intrinsic features of the reference genome and makes no assumptions about the underlying structure. We show that the embeddings produced by the unsupervised model are already informative for gene expression prediction and that the sequence features provide a meaningful clustering. We demonstrate that cGen improves model performance in various sequence-based deep learning applications, such as chromatin profiling prediction and gene expression prediction. Our findings suggest that using cGen, particularly in areas constrained by data availability, could improve the performance of deep learning genomic models without the need to modify the model architecture.
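For readers unfamiliar with contrastive objectives, the following is a minimal sketch of a generic InfoNCE-style loss applied to one-hot-encoded DNA sequences. It is not cGen's implementation: the abstract does not specify the loss, the encoder, or how positive pairs are built. The linear `encode` function, the use of the reverse complement as the positive view, the `temperature` value, and all function names here are illustrative assumptions.

```python
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot array over A, C, G, T."""
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        out[i, BASES[base]] = 1.0
    return out


def reverse_complement(seq):
    """Return the reverse complement (assumed here as the positive view)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def encode(seq, weights):
    """Hypothetical stand-in encoder: linear map of the flattened one-hot
    input, L2-normalised. A real model would be a deep network."""
    z = one_hot(seq).reshape(-1) @ weights
    return z / np.linalg.norm(z)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `anchors` should be most similar to row i of
    `positives`; the other rows in the batch act as negatives."""
    logits = anchors @ positives.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


# Toy batch: three made-up 6-mers and a random, untrained encoder.
rng = np.random.default_rng(0)
seqs = ["ACGTAC", "TTGACA", "GGCATC"]
weights = rng.normal(size=(6 * 4, 8))
anchors = np.stack([encode(s, weights) for s in seqs])
positives = np.stack([encode(reverse_complement(s), weights) for s in seqs])
loss = info_nce(anchors, positives)
```

Minimising a loss of this kind over many genomic windows would give an encoder whose weights can then initialise a supervised model, which is the role the abstract describes for pre-training. The sequences, dimensions, and augmentation above are placeholders only.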
