Decoding the Sequence Requirements for Translation Initiation
Verhagen, B. M.; Liedtke, D.; Barbadilla-Martinez, L.; Alverado, C.; Petrychenko, V.; Swirski, M.; Muller, M.; Valen, E.; Puglisi, J.; de Ridder, J.; Fischer, N.; Tanenbaum, M. E.
Abstract
Accurate selection of start codons by ribosomes is a fundamental determinant of proteome composition. Although the 'Kozak sequence' (an 8-nucleotide sequence flanking the start codon) has long been viewed as the primary determinant of initiation in eukaryotes, it fails to explain the large diversity of start codon usage across transcripts. Here we combine massively parallel reporter assays, bioinformatics, machine learning, single-molecule imaging and cryo-electron microscopy to define the 'extended translation initiation sequence (eTIS)', an ~80-nucleotide sequence surrounding the start codon that governs initiation efficiency. A deep-learning model trained on eTIS features accurately predicts translation initiation across transcripts. Unexpectedly, we find that the Kozak sequence is not optimal for initiation as is widely presumed, and we identify the origin of this discrepancy. eTIS nucleotides that promote efficient initiation are enriched in the human transcriptome and are evolutionarily conserved, underscoring their functional importance. Biophysical and structural analyses reveal that specific eTIS residues, including the key +6 position and residues in the mRNA entry and exit channel, engage ribosomal proteins, rRNA and initiation factors to promote start codon recognition by stabilizing the ribosome at the start codon and facilitating the structural transitions required for initiation. Finally, optimization of the eTIS markedly enhances translational fidelity and protein output from therapeutic mRNAs, highlighting its practical utility. Together, these findings redefine the sequence logic of translation initiation and establish a framework for precise control of protein expression.