Ribosome profiling and mass spectrometry have identified canonical and noncanonical translation initiation codons (TICs) that are upstream of the main translation initiation site and used to translate oncogenic proteins. There have previously been conflicting reports about the patterns of nucleotides that surround noncanonical TICs. Here, we use a Kozak Similarity Score algorithm to find that nearly all of these TICs have flanking nucleotides closely matching the Kozak sequence. Remarkably, the
nucleotides flanking alternative noncanonical TICs are frequently closer to the Kozak sequence than the nucleotides flanking TICs used to translate the gene's main protein. Of note, the
5' untranslated region (5'UTR) of cancer-associated genes with an upstream TIC tends to be significantly longer than the same region in genes not associated with cancer. The presence of a longer-than-typical 5'UTR increases the likelihood of ribosome binding to upstream noncanonical TICs and may be a distinguishing feature of a number of genes overexpressed in cancer. Noncanonical TICs that are located in the 5'UTR, although thought by some to be disadvantageous and suppressed by evolution, may be used to translate oncogenic proteins because of their flanking nucleotides.
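
To make the idea of scoring flanking nucleotides against the Kozak sequence concrete, the following is a minimal illustrative sketch, not the Kozak Similarity Score algorithm used in this work, whose details are not given here. It assumes the commonly cited consensus GCCRCC(codon)G, scores only the flanking positions so that noncanonical codons such as CUG can be evaluated, and uses hypothetical weights that emphasize the -3 and +4 positions. The function name `kozak_similarity`, the weight values, and the choice to skip flanks that fall outside the transcript are all assumptions made for illustration.

```python
# Hypothetical sketch: weighted match of TIC flanking nucleotides to the
# Kozak consensus. Weights and names are illustrative assumptions only.

# Consensus symbols used below; R is the IUPAC code for a purine (A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG"}

# (position relative to the codon, consensus symbol, assumed weight).
# Position +1 is the first codon nucleotide; there is no position 0.
KOZAK_FLANKS = [
    (-6, "G", 1.0),
    (-5, "C", 1.0),
    (-4, "C", 1.0),
    (-3, "R", 3.0),  # assumed higher weight for the -3 purine
    (-2, "C", 1.0),
    (-1, "C", 1.0),
    (+4, "G", 3.0),  # assumed higher weight for the +4 guanine
]


def kozak_similarity(mrna: str, codon_start: int) -> float:
    """Return the weighted fraction (0 to 1) of flanking positions that match.

    codon_start is the 0-based index of the first nucleotide of the
    initiation codon. The codon itself is not scored.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA alphabets
    matched = 0.0
    total = 0.0
    for position, symbol, weight in KOZAK_FLANKS:
        # Convert the biological position to a 0-based string index.
        if position < 0:
            index = codon_start + position
        else:
            index = codon_start + position - 1
        if not 0 <= index < len(seq):
            continue  # flank lies outside the transcript; skipped by assumption
        total += weight
        if seq[index] in IUPAC[symbol]:
            matched += weight
    return matched / total if total else 0.0
```

For example, `kozak_similarity("GCCACCCUGG", 6)` scores the flanks of a noncanonical CUG codon; under these assumed weights, a mismatch at -3 or +4 lowers the score three times as much as a mismatch at any other flanking position.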