Regulatory processes at the RNA transcript level play a crucial role in generating transcriptome diversity and shaping proteome composition in human cells, in both physiological and pathological states. This study introduces FLIBase (www.FLIBase.org), a specialized database for annotating full-length isoforms using long-read sequencing. We collected and integrated long-read (351 samples) and short-read (12 469 samples)
RNA sequencing data from diverse normal and cancerous human tissues and cells. The current version of FLIBase comprises a total of 983 789 full-length spliced isoforms, identified through long-read sequences and verified using short-read exon-exon splice junctions. Of these, 188 248 isoforms have been annotated, while 795 541 isoforms remain unannotated. By overcoming the limitations of short-read
RNA sequencing methods, FLIBase provides an accurate and comprehensive representation of full-length transcripts. These annotations support a range of downstream analyses. Notably, FLIBase identifies a substantial number of previously unannotated isoforms and tumor-specific RNA transcripts. These tumor-specific RNA transcripts are a potential source of immunogenic recurrent neoantigens, which could advance the development of tailored RNA-based diagnostic and therapeutic strategies for various types of human cancer.