Lung cancer has the highest cancer-associated mortality rate worldwide, making biomarker discovery for this cancer a pressing issue. Machine learning approaches to identifying molecular biomarkers are less prevalent than screening potential biomarkers by differential expression analysis; however, several differentially expressed microRNAs (miRNAs) involved in cancer have been identified using this approach. The Cancer Genome Atlas (TCGA) makes it possible to apply machine learning methods to the molecular profiling of tumors. The present study used empirical negative control miRNAs in lung cancer to normalize the lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) datasets from TCGA, and then built decision tree models to classify lung cancer status and subtype. The two primary classification models used four miRNAs in total for lung cancer diagnosis and subtyping: hsa-miR-183 and hsa-miR-135b distinguished lung tumors from normal samples taken from tissue adjacent to the tumor site, and hsa-miR-944 and hsa-miR-205 further classified the tumors into the major LUAD and LUSC subtypes. Cancer status classification models specific to each subtype were also presented.
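The two-stage scheme (normalize to control miRNAs, classify tumor vs. normal, then classify tumors as LUAD vs. LUSC) can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's pipeline. Normalization is simplified to subtracting each sample's mean log2 expression of the control miRNAs, because the exact procedure is not described here. The trees are grown with a greedy Gini-based split in the style of CART. The control names `ctrl-miR-A` and `ctrl-miR-B`, all helper functions, and every expression value are hypothetical toy inputs, so the learned thresholds say nothing about real data.

```python
from statistics import mean

CONTROLS = ["ctrl-miR-A", "ctrl-miR-B"]  # hypothetical negative-control miRNAs
STATUS_FEATURES = ["hsa-miR-183", "hsa-miR-135b"]
SUBTYPE_FEATURES = ["hsa-miR-944", "hsa-miR-205"]

def normalize_to_controls(sample, control_ids):
    """Assumed scheme: subtract the sample's mean log2 control expression."""
    offset = mean(sample[c] for c in control_ids)
    return {m: v - offset for m, v in sample.items() if m not in control_ids}

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_split(rows, labels, features):
    """Return (feature, threshold, weighted Gini) of the best single split."""
    best = (None, None, float("inf"))
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):  # midpoints between distinct values
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best[2]:
                best = (f, t, score)
    return best

def majority(labels):
    return max(sorted(set(labels)), key=labels.count)

def build_tree(rows, labels, features, max_depth):
    """Grow a tree recursively; leaves hold the majority class label."""
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    f, t, _ = best_split(rows, labels, features)
    if f is None:
        return majority(labels)
    go_left = [r[f] <= t for r in rows]
    def side(flag):
        return build_tree([r for r, g in zip(rows, go_left) if g == flag],
                          [y for y, g in zip(labels, go_left) if g == flag],
                          features, max_depth - 1)
    return {"feature": f, "threshold": t, "left": side(True), "right": side(False)}

def predict(tree, row):
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

def sample(c1, c2, m183, m135b, m944, m205):
    """Build a toy raw log2 expression profile (made-up values)."""
    return {"ctrl-miR-A": c1, "ctrl-miR-B": c2, "hsa-miR-183": m183,
            "hsa-miR-135b": m135b, "hsa-miR-944": m944, "hsa-miR-205": m205}

# Toy training set: (status, subtype, raw profile); offsets mimic depth differences.
TRAIN = [
    ("normal", None, sample(10, 12, 8, 7, 5, 9)),
    ("normal", None, sample(12, 14, 10.5, 9, 7, 11)),
    ("tumor", "LUAD", sample(10, 12, 12, 11, 6, 10)),
    ("tumor", "LUAD", sample(11, 13, 13.5, 12, 7, 11.5)),
    ("tumor", "LUSC", sample(10, 12, 12, 11.5, 12, 15)),
    ("tumor", "LUSC", sample(12, 14, 14.5, 13, 14.5, 17)),
]

normalized = [normalize_to_controls(raw, CONTROLS) for _, _, raw in TRAIN]
status_tree = build_tree(normalized, [s for s, _, _ in TRAIN], STATUS_FEATURES, max_depth=1)
tumor_rows = [row for row, (s, _, _) in zip(normalized, TRAIN) if s == "tumor"]
tumor_labels = [sub for s, sub, _ in TRAIN if s == "tumor"]
subtype_tree = build_tree(tumor_rows, tumor_labels, SUBTYPE_FEATURES, max_depth=1)

def classify(raw):
    """Stage 1: tumor vs. normal; stage 2 (tumors only): LUAD vs. LUSC."""
    row = normalize_to_controls(raw, CONTROLS)
    if predict(status_tree, row) == "normal":
        return "normal"
    return predict(subtype_tree, row)

print(classify(sample(10, 12, 12.2, 11, 11.8, 14.9)))
```

Chaining a status tree and a subtype tree mirrors the hierarchy in the abstract: a sample reaches the subtype model only after the first model calls it a tumor. On this toy data the script prints `LUSC`.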