Based on multiple bioinformatics methods and machine learning techniques, this study was designed to explore potential hub genes of
gastric cancer with a diagnostic value. The novel
biomarkers were detected through multiple databases of
gastric cancer-related genes. The NCBI Gene Expression Omnibus (GEO) database was used to obtain gene expression files. Three hub genes (ESRRG, ATP4A, and ATP4B) were detected through a combination of weighted gene co-expression network analysis (WGCNA), gene-gene interaction network analysis, and supervised feature selection method. GEPIA2 was used to verify the differences in the expression levels of the hub genes in normal and
cancer tissues in the
RNA-seq levels of Genotype-Tissue Expression (GTEx) and The
Cancer Genome Atlas (TCGA) databases. The objectivity of potential hub genes was also verified by immunohistochemistry in the Human
Protein Atlas (HPA) database and
transcription factor-hub gene regulatory network. Machine learning (ML) methods including data pre-processing, model selection and cross-validation, and performance evaluation were examined on the hub-gene expression profiles in five Gene Expression Omnibus datasets and verified on a GEO external validation (EV) dataset. Six supervised learning models (support vector machine, random forest, k-nearest neighbors, neural network, decision tree, and eXtreme Gradient Boosting) and one semi-supervised learning model (label spreading) were established to evaluate the diagnostic value of
biomarkers. Among the six supervised models, the support vector machine (SVM) algorithm was the most effective one according to calculated performance metrics, including 0.93 and 0.99 area under the curve (AUC) scores on the test and external validation datasets, respectively. Furthermore, the semi-supervised model could also successfully learn and predict sample types, achieving a 0.986 AUC score on the EV dataset, even when 10% samples in the five GEO datasets were labeled. In conclusion, three hub genes (ATP4A, ATP4B, and ESRRG) closely related to
gastric cancer were mined, based on which the ML diagnostic model of
gastric cancer was conducted.