Lung cancer is one of the leading causes of cancer-related deaths. Early diagnosis of lung cancer using automatic feature selection from a large number of features is a challenging task. Conventionally, cancer diagnosis approaches rely on physical features that appear in later stages, after harmful effects have already occurred due to abnormal somatic mutations. To extract useful novel patterns that efficiently predict
cancer at early stages, we analyzed mutated genes related to lung cancer that reveal useful information in protein amino acid sequences. For this purpose, we developed a new evolutionary learning technique based on a biologically inspired multi-gene genetic programming algorithm that uses discriminant information from protein amino acids. The proposed model efficiently selects 23 discriminant features out of 1500. It then optimally combines the selected features with related primitive functions for the prediction of
lung cancer. Hence, an efficient predictive model is constructed that helps in understanding the complex, heterogeneous nature of lung cancer. The proposed system achieved an area under the ROC curve of 98.79% and an accuracy of 95.67%, outperforming related lung cancer prediction approaches.
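
To make the quantities above concrete, the following is a minimal Python sketch of how a sequence-derived feature and the two reported metrics could be computed. It is an illustration under stated assumptions, not the authors' implementation: amino acid composition is only one common sequence descriptor and is not claimed to be among the 1500 features used here, the function names (amino_acid_composition, roc_auc, accuracy) are hypothetical, the 0.5 decision threshold is assumed, and the labels and scores are made-up toy values rather than results.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Assumption: composition is used purely as an example descriptor.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    positive example scores higher than a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at an assumed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    # Toy values only; these are not data from the study.
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.5, 0.1]
    print(amino_acid_composition("ACCA")[:2])  # fractions of A and C
    print(roc_auc(toy_labels, toy_scores))
    print(accuracy(toy_labels, toy_scores))
```

The rank-based AUC is threshold-free, whereas accuracy depends on the chosen cutoff, which is one reason the two metrics are reported separately.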