A Hybrid Feature Subset Selection Algorithm for Analysis of High Correlation Proteomic Data

Hussain Montazery Kordy



Pathological changes within an organ can be reflected as proteomic patterns in biological fluids such as plasma, serum, and urine. Surface-enhanced laser desorption and ionization time-of-flight mass spectrometry (SELDI-TOF MS) has been used to generate proteomic profiles from biological fluids. Mass spectrometry yields redundant, noisy data in which most data points are irrelevant features for differentiating between cancer and normal cases. In this paper, we propose a hybrid feature subset selection algorithm based on maximum discrimination and minimum correlation (MDMC) coupled with a peak scoring criterion. The algorithm was applied to two independent SELDI-TOF MS datasets of ovarian cancer obtained from the NCI-FDA clinical proteomics databank, and was used to extract a set of proteins as potential biomarkers in each dataset. We applied linear discriminant analysis (LDA) to identify the important biomarkers. The selected biomarkers distinguished ovarian cancer patients from the non-cancer control group with an accuracy of 100%, a sensitivity of 100%, and a specificity of 100% in both datasets. The hybrid algorithm has the advantage that it increases the reproducibility of the selected biomarkers and is able to find a small set of proteins with high discrimination power.
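The abstract does not give the MDMC weight function or the peak scoring criterion, so the following is only a minimal sketch of the general idea of trading discrimination against correlation. It is not the paper's algorithm. The Fisher-style score, the product form `score * (1 - redundancy)`, and the function names `fisher_score` and `select_low_correlation_features` are all assumptions introduced here for illustration.

```python
import numpy as np

def fisher_score(X, y):
    """Two-class discrimination per feature (assumed criterion, not the paper's):
    squared gap between class means divided by the sum of class variances."""
    a, b = X[y == 0], X[y == 1]
    gap = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    spread = a.var(axis=0) + b.var(axis=0) + 1e-12  # guard against zero variance
    return gap / spread

def select_low_correlation_features(X, y, k):
    """Greedy selection: start from the most discriminative feature, then keep
    adding the feature whose score, down-weighted by its largest absolute
    Pearson correlation with the features already chosen, is highest."""
    score = fisher_score(X, y)
    # Absolute correlation between every pair of columns (features).
    # Note: constant columns would produce NaN here and should be removed first.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(score))]
    while len(selected) < k:
        redundancy = corr[:, selected].max(axis=1)
        weighted = score * (1.0 - redundancy)  # hypothetical weight function
        weighted[selected] = -np.inf           # never pick a feature twice
        selected.append(int(np.argmax(weighted)))
    return selected
```

With `X` as a samples-by-features intensity matrix and `y` as 0/1 class labels, `select_low_correlation_features(X, y, k)` returns `k` column indices. Neighbouring m/z points on the same peak are typically highly correlated, so a penalty of this kind tends to keep one representative per peak rather than several near-duplicates.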


Proteomics; Feature subset selection; Correlation-based weight function; Peak scoring; Biomarker; Classification
