Using classification and K-means methods to predict breast cancer recurrence in gene expression data

Mohammadreza Sehhati, Mohammad Amin Tabatabaiefar, Ali Haji Gholami, Mohammad Sattari

DOI: 10.4103/jmss.jmss_117_21

Abstract


Background: Breast cancer is a type of cancer that starts in the breast tissue and affects about 10% of women at different stages of their lives. In this study, we applied a new method to predict recurrence in biological networks built from gene expression data. Method: The method consists of four steps: data collection, clustering, determination of differentiating genes, and classification. Eight techniques were implemented on 12,172 genes and 200 samples: random forest, support vector machine, neural network, hidden Markov model, joint mutual information, random forest + k-means, neural network + k-means, and support vector machine + k-means. Results: Thirty genes were identified as differentiating genes and used for classification. Random forest + k-means performed better than the other techniques. Neural network + k-means and random forest + k-means performed better than the other techniques in identifying high-risk cases. Conclusion: Thirty of the 12,172 genes were used for classification, and the use of clustering improved the performance of the classification techniques.
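The Method steps (clustering, determining differentiating genes, classification) can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify how the clustering feeds the classifier. The sketch assumes that k-means groups gene profiles, that the gene with the largest scaled difference between class means is kept from each cluster, and that a nearest-centroid rule stands in for the random forest, support vector machine, and neural network classifiers named above. The data are synthetic, and every function name and parameter is hypothetical.

```python
import random
import statistics


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def gene_score(values, y):
    """Assumed scoring rule: absolute difference of class means over the overall spread."""
    pos = [v for v, t in zip(values, y) if t == 1]
    neg = [v for v, t in zip(values, y) if t == 0]
    spread = statistics.pstdev(values) or 1.0
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread


def select_genes(X, y, k):
    """Cluster gene profiles, then keep the top-scoring gene of each cluster."""
    genes = [list(col) for col in zip(*X)]  # one expression profile per gene
    clusters = kmeans(genes, k)
    best = {}
    for g, c in enumerate(clusters):
        s = gene_score(genes[g], y)
        if c not in best or s > best[c][0]:
            best[c] = (s, g)
    return sorted(g for _, g in best.values())


def nearest_centroid_predict(X_train, y_train, X_test, genes):
    """Stand-in classifier: assign each sample to the closer class centroid."""
    cents = {}
    for cls in (0, 1):
        rows = [[x[g] for g in genes] for x, t in zip(X_train, y_train) if t == cls]
        cents[cls] = [statistics.fmean(col) for col in zip(*rows)]
    preds = []
    for x in X_test:
        v = [x[g] for g in genes]
        preds.append(
            min(cents, key=lambda cls: sum((a - b) ** 2 for a, b in zip(v, cents[cls])))
        )
    return preds


# Synthetic stand-in data: 40 samples, 60 genes, the first 5 genes carry signal.
rng = random.Random(42)
n_samples, n_genes, n_informative = 40, 60, 5
y = [i % 2 for i in range(n_samples)]  # 1 = recurrence, 0 = no recurrence
X = [
    [rng.gauss(2.0 * t if g < n_informative else 0.0, 1.0) for g in range(n_genes)]
    for t in y
]

# Genes are selected on the training split only, to avoid leaking test labels.
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]

selected = select_genes(X_train, y_train, k=6)
preds = nearest_centroid_predict(X_train, y_train, X_test, selected)
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print("selected genes:", selected)
print("held-out accuracy:", accuracy)
```

Selecting one gene per cluster is only one way to combine clustering with classification; it limits redundancy among the retained genes, which is one plausible reason clustering could help when reducing 12,172 genes to 30.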

Keywords


Classification, gene, K-means





ISSN: 2228-7477