A Fast Algorithm for Exonic Regions Prediction in DNA Sequences

Hamidreza Saberkari, Mousa Shamsi, Hamed Heravi, Mohammad Hossein Sedaaghi


The main purpose of this paper is to introduce a fast method for gene prediction in DNA sequences based on the period-3 property of exons. First, the symbolic DNA sequence is converted to a digital signal using the electron-ion interaction potential (EIIP) method. Then, to reduce the effect of background noise in the period-3 spectrum, a three-level Discrete Wavelet Transform (DWT) is applied to the input digital signal. Finally, the Goertzel algorithm is used to extract the period-3 components of the filtered DNA sequence. The proposed algorithm increases processing speed and therefore reduces computational cost. Accurate detection of small exons in DNA sequences is another advantage of the algorithm. Its exon prediction ability is compared with several existing methods at the nucleotide level using: (i) sensitivity and specificity values; (ii) receiver operating characteristic (ROC) curves; and (iii) the area under the ROC curve. Simulation results show that the proposed algorithm detects exons more accurately than the other methods considered.
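The EIIP mapping and the Goertzel step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the window length of 351 samples, and the step size are assumptions made for illustration, and the three-level DWT denoising stage is omitted because the wavelet family is not specified here. The window length is chosen as a multiple of 3 so that DFT bin k = N/3 falls exactly on the period-3 frequency.

```python
import math

# Standard EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(seq):
    """Map a symbolic DNA string to a numeric EIIP signal."""
    return [EIIP[base] for base in seq.upper()]


def goertzel_power(x, k):
    """Return |X[k]|^2 of the len(x)-point DFT of x using the Goertzel recursion."""
    n = len(x)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = 0.0  # s[n-1]
    s2 = 0.0  # s[n-2]
    for sample in x:
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def period3_profile(seq, window=351, step=1):
    """Slide a window along seq and return the period-3 power of each window.

    window and step are illustrative assumptions; window must be a
    multiple of 3 so that bin window // 3 is the period-3 bin.
    """
    if window % 3 != 0:
        raise ValueError("window must be a multiple of 3")
    x = eiip_encode(seq)
    k = window // 3
    return [
        goertzel_power(x[i:i + window], k)
        for i in range(0, len(x) - window + 1, step)
    ]
```

Because only a single DFT bin is needed per window, the Goertzel recursion costs one multiplication and two additions per sample, rather than computing a full spectrum for each window. Peaks in the returned profile would then be thresholded to mark candidate exonic regions.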


DNA sequence; protein coding region; signal processing; exon; DWT; Goertzel algorithm.

