SynthECG: Python Framework and ECG Image Datasets for Digitization, Lead Detection, and Waveform Segmentation

Masoud Rahimi, Reza Karbasi, Abdol Hossein Vahabie

DOI:

Abstract


Background: Digitizing electrocardiogram (ECG) images into structured time-series data is critical for clinical analysis, but it remains challenging because standardized datasets are lacking, especially for realistic scenarios such as overlapping waveforms. Methods: We introduce SynthECG, an open-source Python framework that generates four synthetic ECG datasets tailored to deep learning tasks: ECG digitization, YOLO-based detection of leads and lead names, and U-Net-based waveform segmentation. The framework supports customizable parameters (e.g., dataset size, lead layout, and visual style) and can generate up to 21,799 images for multi-lead datasets and 261,588 for single-lead segmentation. Notably, it introduces a novel mechanism that simulates overlapping waveforms from adjacent leads while preserving clean segmentation masks. Results: Using our framework, we generated four open-access datasets: (1) 2,000 ECG images in various lead configurations paired with time-series signals for ECG digitization, (2) 2,000 ECG images in various lead configurations with YOLO-format annotations for detecting lead regions and lead names, (3) 20,000 cropped single-lead images with pixel-level segmentation masks (normal variant), and (4) 102 cropped single-lead images with overlapping waveforms from adjacent leads (overlapping variant). We validated these datasets through two case studies: digitization using a non-ML algorithm (mean squared error: 0.002, -: 0.93, signal-to-noise ratio [SNR]: 7.36 dB, SNRmed: 37.86 dB) and lead/name detection using YOLOv8. Conclusions: Our open-source framework enables the generation of large-scale, customizable ECG image datasets to support key deep learning-based tasks, including digitization under normal and overlapping conditions, as well as lead region and lead name detection. The full datasets and code are publicly available at https://doi.org/10.5281/zenodo.15484519 and https://github.com/rezakarbasi/ecg-image-and-signal-dataset.
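As an illustration of the digitization metrics reported above, the following minimal sketch computes mean squared error and SNR (in dB) between a reference signal and a digitized estimate. It assumes the common definition SNR = 10 log10(signal power / error power) and assumes that SNRmed denotes the median SNR across recordings; the abstract does not state either definition, and the function names are hypothetical rather than part of the SynthECG code.

```python
import math
from statistics import median


def mse(reference, estimate):
    """Mean squared error between two equal-length sample sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def snr_db(reference, estimate):
    """SNR in dB, assuming SNR = 10*log10(sum(ref^2) / sum((ref - est)^2))."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    if signal_power == 0:
        return -math.inf  # all-zero reference with a non-zero error
    return 10.0 * math.log10(signal_power / noise_power)


def median_snr_db(pairs):
    """Median SNR over (reference, estimate) pairs (assumed meaning of SNRmed)."""
    return median(snr_db(ref, est) for ref, est in pairs)


# Toy data for illustration only; these are not values from the datasets.
ref_a, est_a = [0.0, 1.0, 0.0, -1.0], [0.0, 0.9, 0.1, -1.0]
ref_b, est_b = [0.0, 1.0, 0.0, -1.0], [0.0, 0.5, 0.5, -1.0]

print("MSE (a):", mse(ref_a, est_a))
print("SNR dB (a):", snr_db(ref_a, est_a))
print("median SNR dB:", median_snr_db([(ref_a, est_a), (ref_b, est_b)]))
```

Under these assumptions, a median SNR well above the mean SNR, as in the reported figures, would indicate that a minority of poorly digitized recordings pulls the mean down.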

Keywords


Deep learning, ECG digitization, electrocardiogram (ECG), lead detection, synthetic data, waveform segmentation


ISSN : 2228-7477