This study presents a transformer model with stacked multi-head attention layers designed to remove noise from electroencephalogram (EEG) signals, specifically addressing the signal distortion caused by artifacts such as ocular and muscular noise. Removing these artifacts is a crucial step in improving the usefulness of EEG for disease diagnosis and brain-computer interface (BCI) applications. Deep learning (DL) models have been increasingly employed for denoising EEG data in recent years, demonstrating performance comparable to classical approaches. However, current models have not captured the long-term temporal dependencies needed to eliminate ocular and muscular artifacts efficiently. In this study, we address this limitation of existing DL models by introducing multiple multi-head attention layers into the transformer model, which surpasses the performance reported in previous works on the EEGdenoiseNet dataset.