The speech enhancement performance of neural networks depends on the richness and completeness of their training data, and such models degrade when confronted with bursts of unseen noise. To address this problem, we propose an attention-based noise-aware framework that can be combined with any single-channel speech enhancement model to strengthen its real-time noise feature extraction. Multi-head attention generates predicted noise information by combining multidimensional noise bases derived from environment-related noise clusters stored in a memory library. The predicted noise features are adaptively embedded at the input of conventional models, improving their robustness to and awareness of unseen background noise. A Mel-scale weighted loss function is proposed to align network training with human auditory perception. Experiments show that our framework outperforms the LSTM and CRN baselines in both seen and unseen noisy scenes across various SNRs.
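The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the attention step as described: frame features query a learned memory of noise bases, and the attended output is embedded at the model input. The class name, dimensions, and the concatenation strategy are all illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class NoiseAwareEmbedding(nn.Module):
    """Sketch: predict a per-frame noise embedding by attending over a
    learned memory library of noise bases, then embed it at the input of
    a downstream enhancement model. Names and sizes are assumptions."""

    def __init__(self, feat_dim: int, num_bases: int = 32, num_heads: int = 4):
        super().__init__()
        # Memory library: one learnable basis vector per noise cluster.
        self.noise_bases = nn.Parameter(torch.randn(num_bases, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, feat_dim) noisy input features.
        bases = self.noise_bases.unsqueeze(0).expand(feats.size(0), -1, -1)
        # Frames act as queries; noise bases act as keys and values.
        noise_pred, _ = self.attn(feats, bases, bases)
        # Adaptively embed the predicted noise alongside the input features.
        return torch.cat([feats, noise_pred], dim=-1)
```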
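The Mel-scale weighted loss is likewise unspecified here; one plausible reading weights spectrogram errors by the local slope of the mel scale, so mistakes in perceptually finer low-frequency regions cost more. The sketch below is an assumption in that spirit, not the paper's exact loss.

```python
import torch

def mel_weighted_mse(est_mag: torch.Tensor,
                     ref_mag: torch.Tensor,
                     sample_rate: int = 16000) -> torch.Tensor:
    """Sketch: MSE over magnitude spectrograms (batch, frames, freq_bins),
    with each bin weighted by mel-scale resolution at its frequency."""
    n_bins = est_mag.size(-1)
    freqs = torch.linspace(0, sample_rate / 2, n_bins, device=est_mag.device)
    # d(mel)/df for mel(f) = 2595 * log10(1 + f/700) is proportional to 1/(700 + f).
    weights = 1.0 / (700.0 + freqs)
    weights = weights / weights.mean()  # keep the overall loss scale unchanged
    return torch.mean(weights * (est_mag - ref_mag) ** 2)
```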