The rise of voice conversion (VC), a form of audio deepfake, poses serious risks. While many detection methods have been developed, current approaches focus on identifying artifacts in deepfake samples. As deepfake technology advances, the question arises: can these methods detect future deepfakes that may contain fewer artifacts? Moreover, can detection models learn features that are not tied to deepfake imperfections? To address these concerns, we introduce the Balanced Environment Audio-Deepfake Reevaluation (BEAR) protocol, which creates a balanced setting in which genuine and deepfake samples contain similar artifacts or noise. Using BEAR as the evaluation setting, we observe a significant performance drop across all detectors tested, indicating that current detection models rely heavily on artifacts and struggle to identify deepfakes in the "balanced" environment. Furthermore, when we directly incorporate BEAR as the training environment, detection methods still fail to generalize across varying noise levels. These results highlight the models' inability to learn more robust features and suggest that current detectors may struggle to adapt as deepfake technology evolves, underscoring the need for more robust detection methods.
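To make the balancing idea concrete, the following is a minimal sketch of how one might construct a BEAR-style evaluation set by injecting noise at a matched signal-to-noise ratio into both genuine and deepfake waveforms. The function names, the choice of additive noise, and the fixed-SNR mixing are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of a BEAR-style "balanced environment": the same noise
# condition is applied to genuine and deepfake audio alike, so a detector
# cannot separate the classes by noise or artifact level alone.
# NOTE: additive noise at a fixed SNR and all names here are illustrative
# assumptions; the actual BEAR protocol may balance artifacts differently.
import numpy as np

def add_noise_at_snr(wave: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `wave` so the result has the requested SNR (in dB)."""
    noise = np.resize(noise, wave.shape)        # tile or trim noise to match length
    sig_power = np.mean(wave ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12   # guard against division by zero
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10.0)))
    return wave + scale * noise

def build_balanced_split(genuine, deepfake, noise, snr_db=10.0, seed=0):
    """Apply the *same* noise condition to both classes (labels: 0=real, 1=fake)."""
    rng = np.random.default_rng(seed)
    data, labels = [], []
    for wave, label in [(w, 0) for w in genuine] + [(w, 1) for w in deepfake]:
        segment = rng.permutation(noise)        # randomize the noise per sample
        data.append(add_noise_at_snr(wave, segment, snr_db))
        labels.append(label)
    return data, labels
```

Evaluating an existing detector on such a split corresponds to the abstract's first experiment, while training on one SNR condition and testing on another corresponds to the cross-noise generalization test described above.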