Interaction between background vehicles (BVs) and automated vehicles (AVs) in scenario-based testing plays a critical role in evaluating the intelligence of AVs. Current testing scenarios typically employ predefined or scripted BVs, which inadequately reflect the complexity of human-like social behaviors in real-world driving and lack a systematic metric for evaluating the comprehensive intelligence of AVs. This paper therefore proposes an evolving scenario generation method that employs deep reinforcement learning (DRL) to construct human-like BVs that interact with AVs; the resulting evolving scenario is designed to test and evaluate AV intelligence. First, a class of BV driver models with human-like competitive, mutual, and cooperative driving motivations is designed. Then, using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and an improved level-k training procedure, the three distinct driver models acquire game-based interactive driving policies. These driver models are combined to generate evolving scenarios in which they interact continuously and evolve diverse scenario content. Next, an evaluation framework covering safety, driving efficiency, and interaction utility is presented to quantify the intelligence of three systems under test (SUTs), demonstrating the effectiveness of the evolving scenario for intelligence testing. Finally, the complexity and fidelity of the proposed evolving testing scenario are validated. The results show that the proposed evolving scenario exhibits the highest complexity among the baseline scenarios considered and achieves more than 85% similarity to naturalistic driving data. These findings highlight the potential of the proposed method to facilitate the development and evaluation of high-level AVs in a realistic and challenging environment.