In real acoustic environments, background noise and room reverberation constantly degrade speech, seriously harming its intelligibility and quality. Previous studies have proposed two-stage deep neural networks operating in the frequency domain to remove both kinds of interference, but these approaches have limitations that cap their achievable performance. This paper proposes an end-to-end two-stage deep neural network in the time domain that removes noise in the first stage and reverberation in the second. We first train the two stages separately, then use the separately trained parameters as initial values for joint training of the full two-stage network. Compared with a single-stage network and a two-stage frequency-domain network, the proposed two-stage time-domain network achieves better performance.
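
To make the training schedule concrete, the following is a minimal, self-contained Python sketch. It is not the proposed network. It assumes two short FIR filters as stand-ins for the two deep networks, a synthetic random signal in place of speech, and mean squared error as the loss. It further assumes that the first stage is pretrained toward the noise-free reverberant signal and the second toward the clean signal. All function names (`fir`, `train_stage`, `train_joint`, `make_toy_data`) and hyperparameters are hypothetical. The sketch pretrains each stage separately, then passes the resulting parameters to joint end-to-end training as initial values.

```python
import random


def fir(taps, x):
    """Causal FIR filter in the time domain: y[t] = sum_k taps[k] * x[t - k]."""
    return [sum(w * x[t - k] for k, w in enumerate(taps) if t >= k)
            for t in range(len(x))]


def mse(a, b):
    """Mean squared error between two equal-length waveforms."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def mse_grad(out, target):
    """Gradient of mse(out, target) with respect to out."""
    n = len(out)
    return [2.0 * (p - q) / n for p, q in zip(out, target)]


def grad_taps(n_taps, x, g):
    """dLoss/dtaps for y = fir(taps, x), given g = dLoss/dy."""
    return [sum(g[t] * x[t - k] for t in range(k, len(x)))
            for k in range(n_taps)]


def grad_input(taps, g):
    """dLoss/dx for y = fir(taps, x), given g = dLoss/dy."""
    n = len(g)
    return [sum(w * g[j + k] for k, w in enumerate(taps) if j + k < n)
            for j in range(n)]


def train_stage(taps, x, target, lr=0.05, steps=200):
    """Train one stage on its own with plain gradient descent."""
    taps = list(taps)
    for _ in range(steps):
        g = mse_grad(fir(taps, x), target)
        taps = [w - lr * d for w, d in zip(taps, grad_taps(len(taps), x, g))]
    return taps


def train_joint(w1, w2, noisy, clean, lr=0.01, steps=200):
    """Fine-tune both stages end to end, starting from the given parameters."""
    w1, w2 = list(w1), list(w2)  # initial values come from separate training
    for _ in range(steps):
        mid = fir(w1, noisy)                      # stage 1 output
        g_out = mse_grad(fir(w2, mid), clean)     # loss on the final output only
        g_w2 = grad_taps(len(w2), mid, g_out)
        # Backpropagate through stage 2 to reach the stage 1 parameters.
        g_w1 = grad_taps(len(w1), noisy, grad_input(w2, g_out))
        w1 = [w - lr * d for w, d in zip(w1, g_w1)]
        w2 = [w - lr * d for w, d in zip(w2, g_w2)]
    return w1, w2


def make_toy_data(n=200, seed=0):
    """Synthetic data (assumption): clean -> reverberant -> reverberant + noise."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
    reverberant = fir([1.0, 0.5, 0.25], clean)    # toy room response
    noisy = [r + rng.gauss(0.0, 0.3) for r in reverberant]
    return clean, reverberant, noisy


def main():
    clean, reverberant, noisy = make_toy_data()
    identity = [1.0, 0.0, 0.0, 0.0]
    w1 = train_stage(identity, noisy, reverberant)   # stage 1 alone: denoise
    w2 = train_stage(identity, reverberant, clean)   # stage 2 alone: dereverberate
    before = mse(fir(w2, fir(w1, noisy)), clean)
    w1, w2 = train_joint(w1, w2, noisy, clean)       # joint training from pretrained values
    after = mse(fir(w2, fir(w1, noisy)), clean)
    print(f"cascade MSE before joint training: {before:.4f}")
    print(f"cascade MSE after joint training:  {after:.4f}")


if __name__ == "__main__":
    main()
```

Running `main()` prints the cascade error before and after joint training. The numbers only illustrate the schedule on toy data and say nothing about the performance of the actual model.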