Gas concentration prediction is one of the main tasks in electronic nose (E-nose) research. Currently, most models based on recurrent and convolutional architectures, such as long short-term memory (LSTM) networks and temporal convolutional networks (TCNs), use only time-domain (TD) information, which can make features hard to capture and cause information in long E-nose sequences to be missed. This work therefore proposes TF-TCN, a TCN model combined with a time-frequency (TF) enhanced network. Specifically, a frequency-domain (FD) module, which transforms the TD information into the FD by the fast Fourier transform (FFT), is added to the traditional TCN and performs feature extraction together with the TCN basic blocks at multiple scales. In addition, the Gaussian error linear unit (GELU) replaces the rectified linear unit (ReLU) as the nonlinearity; GELU weights inputs by their values rather than gating them by their signs as ReLU does. Extensive experiments on two single-gas datasets demonstrate the advantages of TF-TCN from different perspectives. Compared with the baseline models, TF-TCN reduces the root mean square error (RMSE) and the mean absolute error (MAE) on the two single-gas datasets by at least 23.8% and 36.1%, respectively. Furthermore, experiments on a mixed-gas dataset demonstrate the strong prediction ability of TF-TCN even under disturbed conditions. Our work may thus offer a new perspective on extracting information from E-nose data.
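The two ingredients named in the abstract, a TD-to-FD transform and the GELU nonlinearity, can be illustrated with a minimal sketch. It is not the TF-TCN implementation. The synthetic `window`, the helper names, and the use of a naive discrete Fourier transform (which yields the same spectrum an FFT computes more efficiently) are all assumptions made for illustration.

```python
import cmath
import math
from typing import List


def relu(x: float) -> float:
    """ReLU gates the input by its sign: negatives become zero."""
    return x if x > 0.0 else 0.0


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def dft_magnitudes(signal: List[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes; an FFT gives the same values faster."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)))
        for k in range(n)
    ]


# Hypothetical sensor window (assumption): a sinusoid completing
# 2 cycles over 16 samples, standing in for a periodic E-nose response.
window = [math.sin(2.0 * math.pi * 2 * t / 16) for t in range(16)]

# Frequency-domain view of the same window.
magnitudes = dft_magnitudes(window)

# Strongest non-DC bin up to the Nyquist bin; here it is bin 2.
dominant_bin = max(range(1, 9), key=lambda k: magnitudes[k])

# GELU keeps a small, value-dependent response for negative inputs,
# whereas ReLU maps every negative input to zero.
for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
print("dominant frequency bin:", dominant_bin)
```

In this toy window the periodicity appears as a single dominant spectral bin, which is the kind of structure a frequency-domain branch can expose directly. The GELU output for a negative input is small but nonzero, reflecting weighting by value rather than gating by sign.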