This paper studies the challenging problem of zero-shot Chinese text recognition, in which a model is trained on text line images containing only seen characters and must then recognize unseen characters in new text line images. Most previous methods consider only zero-shot Chinese character recognition: they decompose Chinese characters into radical representations and recognize them at the radical level. Some recent methods have extended radical-based recognition from isolated characters to text lines, but they suffer from long training times and a complicated decoding process, and they are unsuitable for long text sequences. In this paper, we propose a novel zero-shot Chinese text recognition network (ZCTRN) that matches class embeddings with visual features. Specifically, our model consists of three components: a text line encoder that extracts visual features from text line images, a class embedding module that encodes character classes into class embeddings, and a bidirectional embedding transfer module that maps the class embeddings into the visual space while preserving the information of the original class embeddings. In addition, we use a distance-based CTC decoder to match the visual features with the class embeddings and output the recognition results. Experimental results on the MTHv2 dataset and the ICDAR-2013 handwriting competition dataset show that our method not only preserves high accuracy in recognizing text lines containing seen characters, but also outperforms existing state-of-the-art models in recognizing text lines containing unseen characters.
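To make the matching idea concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: the module names (`BidirectionalTransfer`, `distance_logits`), the linear maps, the Euclidean distance, and the MSE reconstruction loss are all illustrative assumptions. It shows class embeddings being mapped into the visual space (with a backward map that encourages information preservation), frame-level visual features being scored by negative distance to each class prototype, and the resulting per-frame logits being trained with CTC.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalTransfer(nn.Module):
    """Illustrative stand-in for the bidirectional embedding transfer module:
    a forward map from the class-embedding space into the visual space, plus
    a backward map whose reconstruction loss encourages the transfer to
    preserve the original class embeddings."""
    def __init__(self, embed_dim, visual_dim):
        super().__init__()
        self.to_visual = nn.Linear(embed_dim, visual_dim)
        self.to_embed = nn.Linear(visual_dim, embed_dim)

    def forward(self, class_embeds):
        visual_protos = self.to_visual(class_embeds)   # (C, D_v) prototypes
        recon = self.to_embed(visual_protos)           # (C, D_e) reconstruction
        recon_loss = F.mse_loss(recon, class_embeds)
        return visual_protos, recon_loss

def distance_logits(visual_feats, visual_protos):
    """visual_feats: (T, B, D_v) frame features from the text line encoder.
    visual_protos: (C, D_v) class embeddings mapped into the visual space.
    Returns (T, B, C) logits: the closer a frame feature is to a class
    prototype, the larger that class's logit."""
    T, B, D = visual_feats.shape
    dists = torch.cdist(visual_feats.reshape(T * B, D), visual_protos)  # (T*B, C)
    return (-dists).view(T, B, -1)

# Toy usage: T=12 frames, batch B=2, visual dim D_v=64, embed dim D_e=32,
# C=10 classes with index 0 playing the role of the CTC blank.
T, B, D_v, D_e, C = 12, 2, 64, 32, 10
visual_feats = torch.randn(T, B, D_v)       # assumed encoder output
class_embeds = torch.randn(C, D_e)          # assumed class-embedding output

transfer = BidirectionalTransfer(D_e, D_v)
visual_protos, recon_loss = transfer(class_embeds)
log_probs = F.log_softmax(distance_logits(visual_feats, visual_protos), dim=-1)

targets = torch.tensor([[3, 5, 7], [2, 4, 6]])              # label indices >= 1
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 3, dtype=torch.long)
ctc_loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0)
loss = ctc_loss + recon_loss
print(loss.item())
```

Under this formulation, the decoder itself has no class-specific weights: if the distance-based scoring is adopted, unseen characters can in principle be scored at test time simply by mapping their class embeddings into the visual space and adding them as new prototypes, which is what makes the zero-shot setting tractable.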