Abstract Long-form document matching is an important direction in the field of natural language processing and can be applied to tasks such as news recommendation and text clustering. However, long-form document matching suffers from noisiness and sparsity of semantic information in long text. Using short-form document matching methods on a long-form matching problem is not satisfactory. Long-form document matching has attracted the attention of researchers, who have proposed many effective methods. Methods for matching long texts can be divided into three categories: traditional bag-of-words-based models, traditional deep learning-based models, and pre-training-based models. This study reviews typical methods of long-form document matching, analyzes their advantages and disadvantages, and discusses possible future developments.