The convolutional neural network (CNN) is a potent and popular neural network types and has been crucial to deep learning in recent years. A standard CNN which is known as 2-dimensions CNN was first proposed to solve image classification problems at first. With advancements in science and technology, as well as the growth of the internet, tasks involving video analysis have received increased attention in recent years. As a result, it was suggested that a 3-dimensions convolutional neural network (3D CNN) be applied to tackle 3D image processing or video analysis. 3D CNN is primarily used in the field of human activity recognition, video classification, medical imaging detection, and so on. This paper offers a succinct introduction to the general architecture and representative models of 3D CNN, contrasts the variations between 2D CNN and 3D CNN, explores the widely used models derived from 3D CNN, and displays the application outcomes of 3D CNN.