In recent years, increasing attention has been devoted to the development of rescue robots that can protect humans from the inherent risks of rescue work. In particular, robots that can move deep into narrow spaces are anticipated. We have focused on peristalsis, the locomotion mechanism of earthworms, and previously developed a motor-driven, three-segmented peristaltic crawling robot whose movement pattern was derived using Q-learning, a reinforcement learning technique. A characteristic of peristalsis is that locomotion remains possible as long as at least three segments function, even if some segments fail. We therefore sought to use Q-learning to derive the movement patterns of peristaltic crawling robots with many segments. However, because the required computation grows with the number of segments, Q-learning becomes infeasible for many segments owing to insufficient memory. We therefore turned to a learning method called Actor-Critic, which can be implemented with low memory: Actor-Critic methods are TD methods with a separate memory structure that represents the policy explicitly, independently of the value function. Using this method, we examined the movement patterns of six-segmented peristaltic crawling robots.
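The separation the abstract describes can be illustrated with a minimal sketch (not the authors' implementation): a one-step tabular Actor-Critic on a hypothetical toy chain task, where the actor stores policy preferences and the critic stores state values, each in its own table. The state and action sizes, learning rates, and reward structure below are illustrative assumptions, not taken from the robot.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2
rng = np.random.default_rng(0)

actor = np.zeros((N_STATES, N_ACTIONS))   # actor: policy preferences h(s, a)
critic = np.zeros(N_STATES)               # critic: state values V(s)
alpha, beta, gamma = 0.1, 0.1, 0.9        # critic lr, actor lr, discount

def softmax(h):
    e = np.exp(h - h.max())
    return e / e.sum()

def step(s, a):
    """Hypothetical chain: action 1 moves right, action 0 moves left;
    reward 1 on reaching the rightmost state."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

for episode in range(500):
    s = 0
    for t in range(20):
        p = softmax(actor[s])
        a = rng.choice(N_ACTIONS, p=p)
        s2, r = step(s, a)
        # TD error computed by the critic, then used by both structures
        delta = r + gamma * critic[s2] - critic[s]
        critic[s] += alpha * delta                # critic (value) update
        actor[s] -= beta * delta * p              # actor (policy) update:
        actor[s, a] += beta * delta               #   h += beta*delta*(1[a] - p)
        s = s2
        if r > 0:
            break

print(softmax(actor[0]))  # learned policy at the start state
```

The point of the sketch is the memory layout: the policy lives in `actor` and the value estimate in `critic`, so the policy table can stay compact even when value estimation is coarse, which is the property the abstract attributes to Actor-Critic methods.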