In this paper, we propose a novel method that combines monocular visual simultaneous localization and mapping (vSLAM) with deep-learning-based semantic segmentation. For stable operation, vSLAM requires feature points on static objects. In conventional vSLAM, random sample consensus (RANSAC) [5] is used to select those feature points. However, if a major portion of the view is occupied by moving objects, many feature points become inappropriate and RANSAC does not perform well. Our empirical studies show that feature points in the sky and on cars often cause errors in vSLAM. We propose a new framework that excludes feature points using a mask produced by semantic segmentation. Excluding feature points in masked areas enables vSLAM to stably estimate camera motion. We apply ORB-SLAM [15], a state-of-the-art implementation of monocular vSLAM, in our framework. For our experiments, we created vSLAM evaluation datasets using the CARLA simulator [3] under various conditions. Compared to state-of-the-art methods, our method achieves significantly higher accuracy.
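
The following is a minimal sketch of the masking idea described above, not the paper's actual integration into ORB-SLAM's feature extraction pipeline. The function name, the chosen class IDs, and the assumption that the segmentation map is pixel-aligned with the input image are all illustrative; real class IDs depend on the segmentation model's label map.

```python
import cv2

# Hypothetical class IDs for the unreliable classes to be masked out
# (e.g. "sky" and "car"); actual IDs depend on the segmentation model.
MASKED_CLASS_IDS = {10, 13}

def filter_keypoints(image, semantic_map):
    """Detect ORB keypoints and drop those that fall on masked classes.

    semantic_map: H x W array of per-pixel class IDs produced by a
    semantic segmentation network, assumed aligned with `image`.
    """
    orb = cv2.ORB_create(nfeatures=2000)
    keypoints = orb.detect(image, None)

    # Keep only keypoints whose pixel lies outside the masked regions.
    kept = [
        kp for kp in keypoints
        if semantic_map[int(round(kp.pt[1])), int(round(kp.pt[0]))]
        not in MASKED_CLASS_IDS
    ]

    # Compute descriptors only for the surviving keypoints before
    # handing them to the vSLAM front end.
    kept, descriptors = orb.compute(image, kept)
    return kept, descriptors
```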