Bowel preparation is a critical step in colonoscopy. Manual assessment of bowel preparation is time-consuming and prone to human error and bias. Automatic evaluation using machine learning or deep learning is a better and more efficient alternative. Most of the relevant literature has focused on achieving high validation accuracy on private, hand-picked datasets that do not reflect real-world conditions. Furthermore, treating a video dataset as a collection of individual frames may produce overestimated results: a video contains nearly identical consecutive frames, so dividing the frames between training and validation sets yields two sets with nearly the same distribution. Using the public Nerthus dataset, we show empirically that performance drops significantly when the dataset is treated as a collection of videos (which reflects the real-world setting) rather than as a collection of individual frames. We propose a model that utilizes both sequence and non-sequence (spatial) information within videos. The proposed model achieved an average validation accuracy of 83% across 4 validation sets, whereas state-of-the-art models achieved average validation accuracies of 66%–72%.
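The difference between a frame-level split and a video-level split can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the toy data (8 synthetic videos of 50 frames each) and the helper names `frame_level_split`, `video_level_split`, and `shared_videos` are assumptions introduced only for illustration, and Nerthus itself is not loaded.

```python
import random

def frame_level_split(frames, val_fraction=0.25, seed=0):
    """Shuffle all frames and split them, ignoring which video they came from."""
    rng = random.Random(seed)
    shuffled = list(frames)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def video_level_split(frames, val_fraction=0.25, seed=0):
    """Hold out whole videos, so no video contributes frames to both sets."""
    rng = random.Random(seed)
    video_ids = sorted({vid for vid, _ in frames})
    rng.shuffle(video_ids)
    n_val = max(1, int(len(video_ids) * val_fraction))
    val_ids = set(video_ids[:n_val])
    train = [f for f in frames if f[0] not in val_ids]
    val = [f for f in frames if f[0] in val_ids]
    return train, val

def shared_videos(train, val):
    """Return the video ids that appear in both the training and validation sets."""
    return {vid for vid, _ in train} & {vid for vid, _ in val}

# Hypothetical toy dataset: each frame is a (video_id, frame_index) pair.
frames = [(f"video_{v}", i) for v in range(8) for i in range(50)]

frame_train, frame_val = frame_level_split(frames)
video_train, video_val = video_level_split(frames)

print("videos shared under frame-level split:", len(shared_videos(frame_train, frame_val)))
print("videos shared under video-level split:", len(shared_videos(video_train, video_val)))
```

Under the frame-level split, validation frames have near-duplicate neighbours from the same video in the training set, which is the source of the overestimated accuracy described above. The video-level split removes that overlap by construction.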