Micro-expressions are brief, involuntary facial movements that reveal a person's emotions. Because they are hard to simulate or hide, recognizing them (once they have been spotted) can serve as an indicator of genuine emotion. In this paper, we propose a method for recognizing micro-expressions in high-speed video sequences. Our model is a deep neural network trained with a multi-stage approach. It combines a convolutional neural network, which extracts representative features from the individual frames of a sequence, with a recurrent neural network, which captures how the face evolves over the course of the sequence. Convolutional autoencoders are used to learn the most expressive facial features. We report results on recognizing the emotion conveyed by micro-expressions and analyze the impact of the multi-stage training approach on the network's performance.
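
To make the data flow described above concrete, the following is a minimal sketch of a per-frame convolutional feature extractor feeding a recurrent layer, written in plain Python for illustration only. It is not the implementation used in this work: every function name, the layer sizes, the 3x3 kernels, the simple Elman-style recurrence, and the random untrained weights are assumptions, and neither the autoencoder-based feature learning nor the multi-stage training is reproduced.

```python
import math
import random


def conv2d_valid(image, kernel):
    """Valid 2D cross-correlation of one grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def frame_features(image, kernels):
    """Convolutional stage: ReLU then global average pooling, one value per kernel."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        relu = [max(0.0, v) for row in fmap for v in row]
        feats.append(sum(relu) / len(relu))
    return feats


def rnn_step(x, h, w_xh, w_hh):
    """Elman-style recurrent update (an assumed cell type): h' = tanh(W_xh x + W_hh h)."""
    return [math.tanh(sum(w * xi for w, xi in zip(w_xh[k], x)) +
                      sum(w * hi for w, hi in zip(w_hh[k], h)))
            for k in range(len(h))]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def rand_matrix(rows, cols, rng, scale=0.5):
    """Random placeholder weights; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def classify_sequence(frames, kernels, w_xh, w_hh, w_out):
    """Encode each frame, accumulate over time in the hidden state, then classify."""
    h = [0.0] * len(w_hh)
    for frame in frames:
        h = rnn_step(frame_features(frame, kernels), h, w_xh, w_hh)
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in w_out]
    return softmax(logits)


# Hypothetical sizes: 4 kernels, 6 hidden units, 3 emotion classes.
rng = random.Random(0)
n_kernels, hidden, n_classes = 4, 6, 3
kernels = [rand_matrix(3, 3, rng) for _ in range(n_kernels)]
w_xh = rand_matrix(hidden, n_kernels, rng)
w_hh = rand_matrix(hidden, hidden, rng)
w_out = rand_matrix(n_classes, hidden, rng)

# Five synthetic 8x8 "frames" stand in for a high-speed video clip.
frames = [rand_matrix(8, 8, rng, scale=1.0) for _ in range(5)]
probs = classify_sequence(frames, kernels, w_xh, w_hh, w_out)
```

Here `probs` holds one probability per hypothetical emotion class for the whole clip. In a staged setup of the kind the abstract describes, the convolutional weights would plausibly be learned first and the recurrent part afterwards, but the exact schedule is not specified in this section.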