A multi-cue dynamic features hybrid fusion (MDF-HF) method for video-based facial expression recognition is presented. It is composed of key-frame selection, multi-cue dynamic feature extraction, and information fusion components. An adaptive key-frame selection strategy is first designed in the training procedure to extract pivotal facial images from video sequences, addressing the challenge of imbalanced data distribution and improving data quality. The similarity threshold used for key-frame selection is automatically adjusted according to the number of image frames in each expression category, yielding a flexible frame-processing procedure. Multi-cue spatio-temporal feature descriptors are then designed to acquire diverse dynamic feature representations from the selected key-frame sequences. Through parallel computation, different levels of semantic information are extracted simultaneously to characterize facial expression deformation in video clips. To integrate features from multiple cues, a weighted stacking ensemble strategy is devised that preserves the unique characteristics of each feature while exploiting the interrelationships among the multi-cue features. The proposed method is evaluated on three benchmark datasets, eNTERFACE'05, BAUM-1s, and AFEW, achieving average accuracies of 59.7%, 57.5%, and 54.7%, respectively. The MDF-HF method outperforms state-of-the-art facial expression recognition methods, offering a robust solution for recognizing facial expressions in dynamic and unconstrained video scenarios.
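
As a rough illustration of the adaptive key-frame selection summarized above, the Python sketch below retains a frame only when its similarity to the last retained key-frame falls below a category-dependent threshold, so categories with more frames receive a stricter threshold and contribute fewer key-frames. The threshold rule, the cosine similarity measure, and all parameter names here are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def adaptive_threshold(n_category_frames, n_ref=100, base=0.90, scale=0.05):
    # Assumed adjustment rule: larger categories get a lower (stricter)
    # similarity threshold, so fewer key-frames survive, which counteracts
    # class imbalance. n_ref, base, and scale are placeholder parameters.
    return base - scale * np.log1p(n_category_frames / n_ref)

def cosine_similarity(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_key_frames(frames, n_category_frames):
    # Keep the first frame, then keep each subsequent frame only if it is
    # sufficiently dissimilar to the most recently retained key-frame.
    tau = adaptive_threshold(n_category_frames)
    key_frames = [frames[0]]
    for frame in frames[1:]:
        if cosine_similarity(frame, key_frames[-1]) < tau:
            key_frames.append(frame)
    return key_frames

In this reading, the per-category frame count modulates the threshold once before a sequence is scanned, which is one plausible way to realize the "automatically adjusted" threshold described in the abstract.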