Knowledge distillation (KD) improves the performance of a compact student network by transferring learned knowledge from a cumbersome teacher network. In existing approaches, multiscale feature knowledge is transferred via densely connected paths, which increases the optimization difficulty. Moreover, correlations among labels are neglected despite their ability to enhance the intraclass similarity of samples. To address these issues, we propose cascade fusion and correlation enhancement for KD (CC-KD). Multiscale feature knowledge is transferred via much simpler paths, constructed by fusing features of different scales with cross-scale attention (CSA) in a cascade manner, thereby reducing the optimization difficulty. In addition, the relational knowledge in the teacher logits is further enhanced by correlations of the corresponding labels, so that the student produces more similar logits for samples of the same category. Extensive experiments on five public datasets (i.e., CIFAR-100/10, ImageNet, RAF-DB, and FERPlus) demonstrate the superior performance of the proposed method over several state-of-the-art (SOTA) methods. In particular, our method obtains an accuracy of 71.70% on ImageNet and sets a new record of 90.20% on RAF-DB with fewer computations and parameters.
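The following is a minimal PyTorch sketch, not the authors' implementation, of the label-correlation-enhanced relational logit distillation described above: pairwise similarities of teacher logits are matched by the student, with same-category pairs up-weighted so intraclass logits become more alike. The function name, the binary same-class correlation matrix, and the `same_class_weight` parameter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def correlation_enhanced_kd_loss(student_logits, teacher_logits, labels, same_class_weight=2.0):
    """Relational KD on batch-wise logit similarity matrices, re-weighted by label correlation.

    Hypothetical sketch of the correlation-enhancement idea; not the paper's exact loss.
    """
    # Pairwise cosine-similarity (Gram) matrices of the logits within the batch.
    s = F.normalize(student_logits, dim=1)
    t = F.normalize(teacher_logits, dim=1)
    sim_s = s @ s.t()                       # (B, B) student logit relations
    sim_t = t @ t.t()                       # (B, B) teacher logit relations

    # Label-correlation matrix: 1 where two samples share a label, 0 otherwise.
    label_corr = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()

    # Up-weight same-category pairs so the student is pushed harder to produce
    # similar logits for samples of the same class.
    weights = 1.0 + (same_class_weight - 1.0) * label_corr
    return (weights * (sim_s - sim_t).pow(2)).mean()


if __name__ == "__main__":
    # Random tensors standing in for a batch of 8 samples with 100 classes.
    student_logits = torch.randn(8, 100)
    teacher_logits = torch.randn(8, 100)
    labels = torch.randint(0, 100, (8,))
    print(correlation_enhanced_kd_loss(student_logits, teacher_logits, labels))
```

In this sketch the label correlation is a simple same-class indicator; any soft correlation matrix derived from the labels could be substituted without changing the structure of the loss.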