Knowledge distillation (KD) compresses deep neural networks (DNNs) by transferring knowledge from a redundant teacher model to a resource-friendly student model; cross-layer KD (CKD) distills each stage of the student from multiple stages of the teacher. However, previous CKD schemes select only coarse-grained stagewise teacher features to teach the student, leading to improper channel alignment. Moreover, most of these methods distill all knowledge uniformly, preventing the student from focusing on the most important knowledge. To address these problems, we propose dense knowledge distillation (DenseKD) in this article. First, to achieve more accurate feature alignment in CKD, we construct a learnable dense architecture that lets each student channel flexibly capture more diverse channelwise features from the teacher. Second, we introduce region importance to exploit the guiding potential of different regions; it distinguishes their influence by the variation of the teacher's representations. Third, to make the student pay more attention to useful samples during KD, we compute sample importance from the teacher's loss. Experiments on multiple vision tasks show consistent improvements over state-of-the-art approaches. For example, on the CIFAR-100 classification task, DenseKD achieves 72.30% accuracy with ResNet-20, which is higher than the results of previous CKD methods. In addition, on the object detection task, DenseKD improves the mean average precision (mAP) of Faster R-CNN with ResNet-18 by 2.84% over vanilla KD.
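
The abstract describes weighting the distillation signal by region importance (derived from the variation of teacher representations) and sample importance (derived from the teacher's loss). The snippet below is a minimal sketch of one way such weighting could be applied to a feature-distillation loss; it is not the authors' implementation, and the function names, the 1x1-convolution channel projection (standing in for the learnable dense alignment), and the normalization choices are assumptions made for illustration.

```python
# Illustrative sketch only: importance-weighted feature distillation.
# Assumes importance comes from the teacher's per-sample cross-entropy loss
# (sample importance) and the spatial variance of teacher features
# (region importance). All names here are hypothetical.
import torch
import torch.nn.functional as F

def sample_importance(teacher_logits, labels, temperature=1.0):
    # Harder samples (higher teacher loss) receive larger weights.
    ce = F.cross_entropy(teacher_logits, labels, reduction="none")      # (N,)
    return torch.softmax(ce / temperature, dim=0) * ce.numel()          # mean weight ~ 1

def region_importance(teacher_feat):
    # Regions where the teacher's representation varies strongly across
    # channels are treated as more informative.
    var = teacher_feat.var(dim=1, keepdim=True)                          # (N, 1, H, W)
    return var / (var.mean(dim=(2, 3), keepdim=True) + 1e-6)

def weighted_feature_kd(student_feat, teacher_feat, teacher_logits, labels, proj):
    # proj: a 1x1 conv aligning student channels to teacher channels; the
    # paper's learnable dense architecture would replace this simple mapping.
    s = proj(student_feat)
    per_pixel = (s - teacher_feat).pow(2).mean(dim=1, keepdim=True)      # (N, 1, H, W)
    w_region = region_importance(teacher_feat)
    w_sample = sample_importance(teacher_logits, labels).view(-1, 1, 1, 1)
    return (w_sample * w_region * per_pixel).mean()
```

In this sketch, the two importance terms simply rescale a per-pixel feature-matching loss, so samples and regions the teacher finds difficult or informative contribute more to the student's gradient.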