Activity recognition and indoor localization (ARIL) benefit society in areas such as surveillance, healthcare, and entertainment. Recent ARIL work uses Wi-Fi Channel State Information (CSI) as input instead of the Received Signal Strength Indicator (RSSI), which is often missing or disturbed. ResNet, a Deep Learning model, can perform the joint ARIL task with high accuracy. However, ResNet has a large number of trainable parameters, and given the rapid progress in Deep Learning, newer models have the potential to improve on it. We propose applying a DenseNet model as a new feature extractor and Deep Learning architecture for the joint ARIL task with CSI data. DenseNet can improve the quality of ARIL thanks to its dense blocks, which extract more relevant features from CSI data efficiently. Using a real-world CSI dataset, we demonstrate that the proposed DenseNet model improves both the overall accuracy and the efficiency of joint ARIL: it outperforms the baseline by 4.16% on activity recognition and 1.04% on indoor localization. With hyperparameter tuning, we further reduce the number of trainable parameters by 64.29%, which is 27.88% fewer than the baseline, at the cost of a slight decrease in activity recognition performance but with an increase in indoor localization performance.
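To illustrate why a dense block can be parameter-efficient, the following minimal sketch counts channels and convolution weights in a generic DenseNet-style block, where each layer receives the concatenation of all earlier feature maps and adds a fixed number of new channels (the growth rate). The function names, the input width of 16 channels, the growth rate of 12, and the kernel size of 3 are illustrative assumptions, not the configuration used in this work; bias terms, normalization, and bottleneck layers are ignored.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Return (input channels seen by each layer, output channels of the block).

    Hypothetical helper: models only the dense connectivity pattern, in which
    every layer's output (growth_rate channels) is concatenated onto its input.
    """
    per_layer_inputs = []
    channels = in_channels
    for _ in range(num_layers):
        per_layer_inputs.append(channels)  # layer sees all earlier feature maps
        channels += growth_rate            # and contributes growth_rate new ones
    return per_layer_inputs, channels


def dense_block_conv_weights(in_channels, growth_rate, num_layers, kernel_size=3):
    """Count convolution weights in the block, assuming one 1D convolution
    per layer with no bias (an assumption made for this illustration)."""
    per_layer_inputs, _ = dense_block_channels(in_channels, growth_rate, num_layers)
    return sum(c * growth_rate * kernel_size for c in per_layer_inputs)


if __name__ == "__main__":
    inputs, out_channels = dense_block_channels(16, 12, 4)
    print(inputs, out_channels)
    print(dense_block_conv_weights(16, 12, 4))
```

Because each layer only produces `growth_rate` new channels, the weight count grows with the number of reused inputs rather than with a wide per-layer output, which is the general mechanism behind DenseNet's feature reuse.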