Single-photon detection offers high sensitivity and is widely used in imaging. However, achieving high spatial and depth resolution through scattering media remains challenging because of low light intensity, high background noise, and the inherent timing jitter of the detector. This paper proposes a physics-driven, learning-based photon-detection ghost imaging method to address these challenges. By co-designing the computational ghost imaging system and the reconstruction network, we couple imaging and reconstruction more tightly to surpass the physical resolution limit. Fringe patterns encode the depth information of the object into different channels of an image cube. A dedicated depth fusion network with attention mechanisms then extracts inter-depth correlation features, enabling super-resolution reconstruction at 256 × 256 pixels. Experimental results demonstrate that the proposed method achieves superior imaging performance across various scenarios, offering a more compact and cost-effective alternative for photon-detection imaging.
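As an illustration of how fringe-pattern ghost imaging can fill an image cube whose channels correspond to depth, the following is a minimal sketch and not the pipeline proposed here. It rests on several assumptions that the abstract does not state: sinusoidal fringes with four-step phase shifting (a Fourier single-pixel scheme), one bucket measurement per depth bin with no jitter-induced mixing between bins, and optional Poisson photon noise. The names `fringe`, `reconstruct_cube`, and `gain` are hypothetical, and the learned depth fusion network is omitted entirely.

```python
import numpy as np


def fringe(n, fx, fy, phase, a=0.5, b=0.5):
    """Sinusoidal fringe pattern with values in [a - b, a + b]."""
    y, x = np.mgrid[0:n, 0:n]  # y: row index, x: column index
    return a + b * np.cos(2 * np.pi * (fy * y + fx * x) / n + phase)


def reconstruct_cube(scene, gain=None, seed=0, b=0.5):
    """Simulate fringe-pattern bucket measurements and invert them.

    scene: array of shape (depth, n, n) with non-negative reflectivity.
           Assumption: each depth slice is recorded in its own time bin.
    gain:  expected photon counts per unit of bucket signal. If None,
           the measurement is noise-free; otherwise Poisson noise is added.
    Returns an image cube of shape (depth, n, n).
    """
    rng = np.random.default_rng(seed)
    depth, n, _ = scene.shape
    spectrum = np.zeros((depth, n, n), dtype=complex)
    phases = (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)

    for fy in range(n):
        for fx in range(n):
            d = []
            for phase in phases:
                pattern = fringe(n, fx, fy, phase, b=b)
                # Bucket signal: one value per depth bin.
                signal = (scene * pattern).sum(axis=(1, 2))
                if gain is not None:
                    signal = rng.poisson(gain * signal) / gain
                d.append(signal)
            # Four-step phase shifting cancels the DC offset and yields
            # one Fourier coefficient per depth bin.
            spectrum[:, fy, fx] = ((d[0] - d[2]) + 1j * (d[1] - d[3])) / (2 * b)

    # Each channel of the cube is the inverse 2D DFT of its spectrum.
    return np.fft.ifft2(spectrum, axes=(-2, -1)).real


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    scene = rng.random((2, 8, 8))  # two depth bins, 8 x 8 pixels
    clean = reconstruct_cube(scene)
    noisy = reconstruct_cube(scene, gain=1e6)
    print("noise-free max error:", np.abs(clean - scene).max())
    print("photon-noise max error:", np.abs(noisy - scene).max())
```

In this sketch the noise-free reconstruction recovers the cube up to floating-point error, and lowering `gain` shows how photon-counting noise degrades each channel. A learned reconstruction, such as the depth fusion network described above, would take a cube of this kind as its input.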