Image-based survival prediction with deep learning is an emerging research direction that aims to augment the diagnostic capabilities of pathologists. However, directly applying existing deep learning models to survival prediction may not be sufficient, owing to the inherent complexity of whole slide images (WSIs). High-resolution WSIs contain intricate morphological patterns and inherent noise, which make it difficult to build survival models that are both effective and trustworthy. In this paper, we propose CTUSurv, a novel survival prediction model that simultaneously captures cell-to-cell and cell-to-microenvironment interactions, together with a region-based uncertainty estimation framework that assesses the reliability of its survival predictions. Our approach incorporates a region sampling strategy that extracts task-relevant, informative regions from high-resolution WSIs. To handle complex biological patterns, a cell-aware encoding module models the interactions among biological entities. Furthermore, CTUSurv includes a novel aleatoric uncertainty estimation module that provides fine-grained uncertainty scores at the region level. Extensive evaluations on four datasets demonstrate that our approach achieves superior predictive accuracy and reliability.