Multi-phase contrast-enhanced CT images provide abundant and complementary tumor information, so radiologists often use them to assist in segmenting and diagnosing liver tumors. However, current multi-phase liver tumor segmentation methods are based on convolutional neural networks (CNNs), which limits their ability to extract global information during multi-phase information fusion. In this study, we propose a novel multi-phase liver tumor segmentation approach that uses delayed phase images to aid tumor segmentation in portal venous phase images. The proposed method employs a Transformer structure to extract both global and local tumor information, which contributes to the precise segmentation of tumor boundaries. More importantly, we design a cross-phase aggregator (CFA) that facilitates bidirectional interaction between cross-phase features to take full advantage of the complementary information in multi-phase images. A dataset of 164 multi-phase abdominal CT scans was collected with Institutional Review Board approval to evaluate the proposed approach. The experimental results showed that the proposed approach makes better use of multi-phase information and outperforms several state-of-the-art methods. An ablation study was performed to further validate the effectiveness of each module in the proposed model. The proposed method has the potential to help radiologists locate liver tumors more accurately and improve their diagnostic efficiency.
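
To make the idea of bidirectional cross-phase interaction concrete, the following is a minimal NumPy sketch in which each phase's feature tokens attend to the other phase's tokens. It is an illustration under stated assumptions, not the aggregator described in this work: the use of single-head scaled dot-product attention without learned projections, the residual connections, and the names `cross_attention`, `bidirectional_fusion`, `pv_tokens`, and `delay_tokens` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Scaled dot-product attention from `queries` to `context`.

    Assumption: single head, no learned query/key/value projections.
    queries: (n_q, d) tokens of one phase.
    context: (n_c, d) tokens of the other phase.
    Returns an (n_q, d) array of context features gathered per query.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (n_q, n_c) similarities
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context


def bidirectional_fusion(pv_tokens, delay_tokens):
    """Hypothetical bidirectional exchange between two phases.

    Each phase queries the other, and the result is added back
    through a residual connection (an assumption of this sketch).
    """
    pv_out = pv_tokens + cross_attention(pv_tokens, delay_tokens)
    delay_out = delay_tokens + cross_attention(delay_tokens, pv_tokens)
    return pv_out, delay_out


# Toy example: 16 tokens per phase, 32 channels each (arbitrary sizes).
rng = np.random.default_rng(0)
pv_tokens = rng.standard_normal((16, 32))
delay_tokens = rng.standard_normal((16, 32))
pv_fused, delay_fused = bidirectional_fusion(pv_tokens, delay_tokens)
```

Because every token of one phase can attend to every token of the other, the exchange is global rather than limited to a local receptive field, which is the property the abstract attributes to Transformer-based fusion. Both outputs keep the shape of their inputs, so a sketch like this could be stacked or followed by a decoder.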