Localization of multiple organs in Positron Emission Tomography/Computed Tomography (PET/CT) images is a key step in the computer-aided analysis of nuclear medicine images. Human torso organs are highly correlated with each other in location and shape, so exploiting inter-organ geometrical correlation may help improve organ localization accuracy. In this paper, we construct a Transformer network with a one-to-one query architecture for organ bounding box localization in PET/CT images. Our method takes advantage of the self-attention mechanism of the Transformer network to model the inter-organ correlations of positions and sizes. Compared to the state-of-the-art detection transformer (DETR) network, our one-to-one query architecture has a simpler network structure and faster learning convergence. To address the large demand for three-dimensional (3D) training images, we propose an effective multi-view localization method based on a 2D pre-trained Transformer network, and then back-project the multi-view 2D bounding boxes into 3D. Moreover, we propose a dual-modality fusion method to combine the complementary information from the PET and CT images. Experimental results on 20 testing images demonstrate that our Transformer network is more robust than convolutional neural network (CNN) methods. Our one-to-one query mechanism significantly accelerates model training compared to the DETR model. The fusion of dual-modality information also leads to more robust organ localization results than using either modality alone.
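To make the multi-view back-projection step concrete, the following is a minimal sketch and not the implementation used in this work. It assumes three orthogonal views (axial, coronal, sagittal), axis-aligned 2D boxes given as (min, max) pairs per axis, and a simple averaging rule for each 3D coordinate, which is observed in two of the three views. The function name `back_project_boxes`, the box layout, and the averaging rule are all hypothetical choices made for illustration.

```python
def back_project_boxes(axial, coronal, sagittal):
    """Combine three orthogonal 2D boxes into one 3D box (illustrative only).

    Assumed (hypothetical) box layouts:
      axial    = (x_min, x_max, y_min, y_max)
      coronal  = (x_min, x_max, z_min, z_max)
      sagittal = (y_min, y_max, z_min, z_max)

    Each 3D coordinate appears in exactly two views, so the two
    estimates are averaged. This averaging rule is an assumption.
    """
    ax_x0, ax_x1, ax_y0, ax_y1 = axial
    co_x0, co_x1, co_z0, co_z1 = coronal
    sa_y0, sa_y1, sa_z0, sa_z1 = sagittal

    # x extent: seen in the axial and coronal views
    x0 = (ax_x0 + co_x0) / 2.0
    x1 = (ax_x1 + co_x1) / 2.0
    # y extent: seen in the axial and sagittal views
    y0 = (ax_y0 + sa_y0) / 2.0
    y1 = (ax_y1 + sa_y1) / 2.0
    # z extent: seen in the coronal and sagittal views
    z0 = (co_z0 + sa_z0) / 2.0
    z1 = (co_z1 + sa_z1) / 2.0

    return (x0, x1, y0, y1, z0, z1)


# Example with made-up coordinates (in voxels)
box_3d = back_project_boxes(
    axial=(10, 20, 30, 40),
    coronal=(12, 22, 50, 60),
    sagittal=(28, 42, 52, 58),
)
print(box_3d)  # (11.0, 21.0, 29.0, 41.0, 51.0, 59.0)
```

Averaging is only one possible fusion rule; a confidence-weighted combination of the two estimates per axis would fit the same interface.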