In the domain of object detection, deep Spiking Neural Networks (SNNs) offer a potential alternative to their CNN counterparts because of their energy efficiency. However, state-of-the-art deep SNN-based object detection approaches suffer from information loss during the event-to-tensor projection and from redundant computation introduced by artificial post-processing modules. To preserve more information during data transformation and to reduce computational cost, we propose Spiking-DETR, a novel spike-driven object detection framework that operates on spike-form data streams by combining a Spiking Transformer with the DETR architecture. It consists of four modules: a Mixed Time Bin Cube (MTB Cube) module for event-to-tensor transformation, a Spatio-temporal Feature Extraction module for extracting spatio-temporal features, a Spiking-Transformer-based Encoder-Decoder module for learning object patterns and locations, and a Post-Processing module for loss computation. We evaluate the proposed model on two public event-based object detection datasets, GEN1 and 1Mpx. Comprehensive experiments show that Spiking-DETR achieves state-of-the-art performance while maintaining a relatively small number of parameters and low computational cost. Moreover, it is the first model composed entirely of SNNs to achieve end-to-end object detection. The code is available on the project page: https://github.com/JosephBH0622/Spiking-DETR.
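As a rough illustration of the pipeline outlined above, the sketch below wires the four named modules together in PyTorch. The module names mirror the abstract, but every internal detail is an assumption for illustration only: the voxel-grid-style event binning, the small convolutional backbone, the vanilla `nn.Transformer` standing in for the Spiking Transformer, all tensor shapes, and the omitted set-prediction loss of the Post-Processing module are not the paper's implementation.

```python
# Hypothetical sketch of a four-stage Spiking-DETR-style pipeline.
# All layer choices, shapes, and the event-binning scheme are illustrative
# assumptions; see the paper and repository for the actual architecture.
import torch
import torch.nn as nn


class MTBCube(nn.Module):
    """Event-to-tensor transformation: bins raw events (x, y, t, p) into a
    spike-form cube with T time bins (assumed voxel-grid-style binning)."""
    def __init__(self, time_bins=4, height=64, width=64):
        super().__init__()
        self.time_bins, self.height, self.width = time_bins, height, width

    def forward(self, events):
        # events: (N, 4) tensor of (x, y, t, p); t normalized to [0, 1)
        cube = torch.zeros(self.time_bins, 2, self.height, self.width)
        x, y = events[:, 0].long(), events[:, 1].long()
        b = (events[:, 2] * self.time_bins).long().clamp(max=self.time_bins - 1)
        p = events[:, 3].long()
        cube[b, p, y, x] = 1.0  # binary spike occupancy per (bin, polarity, y, x)
        return cube


class SpatioTemporalFeatureExtractor(nn.Module):
    """Backbone stand-in: per-time-bin convolutions over the event cube."""
    def __init__(self, channels=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, cube):                      # (T, 2, H, W)
        feats = self.conv(cube)                   # (T, C, H/4, W/4)
        return feats.flatten(2).permute(0, 2, 1)  # (T, tokens, C)


class SpikingTransformerDetector(nn.Module):
    """DETR-style encoder-decoder with learned object queries; a vanilla
    nn.Transformer is used purely as a placeholder for the spiking version.
    During training, a Post-Processing module would compute a DETR-style
    set-prediction loss on the outputs (omitted here)."""
    def __init__(self, dim=32, num_queries=16, num_classes=2):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        self.transformer = nn.Transformer(d_model=dim, nhead=4,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.class_head = nn.Linear(dim, num_classes + 1)  # +1 for "no object"
        self.box_head = nn.Linear(dim, 4)                  # (cx, cy, w, h)

    def forward(self, tokens):                     # (T, tokens, C)
        memory = tokens.mean(dim=0, keepdim=True)  # collapse time bins (assumption)
        tgt = self.queries.weight.unsqueeze(0)
        hs = self.transformer(memory, tgt)
        return self.class_head(hs), self.box_head(hs).sigmoid()


# Wire the stages together on dummy events.
events = torch.rand(1000, 4)
events[:, 0] *= 63
events[:, 1] *= 63
events[:, 3] = (events[:, 3] > 0.5).float()
cube = MTBCube()(events)
tokens = SpatioTemporalFeatureExtractor()(cube)
logits, boxes = SpikingTransformerDetector()(tokens)
print(logits.shape, boxes.shape)  # (1, 16, 3) (1, 16, 4)
```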