In intelligent vision scenarios, large volumes of real-time video data are generated at the edge. When these streams are processed for different video tasks, frequent data copies between device and host are constrained by transfer bandwidth, which results in high system latency. We investigate the computing bottlenecks in online video processing with the aim of reducing processing latency and improving efficiency. In this paper, we propose a joint video acceleration processing (JVAP) architecture for online edge systems. First, compressed video streams are transmitted to the GPU, where they are decoded and their data content is converted. Second, we design data pre-processing and post-processing modules that implement specific functional operators and separately handle operator combination and stitching, so that different computing tasks can reuse the implemented operator library. Third, we modify the data interface of the inference task model so that data flows consistently within the GPU. We conduct experiments using videos of different qualities and model frameworks of varying scales. The results indicate that the proposed method improves average processing efficiency by more than 11% relative to existing representative acceleration frameworks and extends the potential application of online intelligent inference algorithms.
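The reusable operator library described above can be illustrated with a minimal sketch. This is not the JVAP implementation: the registry, the operator names (`normalize`, `hflip`, `threshold`), and the `compose` helper are hypothetical, and plain nested lists stand in for frames that would stay resident in GPU memory in the actual architecture.

```python
from typing import Callable, Dict, List

# Assumption: a nested list stands in for a decoded frame held in GPU memory.
Frame = List[List[float]]
Operator = Callable[[Frame], Frame]

# Hypothetical shared operator library, keyed by operator name.
OPERATORS: Dict[str, Operator] = {}


def register(name: str) -> Callable[[Operator], Operator]:
    """Add an operator to the shared library under the given name."""
    def wrap(fn: Operator) -> Operator:
        OPERATORS[name] = fn
        return fn
    return wrap


@register("normalize")
def normalize(frame: Frame) -> Frame:
    # Scale 8-bit pixel values into the range [0, 1].
    return [[p / 255.0 for p in row] for row in frame]


@register("hflip")
def hflip(frame: Frame) -> Frame:
    # Mirror each row horizontally.
    return [row[::-1] for row in frame]


@register("threshold")
def threshold(frame: Frame) -> Frame:
    # Binarize scores, as a stand-in for a post-processing step.
    return [[1.0 if p >= 0.5 else 0.0 for p in row] for row in frame]


def compose(names: List[str]) -> Operator:
    """Stitch named operators into one pipeline, applied left to right."""
    stages = [OPERATORS[n] for n in names]

    def pipeline(frame: Frame) -> Frame:
        for stage in stages:
            frame = stage(frame)
        return frame

    return pipeline


# Different tasks build their own pipelines from the same library.
preprocess = compose(["normalize", "hflip"])
postprocess = compose(["threshold"])
```

Because each pipeline is built by name from one registry, a new task only needs a different list of operator names rather than a new implementation. In a GPU setting the same pattern would apply to device-resident tensors, so that no stage copies data back to the host.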