A multi-GPU implementation of the multilevel fast multipole algorithm (MLFMA) based on the hybrid OpenMPCUDA parallel programming model (OpenMP-CUDA-MLFMA) is presented for computing electromagnetic scattering of a three-dimensional conducting object. The proposed hierarchical parallelization strategy ensures a high computational throughput for the GPU calculation. The resulting OpenMP-based multi-GPU implementation is capable of solving real-life problems with over one million unknowns with a remarkable speed-up. The radar cross sections of a few benchmark objects are calculated to demonstrate the accuracy of the solution. The results are compared with those from the CPU-based MLFMA and measurements. The capability and efficiency of the presented method are analyzed through the examples of a sphere, an aerocraft, and a missile-like object. Compared with the 8-threaded CPU-based MLFMA, the OpenMP-CUDA-MLFMA method can achieve from 5 to 20 total speed-up ratios.