As jamming technology advances, robust adaptive beamforming (RBF) methods must cope with finite snapshots and array gain/phase (G/P) errors. This paper introduces an end-to-end RBF approach based on a two-stage convolutional neural network. The first stage consists of convolutional blocks and residual blocks without downsampling, and estimates the covariance matrix accurately from finite snapshots. The second stage, whose structure is similar to that of the first, maps the first stage’s output to an adaptive weight vector. The two stages are pre-trained on different datasets and then fine-tuned together as a single end-to-end network, which simplifies training. The two-stage structure gives the network a clear physical interpretation and yields satisfactory performance even with few snapshots in the presence of array G/P errors. We demonstrate the resulting beamformer’s performance with numerical examples and compare it with several other adaptive beamformers.
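For orientation, the setting described above can be sketched with a conventional baseline. The following is not the proposed network. It is a minimal NumPy illustration, under assumed parameters, of the quantities the two stages operate on: a finite-snapshot sample covariance matrix observed through G/P errors, and an adaptive weight vector computed from it. The array geometry (half-wavelength uniform linear array), the error magnitudes, the jammer scenario, the diagonal loading value, and all function names are assumptions chosen for illustration. The sample-matrix-inversion weights stand in for the mapping that the second stage would learn.

```python
import numpy as np

def steering_vector(n_sensors, theta_deg, spacing=0.5):
    """Ideal ULA steering vector; spacing in wavelengths (assumed 0.5)."""
    k = np.arange(n_sensors)
    return np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def simulate_snapshots(n_sensors, n_snapshots, jam_deg, jnr_db, gp_error, rng):
    """Jammer-plus-noise snapshots seen through per-sensor G/P errors."""
    a_jam = gp_error * steering_vector(n_sensors, jam_deg)
    amp = 10 ** (jnr_db / 20)
    s = amp * (rng.standard_normal(n_snapshots)
               + 1j * rng.standard_normal(n_snapshots)) / np.sqrt(2)
    noise = (rng.standard_normal((n_sensors, n_snapshots))
             + 1j * rng.standard_normal((n_sensors, n_snapshots))) / np.sqrt(2)
    return np.outer(a_jam, s) + noise

def sample_covariance(x):
    """Finite-snapshot estimate R_hat = X X^H / K."""
    return x @ x.conj().T / x.shape[1]

def smi_weights(r_hat, a_look, loading=1e-3):
    """Sample-matrix-inversion weights with a small assumed diagonal loading."""
    n = r_hat.shape[0]
    r_inv_a = np.linalg.solve(r_hat + loading * np.eye(n), a_look)
    return r_inv_a / (a_look.conj() @ r_inv_a)

rng = np.random.default_rng(0)
n_sensors, n_snapshots = 8, 16  # few snapshots relative to array size (assumed)

# Assumed G/P errors: 5% gain deviation, 5 degree phase deviation per sensor.
gain = 1 + 0.05 * rng.standard_normal(n_sensors)
phase = np.deg2rad(5) * rng.standard_normal(n_sensors)
gp_error = gain * np.exp(1j * phase)

x = simulate_snapshots(n_sensors, n_snapshots, jam_deg=30.0, jnr_db=30.0,
                       gp_error=gp_error, rng=rng)
r_hat = sample_covariance(x)              # role of the first stage's target
a_look = steering_vector(n_sensors, 0.0)  # nominal (error-free) look direction
w = smi_weights(r_hat, a_look)            # role of the second stage's output
```

In this sketch the weights satisfy the distortionless constraint only with respect to the nominal steering vector, which no longer matches the true array response once G/P errors are present. That mismatch, together with the noisy covariance estimate from few snapshots, is the degradation the abstract refers to.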