In 2017, Vaswani et al. proposed a new neural network architecture named the Transformer. This architecture quickly revolutionized the natural language processing world. Models such as GPT and BERT, which rely on the Transformer architecture, have fully outperformed the previous state-of-the-art networks. The Transformer surpassed earlier approaches by such a wide margin that all recent cutting-edge models appear to rely on Transformer-based architectures. In this paper, we provide an overview and explanations of the latest models. We cover auto-regressive models such as GPT, GPT-2, and XLNet, as well as auto-encoding models such as BERT and a number of post-BERT models like RoBERTa, ALBERT, and ERNIE 1.0/2.0.