Salivary gland neoplasms (SGNs) are a group of human neoplasms with remarkable cytomorphological diversity, which frequently poses diagnostic challenges. Accurate histological categorization of salivary tumors is crucial for precise diagnosis and for guiding patient management. In this study, a computer-aided diagnosis model, SGN-ViT, was developed on the basis of the Vision Transformer, a state-of-the-art deep-learning architecture in computer vision, to classify the most prevalent subtypes of SGNs: pleomorphic adenoma, myoepithelioma, Warthin's tumor, basal cell adenoma, oncocytic adenoma, cystadenoma, mucoepidermoid carcinoma, and salivary adenoid cystic carcinoma. The dataset comprised 3046 whole slide images (WSIs) of histologically confirmed salivary gland tumors, encompassing nine distinct tissue categories. SGN-ViT classified the eight salivary gland tumor subtypes with an accuracy of 0.9966, an AUC of 0.9899, a precision of 0.9848, a recall of 0.9848, and an F1-score of 0.9848, and it outperformed the benchmark models in diagnostic performance. In a subset of 100 WSIs, SGN-ViT achieved diagnostic performance comparable to that of the chief pathologist while significantly reducing diagnosis time. These results indicate that SGN-ViT has the potential to serve as a valuable computer-aided diagnostic tool for salivary tumors and to enhance the diagnostic accuracy of junior pathologists.
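As an illustration of how multi-class metrics such as accuracy, precision, recall, and F1-score can be derived from predicted and true labels, the following is a minimal sketch. It assumes macro-averaging across classes; the averaging scheme used in the study is not stated here, and the function name `classification_metrics` and the toy labels are hypothetical and do not come from the study's data. AUC is omitted because it requires predicted scores, not hard labels.

```python
from typing import Dict, List


def classification_metrics(
    y_true: List[int], y_pred: List[int], n_classes: int
) -> Dict[str, float]:
    """Overall accuracy plus macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration only; macro-averaging is an
    assumption, not the study's documented procedure.
    """
    # confusion[i][j] = number of samples of true class i predicted as class j
    confusion = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    total = len(y_true)
    correct = sum(confusion[k][k] for k in range(n_classes))

    precisions, recalls, f1s = [], [], []
    for k in range(n_classes):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n_classes))  # column sum
        actual_k = sum(confusion[k])                                  # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


# Toy example with three classes (illustrative labels, not study data).
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
metrics = classification_metrics(toy_true, toy_pred, n_classes=3)
print(metrics)
```

Under micro-averaging instead, precision, recall, and F1 all coincide in a single-label multi-class setting, so the choice of averaging scheme matters when comparing reported figures.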