Chest X-ray imaging plays a vital role in diagnosing and treating thoracic diseases in clinical and research settings. However, examining chest X-ray images and writing the corresponding reports is challenging due to the scarcity of experienced radiologists and the time-consuming nature of report writing, which reduces clinical efficiency. To address this issue and advance clinical automation, significant research has focused on developing automated systems for radiology report generation. Existing systems, however, have limitations: they do not account for the clinical workflow, neglect clinical context, and lack explainability. This paper presents a fully transformer-based automatic chest X-ray report generation network, TransXpainNet, which focuses on clinical accuracy while also improving other text generation metrics. The model uses a domain-knowledge-based vision transformer, DeiT-CXR, to extract image features, and incorporates supportive documents such as the clinical history to enrich report generation. Trained and tested on two X-ray report generation datasets, IU X-ray and MIMIC-CXR, the model achieves promising results on word-overlap, clinical-accuracy, and semantic-similarity-based metrics. Qualitative results with Grad-CAM highlight disease locations to aid radiologists' understanding. By embracing the radiologists' workflow, our proposed model improves explainability, transparency, and trustworthiness for radiologists.