Abstract Large Language Models (LLMs) are transformer‐based neural networks with billions of parameters trained on very large text corpora from diverse sources. LLMs have the potential to improve healthcare due to their capability to parse complex concepts and generate context‐based responses. The interest in LLMs has not spared digestive disease academics, who have mainly investigated foundational LLM accuracy, which ranges from 25% to 90% and is influenced by the lack of standardized rules to report methodologies and results for LLM‐oriented research. In addition, a critical issue is the absence of a universally accepted definition of accuracy, varying from binary to scalar interpretations, often tied to grader expertise without reference to clinical guidelines. We address strategies and challenges to increase accuracy. In particular, LLMs can be infused with domain knowledge using Retrieval Augmented Generation (RAG) or Supervised Fine‐Tuning (SFT) with reinforcement learning from human feedback (RLHF). RAG faces challenges with in‐context window limits and accurate information retrieval from the provided context. SFT, a deeper adaptation method, is computationally demanding and requires specialized knowledge. LLMs may increase patient quality of care across the field of digestive diseases, where physicians are often engaged in screening, treatment and surveillance for a broad range of pathologies for which in‐context learning or SFT with RLHF could improve clinical decision‐making and patient outcomes. However, despite their potential, the safe deployment of LLMs in healthcare still needs to overcome hurdles in accuracy, suggesting a need for strategies that integrate human feedback with advanced model training.