Advancements in natural language processing (NLP) have led to the emergence of large language models (LLMs) as potential tools for patient consultations. This study investigates the ability of two reasoning-capable models, GPT o1-preview and Deepseek-R1, to provide diagnostic and treatment recommendations for orofacial clefts. A cross-sectional comparative study was conducted using 20 questions derived from Google Trends and expert experience, and both models provided responses to these queries. Readability was assessed using the Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), sentence count, and percentage of complex words. No statistically significant differences were found between the two models' responses for FKGL (P = 0.064) or FRES (P = 0.56). Physician evaluation on a 4-point Likert scale assessed accuracy, clarity, relevance, and trustworthiness, with Deepseek-R1 achieving significantly higher overall ratings (P = 0.041), whereas GPT o1-preview exhibited notable empathy in certain clinical scenarios. Both models displayed complementary strengths, indicating potential for clinical consultation applications. Future research should focus on integrating these strengths within medical-specific LLMs to generate more reliable, empathetic, and personalized treatment recommendations.
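For context, both readability indices are computed from word, sentence, and syllable counts using the standard Flesch formulas (the specific scoring tool used in this study is not detailed here): FRES = 206.835 - 1.015 x (words/sentences) - 84.6 x (syllables/words), where higher scores indicate easier text, and FKGL = 0.39 x (words/sentences) + 11.8 x (syllables/words) - 15.59, which maps the same counts onto an approximate U.S. school grade level.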