As artificial intelligence (AI) tools become widely accessible, more patients and medical professionals will turn to them for medical information. Large language models (LLMs), a subset of AI, excel at natural language processing tasks and hold considerable promise for clinical use. Fields such as oncology, in which clinical decisions depend heavily on a continuous influx of new clinical trial data and evolving guidelines, stand to gain immensely from such advances. It is therefore critical to benchmark these models and describe their performance characteristics to guide their safe application in clinical oncology. Accordingly, the primary objectives of this work were to conduct a comprehensive evaluation of LLMs in the field of oncology and to identify and characterize strategies that medical professionals can use to bolster their confidence in a model's responses.