Automated segmentation of the esophagus is critical in image-guided/adaptive radiotherapy of lung cancer to minimize radiation-induced toxicities such as acute esophagitis. We have developed a semantic physics-based data augmentation method for segmenting the esophagus in both planning CT (pCT) and cone-beam CT (CBCT) using 3D convolutional neural networks. One hundred and ninety-one cases with their pCTs and CBCTs from four independent datasets were used to train a modified 3D U-Net architecture with a multi-objective loss function specifically designed for soft-tissue organs such as the esophagus. Scatter artifacts and noise were extracted from week-1 CBCTs using a power-law adaptive histogram equalization method and induced in the corresponding pCTs, which were then reconstructed using CBCT reconstruction parameters. We leveraged this physics-based artifact induction in pCTs to drive esophagus segmentation in real weekly CBCTs. Segmentations were evaluated geometrically using the Dice coefficient and Hausdorff distance, and dosimetrically using mean esophagus dose and D5cc. Owing to the physics-based data augmentation, our model, trained only on the synthetic CBCTs, was robust and generalizable enough to also produce state-of-the-art results on the pCTs and CBCTs, achieving Dice overlaps of 0.81 and 0.74, respectively. We conclude that our physics-based data augmentation spans the realistic noise/artifact spectrum across patient CBCT/pCT data and generalizes well across modalities, which can ultimately improve the accuracy of treatment setup and response analysis.
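The geometric metrics named above, the Dice coefficient and the Hausdorff distance, can be illustrated with a minimal NumPy sketch. This is not the evaluation code used in the study. The function names, the `spacing` parameter, the use of all mask voxels rather than surface voxels, and the toy masks are assumptions made purely for illustration. The Hausdorff distance here is the plain symmetric maximum, computed by brute force, so it is only practical for small masks.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice overlap between two binary masks: 2|A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    total = pred.sum() + ref.sum()
    if total == 0:
        # Convention assumed here: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / total)


def hausdorff_distance(pred, ref, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between the voxel sets of two masks.

    `spacing` is the (hypothetical) voxel size per axis, so the result is in
    the same physical unit as the spacing. All pairwise distances are
    computed, which is too slow and memory-hungry for full CT volumes.
    """
    p = np.argwhere(np.asarray(pred, dtype=bool)) * np.asarray(spacing)
    r = np.argwhere(np.asarray(ref, dtype=bool)) * np.asarray(spacing)
    if len(p) == 0 or len(r) == 0:
        raise ValueError("both masks must be non-empty")
    # d[i, j] is the distance from predicted voxel i to reference voxel j.
    d = np.linalg.norm(p[:, None, :] - r[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


# Toy masks: a 4x4x4 cube and the same cube shifted by one voxel along axis 0.
ref = np.zeros((8, 8, 8), dtype=bool)
ref[2:6, 2:6, 2:6] = True
pred = np.zeros_like(ref)
pred[3:7, 2:6, 2:6] = True

dice = dice_coefficient(pred, ref)
hd = hausdorff_distance(pred, ref, spacing=(1.0, 1.0, 1.0))
print(f"Dice = {dice:.2f}, Hausdorff = {hd:.2f}")
```

For the toy masks, 48 of the 64 voxels in each cube overlap, so the Dice coefficient is 2 × 48 / 128 = 0.75, and the one-voxel shift gives a Hausdorff distance of 1.0 in units of the assumed spacing. Many evaluations restrict the Hausdorff computation to surface voxels or report a percentile variant. The abstract does not say which variant was used, so the plain maximum is shown.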