Objectives: Artificial intelligence (AI) has been an important addition to medicine. We aimed to explore the use of deep learning (DL) to distinguish benign from malignant lesions with breast ultrasound (BUS).

Methods: The DL model was trained with BUS nodule data using a standard protocol (1271 malignant nodules, 1053 benign nodules, and 2144 images of the contralateral normal breast). The model was tested with 692 images of 256 breast nodules. Model performance was assessed with accuracy, precision, recall, the F1 score (the harmonic mean of precision and recall), and mean average precision. We used 100 BUS images to compare diagnostic accuracy among the AI system, experts (>25 years of experience), and physicians with varying levels of experience. A receiver operating characteristic curve was generated to evaluate accuracy in distinguishing benign from malignant breast nodules.

Results: The DL model showed 73.3% sensitivity and 94.9% specificity for the diagnosis of benign versus malignant breast nodules (area under the curve, 0.943). No significant difference in diagnostic ability was found between the AI system and the expert group (P = .951), whereas the physicians with lower levels of experience differed significantly from both the AI system and the expert group (P = .01 and .03, respectively).

Conclusions: Deep learning could distinguish between benign and malignant breast nodules with BUS. On BUS images, DL achieved diagnostic accuracy equivalent to that of expert physicians.
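
The evaluation indices named in the Methods (accuracy, precision, recall, and their harmonic mean) together with the sensitivity and specificity reported in the Results can all be derived from a binary confusion matrix. The following is a minimal sketch of those definitions, not the study's own evaluation code. The names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts in the usage example are made up for illustration and are not taken from the study. Mean average precision and the area under the curve need per-image confidence scores and are not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix, with 'malignant' treated as the positive class."""
    tp: int  # malignant nodules classified as malignant
    fp: int  # benign nodules classified as malignant
    tn: int  # benign nodules classified as benign
    fn: int  # malignant nodules classified as benign


def _safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the standard indices from confusion-matrix counts."""
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)          # same quantity as sensitivity
    specificity = _safe_div(c.tn, c.tn + c.fp)
    accuracy = _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; these are NOT results from the study.
    example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Treating malignant as the positive class makes recall coincide with sensitivity, which is the convention assumed in this sketch.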