Authors
Bruno Astuto,Io Flament,Rutwik Shah,Matthew D. Bucknor,Thomas M. Link,Valentina Pedoia,Sharmila Majumdar
Abstract
Purpose: Semi-quantitative scoring systems, such as the Whole-Organ Magnetic Resonance Imaging Score (WORMS) or the MRI Osteoarthritis Knee Score (MOAKS), have been developed in an attempt to standardize knee MRI reading. Although these grading systems are widely used in research settings, their clinical application is hampered by the time and level of expertise needed to perform the reading reliably, making automation of this task appealing for a smoother and faster clinical translation. The goal of this study is to fill this void by capitalizing on recent developments in Deep Learning (DL) applied to medical imaging. Specifically, we aim to: i) create models to identify cartilage lesions (CLs) and assess their severity, ii) identify the presence of bone marrow edema lesions (BMELs), and iii) combine the two models in a multi-task, automated, and scalable fashion to improve assessment accuracy.

Methods: 1,435 knee MRIs from subjects with and without OA were collected from three previous studies (age = 42.79 ± 14.75 years, BMI = 24.28 ± 3.22 kg/m2, 48 males, 52 females). All studies used a high-resolution 3D fast spin-echo (FSE) CUBE sequence (TR/TE = 1500/26.69 ms, field-of-view = 14 cm, matrix = 512 × 512, slice thickness = 0.5 mm, bandwidth = 50 kHz). A 3D V-Net neural network (NN) architecture was used to learn segmentations of the 6 cartilage compartments using 480 manually segmented volumes as training/test data. To optimize the segmentation task, we utilized two V-Net architectures. The first performed segmentation into 5 classes (Figure 1A), namely femur, tibia, and patella cartilage, one class for meniscus, and one for background (BG). The second V-Net (Figure 1B) solves the problem of assigning 11 labels to the compartments segmented by the first V-Net. The 11 classes are: patella, trochlea, medial and lateral tibia, and medial and lateral femur cartilage; the 4 menisci; and BG.
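The second-stage relabeling (coarse compartment classes refined into anatomical sub-compartments) can be illustrated with a toy spatial rule. This is a minimal sketch, assuming a voxel-list mask representation and a simple midline split; the function and parameter names are hypothetical, and the actual second V-Net learns this assignment from data rather than using a fixed rule:

```python
# Hypothetical sketch of second-stage compartment relabeling: split a
# coarse "tibia cartilage" mask into medial/lateral sub-compartments by
# comparing each voxel's x-coordinate to the image midline. Illustration
# of the cascaded-labeling idea only, not the authors' trained network.

def split_tibia_medial_lateral(voxels, width, right_knee=True):
    """voxels: list of (x, y, z) coordinates labeled 'tibia cartilage'
    by the first-stage 5-class segmentation.
    Returns (medial, lateral) voxel lists, assigned by which side of the
    midline x = width / 2 each voxel falls on. For a right knee the
    medial side is assumed to be x < width / 2; laterality flips for a
    left knee."""
    mid = width / 2.0
    medial, lateral = [], []
    for x, y, z in voxels:
        on_low_side = x < mid
        is_medial = on_low_side if right_knee else not on_low_side
        (medial if is_medial else lateral).append((x, y, z))
    return medial, lateral
```

In practice a learned model handles ambiguous boundary voxels far better than a hard geometric threshold, which is presumably why a second V-Net is used instead of a rule like this.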
After applying the segmentation to the entire dataset, bounding boxes around the 6 cartilage compartments were extracted, resulting in 8,610 cartilage volumes of interest (cVOIs) (Figure 1C). cVOIs were randomly divided with a 65/20/15% split into training, validation, and holdout sets, preserving the distribution of lesion severity per compartment. Three class labels were generated as follows: (1) No Lesion - NL (WORMS 0 and 1), (2) Partial-Thickness Lesion - PT (WORMS 2, 3, and 4), and (3) Full-Thickness Lesion - FT (WORMS 2.5, 5, and 6). Randomly generated 3-axis rotational (±25 degrees) and zooming (±20%) image augmentations were performed (Figure 1C). MOAKS grading could also have been used for this study; however, WORMS grades were available for all cVOIs. The distribution of lesions varies for each compartment, with the patella showing the most balanced distribution across the lesion severity classes. Nonetheless, augmentation and upsampling were applied during preprocessing, together with class weights in the loss function during training, to mitigate the class imbalance. The lesion classification problem was divided into three steps: I) automatic 3-class CL severity classification with a 3D CNN (Figure 1E), II) automatic 2-class BMEL classification (the same 3D CNN architecture used for CL classification, but with a 2-class output - Figure 1F), and III) combination of the outputs of both DL networks with demographic data as input to a Gradient Boosting classifier (Figure 1G), which outputs the final lesion severity staging, applied to a holdout set.

Results: The first step in CL classification was to classify lesion severity automatically using only 3D volumetric image data. The overall accuracy of that classifier was 82.79% on the holdout set. The same 3D architecture applied to the 2-class BMEL task achieved an accuracy of 79.5%.
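The labeling and splitting scheme described above can be sketched as follows: WORMS grades are collapsed to the 3 classes, then cVOIs are split 65/20/15 per class so the severity distribution is preserved in each set. A minimal sketch with hypothetical function names; the authors' exact split procedure is not specified:

```python
import random
from collections import defaultdict

# Collapse WORMS grades to the 3 classes defined in the text.
WORMS_TO_CLASS = {
    0: "NL", 1: "NL",            # no lesion
    2: "PT", 3: "PT", 4: "PT",   # partial-thickness lesion
    2.5: "FT", 5: "FT", 6: "FT", # full-thickness lesion
}

def stratified_split(items, grades, fracs=(0.65, 0.20, 0.15), seed=0):
    """Split cVOIs into train/val/holdout, stratified by the 3-class
    label derived from each item's WORMS grade, so every set keeps
    roughly the same severity distribution."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, grade in zip(items, grades):
        by_class[WORMS_TO_CLASS[grade]].append(item)
    train, val, hold = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_tr = int(round(fracs[0] * len(members)))
        n_va = int(round(fracs[1] * len(members)))
        train += members[:n_tr]
        val += members[n_tr:n_tr + n_va]
        hold += members[n_tr + n_va:]
    return train, val, hold
```

Stratifying before shuffling is what keeps a rare class such as FT represented in the small 15% holdout set, which a plain random split would not guarantee.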
For the three-class WORMS shallow-classifier ensemble, an overall accuracy of 85.6% was achieved when combining the 3D CNN outputs with demographic data. The count confusion matrix can be viewed in Figure 2, along with results for the combinations of the three classifiers used in our pipeline. A third combination, using the 3D CNN cartilage predictions, demographic data, and the BMEL model outputs together as input to the shallow classifier, boosted performance to an overall accuracy of 86.7%. It is worth noting that adding demographics alone boosts performance, especially by decreasing mispredictions of NL as FT. Incorporating the BMEL model (Figure 2C) fine-tunes the PT and FT predictions. To better interpret our results, misclassified cases were further inspected by experts (Figure 3).

Conclusions: By combining different anatomical structures (distinct cartilage compartments) and lesion classification grading for both cartilage and BMEL, we are moving towards multi-task machine learning for lesion detection. The proposed approach is weakly supervised in the sense that it learns features using only image-level labels (i.e., all that is known is the presence or absence of a lesion somewhere in the 3D volume). With the proposed approach, we were able to boost the performance of our final classifiers not by simply fine-tuning a single-purpose model, but by broadly considering related tasks that could bring additional information to our classification problem.
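As a rough illustration of step III of the pipeline, the sketch below assembles a per-cVOI feature vector from the two networks' outputs and the demographics, which would then be fed to a gradient-boosting classifier. The field names, ordering, and demographic encoding are assumptions for illustration, not the authors' exact feature set:

```python
# Hypothetical feature assembly for the shallow ensemble classifier:
# concatenate the cartilage CNN's 3-class softmax, the BMEL CNN's
# probability, and demographic data into one vector per cVOI.

def ensemble_features(cl_probs, bmel_prob, age, bmi, sex_male):
    """cl_probs: 3-class softmax (NL, PT, FT) from the cartilage 3D CNN;
    bmel_prob: P(BMEL present) from the 2-class BMEL CNN;
    age in years, BMI in kg/m^2, sex encoded as 0/1 (assumed encoding)."""
    assert abs(sum(cl_probs) - 1.0) < 1e-6, "softmax must sum to 1"
    assert 0.0 <= bmel_prob <= 1.0
    return list(cl_probs) + [bmel_prob, age, bmi, float(sex_male)]

# These vectors would then train a shallow model on the training split,
# e.g. sklearn.ensemble.GradientBoostingClassifier, evaluated on the
# holdout set (library choice is an assumption; the abstract only names
# "Gradient Boosting").
```

Feeding probabilities rather than hard class labels lets the boosted trees exploit the CNNs' uncertainty, which is consistent with the reported gain when the BMEL model is added.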