Authors
Maher Albitar,Hong Zhang,Andrew Ip,Wanlong Ma,Jeffrey Justin Estella,Lori A. Leslie,Tatyana Feldman,Ahmad Charifa,Arash Mohtashamian,Andrew L. Pecora,André Goy
Abstract
Introduction: Diagnosing and classifying lymphoma requires a pathologist to interpret morphology together with a large number of immunohistochemistry (IHC) stains for various CD markers. This process is subjective and requires a significant amount of tissue. In contrast, quantifying the RNA of the same CD markers by next-generation sequencing (NGS) requires little tissue and is less influenced by the antigen retrieval process used in IHC. However, IHC staining with microscopic examination allows expression to be evaluated in individual cell subpopulations, which is what makes diagnosis possible; when total RNA is evaluated by NGS, the ability to distinguish between subpopulations is lost. Machine learning algorithms are capable of multi-marker normalization and can compensate for the loss of subpopulation analysis. To confirm this, we explored whether RNA quantification of 30 CD markers by NGS from formalin-fixed paraffin-embedded (FFPE) tissue, combined with machine learning, can support the clinical diagnosis and classification of various types of lymphoma.
Methods: FFPE tissue from 130 diffuse large B-cell lymphoma (DLBCL), 70 mantle cell lymphoma, 92 T-cell lymphoma, 48 follicular lymphoma, 36 Hodgkin lymphoma, and 52 marginal zone lymphoma samples was used for mRNA extraction. The samples were consecutive, without selection, and consisted mainly of lymph node excisional biopsies or core biopsies. RNA sequencing was performed using a targeted hybrid capture panel that included the CD1A, CD2, CD3D, CD3E, CD3G, CD4, CD5, CD7, CD8A, CD8B, CD10, CD14, CD19, CD20, CD22, CD33, CD34, CD38, CD40, CD44, CD47, CD68, CD70, CD74, CD79A, CD79B, CD81, CD138, CD200, and CD274 genes. Salmon v1.4.0 was used to quantify expression as transcripts per million (TPM). A random forest machine learning algorithm was used to predict diagnosis. A randomly selected two thirds of the samples were used for training, and the remaining one third was used for testing.
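The two-thirds/one-third split described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the sample identifiers, the fixed seed, the rounding of the training-set size, and the use of a simple unstratified shuffle are all assumptions, since the abstract does not state how the random selection was carried out.

```python
import random

# Cohort sizes as reported in the Methods.
COHORT = {
    "DLBCL": 130,
    "mantle cell": 70,
    "T-cell": 92,
    "follicular": 48,
    "Hodgkin": 36,
    "marginal zone": 52,
}


def split_samples(cohort, train_fraction=2 / 3, seed=0):
    """Randomly assign samples to a training set and a testing set."""
    # Hypothetical sample IDs of the form "<diagnosis>#<index>".
    samples = [(f"{dx}#{i}", dx) for dx, n in cohort.items() for i in range(n)]
    # The seed is an assumption, used only to make the sketch reproducible.
    rng = random.Random(seed)
    rng.shuffle(samples)
    # Rounding the training-set size to the nearest integer is an assumption.
    n_train = round(len(samples) * train_fraction)
    return samples[:n_train], samples[n_train:]


train, test = split_samples(COHORT)
print(len(train), len(test))  # 285 143
```

In a full pipeline, the TPM values of the 30 panel genes for each training sample would then be passed, with the diagnosis labels, to a random forest implementation of the reader's choice.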
Results: In some cases, a diagnosis can be made simply by inspecting the RNA levels of the various CD markers. However, machine learning showed remarkably high sensitivity and specificity in the diagnosis of most lymphoma subclasses. The area under the curve (AUC) was 1.000 (95% CI: 1.000-1.000) for DLBCL vs. T-cell lymphoma, Hodgkin lymphoma vs. T-cell lymphoma, Hodgkin lymphoma vs. DLBCL, mantle cell lymphoma vs. DLBCL, and follicular lymphoma vs. marginal zone lymphoma, with 100% sensitivity and specificity in the testing set. The AUC was 0.974 (95% CI: 0.920-1.000) for marginal zone lymphoma vs. mantle cell lymphoma, with a sensitivity of 88% and a specificity of 100%. The AUC was 0.887 (95% CI: 0.776-0.999) for follicular lymphoma vs. DLBCL, with a sensitivity of 81.3% and a specificity of 83.7%.
Conclusions: These data demonstrate that NGS quantification of RNA from 30 CD markers, when combined with machine learning, is adequate for reliable classification of various types of lymphoma. This approach can provide valuable information for distinguishing between difficult diagnoses and, if trained adequately, has the potential to extend to more borderline cases. More importantly, this technology can be automated and is less susceptible to human error. RNA quantification using NGS has the potential to replace IHC and can be applied when samples are limited, such as needle aspirates or core biopsies.
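For readers less familiar with the metrics reported in the Results, the sketch below shows how sensitivity, specificity, and AUC are defined for one pairwise comparison. The labels and scores are made-up toy values, not study data, and the 0.5 decision threshold is an assumption; the abstract does not state how the classifier output was thresholded or how the confidence intervals were derived.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Assumes both classes are present, so neither denominator is zero.
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example only: 4 positive and 4 negative samples with classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc_value = auc(y_true, scores)
print(sens, spec, auc_value)  # 0.75 0.75 0.9375
```

An AUC of 1.000, as reported for several comparisons, corresponds to every positive sample scoring above every negative sample in the testing set.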