Abstract
While non‐alcoholic fatty liver disease (NAFLD) is the most common liver disease globally, only a minority of patients develop clinically significant liver‐related events.1 Accurately identifying the patients at risk for such outcomes is therefore important. It is now well established that fibrosis stage is the key determinant of liver‐related outcomes in NAFLD and that it also predicts overall mortality.2,3 However, much research has focused on moving away from liver biopsy as a diagnostic and prognostic tool. Biopsies are expensive and carry a potential for harm, and their results can be difficult to interpret because of sampling error and poor intra‐ and inter‐reader reproducibility, especially regarding the presence of ballooning.4 Developing non‐invasive alternatives to biopsy is therefore of high interest to patients and clinicians, as well as to healthcare payers. Since most patients with NAFLD are found in primary care, a first step is to estimate the probability of the presence (or absence) of clinically significant hepatic fibrosis, which is usually defined as stage 2–4 on the NASH CRN scale.5 Given the size of the population at risk (possibly up to 25%), tools for use in primary care need to be simple, inexpensive and easy for primary care physicians to interpret. Developing such tools has proven difficult.
The commonly used FIB‐4 score,6 for instance, was found to lack both sensitivity and specificity when used in a primary care population.7 Efforts to improve first‐line diagnostics in this field are therefore highly warranted. This is especially true because FIB‐4 and other scores were mostly developed to identify stage 3–4 fibrosis, whereas inclusion in clinical trials normally requires stage 2–3 fibrosis, and patients with stage 2 fibrosis also have an increased risk of outcomes compared with those with minimal fibrosis.2 In this issue of Hepatology, Sripongpun and colleagues present a new diagnostic model for detecting stage 2–4 fibrosis in patients with NAFLD. The intention was to improve primary care decision‐making on when to keep the patient in primary care and when to refer to other services in the healthcare system. The Steatosis‐Associated Fibrosis Estimator (SAFE) score was constructed using data from 676 patients recruited from the NASH CRN observational study, of whom 45% had fibrosis stage 2–4.8 After considering a number of machine learning methods, the authors built the model using logistic regression. The predictors were age, body mass index, diabetes (as a binary variable), AST, ALT, total globulin and platelet count. The area under the receiver operating characteristic curve (AUROC) for detecting significant fibrosis was 0.79 in the training data set and 0.80 in an external validation data set consisting of participants in the FLINT trial. Further validation was performed in a separate cohort of patients with fibrosis defined by magnetic resonance elastography (MRE). Finally, the authors investigated the prognostic performance of the SAFE score in the NHANES data set and found that higher scores were associated with higher overall mortality. Unfortunately, only overall mortality could be ascertained, not liver‐specific mortality or non‐fatal events.
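For comparison with the seven‐variable SAFE model, FIB‐4 needs only four routinely available inputs: FIB‐4 = (age × AST) / (platelet count × √ALT), with age in years, AST and ALT in U/L, and platelets in 10⁹/L. The sketch below computes the score and sorts it into three risk bands. The cut‐offs of 1.3 and 2.67 are the values commonly cited for NAFLD and are an assumption here, since they are not given in this text. The function names are hypothetical and are not taken from any published implementation.

```python
import math


def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_1e9_per_l: float) -> float:
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT))."""
    values = (age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l)
    if any(v <= 0 for v in values):
        raise ValueError("all inputs must be positive")
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


def fib4_risk_band(score: float, low_cutoff: float = 1.3,
                   high_cutoff: float = 2.67) -> str:
    """Three-band triage. Default cut-offs are commonly cited NAFLD values (assumption)."""
    if score < low_cutoff:
        return "low"
    if score > high_cutoff:
        return "high"
    return "indeterminate"


# Hypothetical patient: 50 years, AST 40 U/L, ALT 36 U/L, platelets 200 x 10^9/L.
score = fib4(age_years=50, ast_u_per_l=40, alt_u_per_l=36, platelets_1e9_per_l=200)
print(round(score, 2), fib4_risk_band(score))  # 1.67 indeterminate
```

The middle "indeterminate" band is where referral decisions are least clear. The size of that band in a given population drives the downstream workload that the editorial discusses for SAFE.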
The authors compared the SAFE score with established diagnostic models (FIB‐4 and the NAFLD fibrosis score9), and SAFE outperformed both in the derivation and validation cohorts. Interestingly, many of the included parameters are strikingly similar to those in the NAFLD fibrosis score. So, should we switch to recommending the SAFE score instead of FIB‐4 or other scores as a first‐line tool in primary care? Perhaps not so fast. The model was developed in a setting other than the one in which it is intended to be used.10 Ideally, a diagnostic or prediction model should be developed in the population in which it will be applied, in this case primary care. In secondary or tertiary care, non‐invasive alternatives to biopsy with higher diagnostic accuracy than blood‐based models, such as vibration‐controlled transient elastography, are often available. The problem is illustrated by the application of SAFE to the NHANES data set. Among participants with presumed NAFLD, only 58% were classified as low risk using the suggested cut‐offs, while 34% were classified as indeterminate risk and 8% as high risk. By contrast, an analysis of the FIB‐4 score in a Swedish general population cohort found that 1.4% were classified as high risk.7 Thus, at present, using the SAFE score in primary care could lead to a high proportion of false positives and risk overwhelming hepatology services. Another drawback of the SAFE score is that it requires more parameters than, for instance, FIB‐4, which could make it more difficult to implement widely. The authors conclude that it is unknown how their score is best used in a primary care setting, a conclusion with which one must fully agree. Recalibrating the SAFE score in a primary care cohort would be a natural next step, and such studies would be of interest to the community.
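The concern about false positives follows from how positive predictive value (PPV) depends on prevalence. The sketch below applies Bayes' rule to two settings. The first is the 45% prevalence of stage 2–4 fibrosis in the derivation cohort. The second is an assumed 5% prevalence in primary care, chosen only for illustration. The sensitivity and specificity of 0.80 and 0.70 are also hypothetical and are not SAFE's reported operating characteristics.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point, for illustration only (not SAFE's published values).
SENS, SPEC = 0.80, 0.70

settings = [
    ("derivation cohort (45% stage 2-4)", 0.45),
    ("assumed primary care (5% stage 2-4)", 0.05),
]
for label, prev in settings:
    ppv = positive_predictive_value(SENS, SPEC, prev)
    print(f"{label}: PPV = {ppv:.2f}, "
          f"share of positives that are false = {1 - ppv:.2f}")
```

Only prevalence changes between the two runs, yet PPV falls from about 0.69 to about 0.12. In the assumed primary care setting, most referrals triggered by a positive result would therefore be false positives. This is why developing or recalibrating a model in the population where it will be used matters.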
Extensive validation of new clinical diagnostic or prediction models needs to be performed before they are used in clinical practice, and the community needs to decide which model is most appropriate in different settings.