External validation of biomarkers of fatty liver in the general population: the Bagnacavallo study

Objective: We externally validated the fatty liver index (FLI), the lipid accumulation product (LAP), the hepatic steatosis index (HSI), and the Zhejiang University index (ZJU) for the diagnosis of fatty liver (FL) and non-alcoholic fatty liver disease (NAFLD) in the general population.
Subjects and Methods: The validation was performed on 2159 citizens of the town of Bagnacavallo (Ravenna, Italy). Calibration was evaluated by calculating the calibration slope and intercept and by inspecting calibration plots; discrimination was evaluated using the c-statistic.
Results: The average calibration slope was 1 and the average intercept was 0 for all combinations of outcomes and biomarkers. As for FL, the c-statistic was 0.85 for FLI, 0.83 for ZJU, 0.82 for HSI, and 0.80 for LAP. As for NAFLD, the c-statistic was 0.77 for FLI, 0.76 for ZJU, 0.75 for HSI, and 0.74 for LAP. All the biomarkers were strongly correlated with each other.
Conclusion: FLI, LAP, HSI, and ZJU can be used to diagnose FL in the Bagnacavallo population, although FLI has the highest discriminative ability. The same biomarkers perform similarly for the diagnosis of NAFLD, although FLI has a small advantage as far as discrimination is concerned.

Qeios, CC-BY 4.0 · Article, November 24, 2020. Qeios ID: CD6LGJ · https://doi.org/10.32388/CD6LGJ

Definitions
Fatty Liver Index: defined by National Cancer Institute
Fatty liver (liver steatosis): defined by EASL–EASD–EASO
Non-alcoholic fatty liver disease (NAFLD): defined by EASL–EASD–EASO

Francesco Giuseppe Foschi and Giorgio Bedogni contributed equally to the present work. The Bagnacavallo Study Group includes Pietro Andreone, Anna Chiara Dall’Aglio, Mauro Bernardi, Lauro Bucchi, Francesca Dazzani, Fabio Falcini, Arianna Lanzi, Alessandra Ravaioli, Margherita Rimini, Giulia Rovesti, Gaia Saini, Giuseppe Francesco Stefanini.
Corresponding Author: Dr. Giorgio Bedogni, Clinical Epidemiology Unit, Liver Research Center, Building Q, AREA Science Park, Strada Statale 14 km 163.5, 34012 Basovizza, Trieste, Italy; Email: giorgiobedogni@gmail.com


Introduction
Fatty liver (liver steatosis), the most common liver disease worldwide, has been classified into non-alcoholic fatty liver disease (NAFLD) and alcoholic fatty liver disease (AFLD) for almost 40 years [1]. Such dichotomization has been increasingly criticized, so that an international panel of experts has recently proposed to abandon the NAFLD definition, adopting instead the more comprehensive definition of metabolic dysfunction-associated fatty liver disease (MAFLD), which has the advantage of being independent of alcohol intake [2][3][4].
Independently of its etiology, FL is operationally defined as visible steatosis in more than 5% of hepatocytes at liver biopsy or as an intrahepatic triglyceride content of at least 5.6% at magnetic resonance spectroscopy or magnetic resonance imaging [5]. Liver biopsy can be performed only in selected patients followed at tertiary care centers, and the use of magnetic resonance techniques is restricted to a few research centers because of their cost [5]. The method most commonly used to diagnose FL in both clinical practice and epidemiological research is liver ultrasonography (LUS) [5]. Another option, suggested by current guidelines to diagnose FL when LUS is not available, is the use of surrogate biomarkers of FL [5].
NAFLD was defined as FL associated with ethanol intake ≤ 2 alcohol units (20 g)/day in women and ≤ 3 alcohol units (30 g)/day in men testing negative for hepatitis B surface antigen and anti-HCV antibodies and not under treatment with steatogenic drugs [5]. Alcoholic fatty liver disease (AFLD) was defined as FL associated with ethanol intake > 2 alcohol units (20 g)/day in women and > 3 alcohol units (30 g)/day in men testing negative for hepatitis B surface antigen and anti-HCV antibodies and not under treatment with steatogenic drugs [5]. For the present analysis, NAFLD was coded as a binary outcome including any degree of FL (0 = normal liver or AFLD; 1 = NAFLD).
FLI is suggested by the European Association for the Study of the Liver (EASL) as a biomarker of liver steatosis [5]. Other biomarkers suggested by EASL are SteatoTest [22], which is based on a proprietary formula and could not be validated here, and the NAFLD-liver fat score (NAFLD-LFS) [23], which was developed using magnetic resonance spectroscopy as the reference method and was therefore not considered here. We were also unable to calculate NAFLD-LFS because insulin, which is a required predictor of NAFLD-LFS, was available in only 1415 (66%) of our 2159 subjects. For the same reason, and because of the unavailability of hip circumference, we could not calculate the ION index, which requires both insulin and the waist-to-hip ratio. We could have imputed the missing values of insulin [12], but we did not do so because insulin is known to be a key predictor of FL [18] and missingness of key predictors should be avoided when developing or validating prediction models [7].
FLI and LAP were developed to predict FL while HSI and ZJU were developed to predict NAFLD. All biomarkers were developed, using LUS as the reference method, in cross-sections of individuals from the general population (FLI, LAP) or health-care facilities (HSI, ZJU) by matching individuals with FL or NAFLD to individuals without it. The formulae for calculating the biomarkers are given in Appendix 1.

Sample size
We did not perform any formal sample size calculation but were quite confident that with 896/2159 (42%) cases of FL and 567/2159 (26%) cases of NAFLD we could attain a precise assessment of the performance of the biomarkers [11]. At least 200 events and non-events are in fact required for reasonable external validation of model performance [6][7].
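The check implied by this rule of thumb can be sketched in a few lines of Python. The function name and the way the threshold is applied (to events and non-events separately) are illustrative assumptions; only the counts and the figure of 200 come from the text.

```python
def events_and_non_events(cases, total):
    """Return (events, non-events) for a binary outcome."""
    return cases, total - cases


MIN_PER_GROUP = 200  # rule-of-thumb threshold quoted in the text

# Case counts reported in the text for the 2159 study participants.
for outcome, cases in (("FL", 896), ("NAFLD", 567)):
    events, non_events = events_and_non_events(cases, 2159)
    adequate = min(events, non_events) >= MIN_PER_GROUP
    print(f"{outcome}: {events} events, {non_events} non-events, adequate={adequate}")
```

Under this reading, both outcomes exceed the threshold in each group.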

Statistical analysis
Most continuous variables were not Gaussian-distributed, and all are reported as median (50th percentile) and interquartile range (25th and 75th percentiles). Discrete variables are reported as the number and proportion of subjects with the characteristic of interest. Calibration was evaluated by applying Van Calster's three-level hierarchy [8][24]. Level 1 of this hierarchy is "mean calibration" or "calibration-in-the-large", which compares the observed event rate with the average predicted risk. Level 2 is "weak calibration", which consists of a logistic calibration analysis testing whether the calibration slope is 1 and the calibration intercept is 0 and is aimed at revealing systematic overestimation or underestimation of risk.
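Levels 1 and 2 can be illustrated with a minimal pure-Python sketch. It is not the Stata or R code used in the study: the function names are hypothetical, the toy data are invented for illustration, and the intercept and slope are estimated jointly by Newton-Raphson, whereas the calibration intercept is often estimated with the slope fixed at 1.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_calibration(y, p):
    """Level 1: observed event rate and average predicted risk."""
    return sum(y) / len(y), sum(p) / len(p)


def weak_calibration(y, p, iterations=50):
    """Level 2: fit logit(P(y = 1)) = a + b * logit(p) by Newton-Raphson.

    Returns (a, b); a = 0 and b = 1 indicate weak calibration.
    """
    x = [logit(pi) for pi in p]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


# Toy data (not study data): three risk groups whose observed event
# fractions equal the predicted risks, i.e. a perfectly calibrated model.
toy_p = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
toy_y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2

observed, predicted = mean_calibration(toy_y, toy_p)
intercept, slope = weak_calibration(toy_y, toy_p)
print(observed, predicted, intercept, slope)
```

On the toy data the observed rate equals the mean predicted risk, and the fitted intercept and slope converge to 0 and 1.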
Level 3 is "moderate calibration", which evaluates whether the predicted risks correspond to the observed event rates using a calibration plot. Such a graph plots the predicted (expected) outcome probabilities (x-axis) against the observed outcome frequencies (y-axis). As suggested by TRIPOD [6], we performed the calibration using tenths of the predicted risk and superimposed a line obtained by locally weighted scatterplot smoothing [6]. A well-calibrated model shows predictions lying on or around the 45° line of the calibration plot. Discrimination was evaluated using Harrell's c-statistic [25]. Statistical analysis was performed using Stata 16.1 (Stata Corporation, College Station, TX, USA) with the pmcalplot module [26], and R 4.0.3 (R Core Team 2020, Vienna, Austria) with the val.prob.ci.2 function [8]. R code was run from within Stata using the rcall package [27].
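The two quantities described above, the grouped points of a calibration plot and the c-statistic for a binary outcome, can be sketched in pure Python as follows. This is an illustrative sketch rather than the pmcalplot or val.prob.ci.2 implementation: the function names are hypothetical, the example data are invented, and the smoothed (lowess) line is not reproduced.

```python
def c_statistic(y, p):
    """Proportion of (event, non-event) pairs in which the event has the
    higher predicted risk; tied pairs count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def calibration_points(y, p, groups=10):
    """Sort subjects by predicted risk, split them into `groups` bins of
    near-equal size, and return (mean predicted risk, observed event rate)
    for each non-empty bin. With groups=10 these are tenths of risk."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    points = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if chunk:
            mean_predicted = sum(pi for pi, _ in chunk) / len(chunk)
            observed_rate = sum(yi for _, yi in chunk) / len(chunk)
            points.append((mean_predicted, observed_rate))
    return points


# Invented example: four subjects, two events and two non-events.
example_y = [0, 0, 1, 1]
example_p = [0.1, 0.4, 0.35, 0.8]
print(c_statistic(example_y, example_p))
print(calibration_points(example_y, example_p, groups=2))
```

Points from `calibration_points` that fall on or near the 45° line correspond to moderate calibration in the sense used above.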

Study population
The measurements of the 2159 citizens who took part in the study are given in Table 1 and are described in greater detail elsewhere [11][12]. FL was diagnosed in 896 (42%, 95%CI 39 to 44%) and NAFLD in 567 (26%, 95%CI 24 to 28%) of them. For instance, the linear predictor of FLI explained 72% of the variance of HSI, 81% of the variance of ZJU, and 51% of the variance of natural-log-transformed LAP. Moreover, ZJU explained 89% of the variance of HSI. The similar performance of these biomarkers at diagnosing FL and NAFLD (Table 1) is thus likely to be partially explained by their underlying mutual association.
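The prevalence estimates reported for FL and NAFLD can be approximately reproduced with a short Python sketch. The text does not state which interval method was used, so the normal-approximation (Wald) interval below is an assumption, as is the function name.

```python
import math


def proportion_ci(cases, total, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% confidence interval.

    Assumption: the source does not say which interval method was used.
    """
    p = cases / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, p - half_width, p + half_width


# Case counts reported in the text for the 2159 study participants.
for label, cases in (("FL", 896), ("NAFLD", 567)):
    p, lower, upper = proportion_ci(cases, 2159)
    print(f"{label}: {100 * p:.0f}% (95%CI {100 * lower:.0f} to {100 * upper:.0f}%)")
```

After rounding to whole percentages, these intervals agree with the values reported in the text.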

Discussion
In the present study, we took advantage of the Bagnacavallo cross-sectional study of liver disease [11] to externally validate FLI [18], LAP [15], HSI [19], and ZJU [20] for the diagnosis of FL and NAFLD in the general population. All biomarkers showed acceptable mean, weak, and moderate calibration for the diagnosis of both FL (Figure 1 and Table 2) and NAFLD (Figure 2 and Table 2) [8].
We hypothesized that FLI would perform better than LAP, HSI, and ZJU at diagnosing FL and possibly NAFLD in the present population. (We had some reservations about NAFLD because FLI was purposely developed to predict FL.)

Conclusion
In conclusion, we found that FLI, LAP, ZJU, and HSI can be satisfactorily used to diagnose FL and NAFLD in the Bagnacavallo population, even if FLI has the highest discriminative ability. These biomarkers are strongly associated and this is likely to partially explain their similar performance. Further studies are needed to evaluate the use of these biomarkers for the diagnosis of MAFLD [29], the diagnostic entity which is going to replace NAFLD [2][3][4].

ZJU = bmi + gmmol + tgmmol + 3*altast + 2*female
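The ZJU formula can be written as a small Python function. The reading of the variable names is an assumption based on the published definition of the index: `gmmol` is taken as fasting glucose in mmol/L, `tgmmol` as triglycerides in mmol/L, `altast` as the ALT/AST ratio, and `female` as a 0/1 indicator. The function name and the example values are hypothetical.

```python
def zju_index(bmi, glucose_mmol, tg_mmol, alt, ast, female):
    """ZJU = BMI + glucose + triglycerides + 3 * (ALT/AST) + 2 if female.

    Assumed units: BMI in kg/m^2, glucose and triglycerides in mmol/L,
    ALT and AST in the same unit (only their ratio is used).
    """
    return bmi + glucose_mmol + tg_mmol + 3.0 * (alt / ast) + (2.0 if female else 0.0)


# Hypothetical subject, not taken from the study data.
print(zju_index(bmi=25.0, glucose_mmol=5.0, tg_mmol=1.5, alt=30.0, ast=30.0, female=True))
```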