
Cardiovascular disease risk increases when lipoprotein metabolism is dysfunctional. We have developed a computational model able to derive indicators of lipoprotein production, lipolysis, and uptake processes from a single lipoprotein profile measurement. This is the first study to investigate whether lipoprotein metabolism indicators can improve cardiovascular risk prediction and therapy management. We calculated lipoprotein metabolism indicators for 1981 subjects (145 cases, 1836 controls) from the Framingham Heart Study offspring cohort in which NMR lipoprotein profiles were measured. We applied a statistical learning algorithm using a support vector machine to select conventional risk factors and lipoprotein metabolism indicators that contributed to predicting risk for general cardiovascular disease. Risk prediction was quantified by the change in the Area Under the ROC Curve (AUC) and by risk reclassification (Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI)). Two VLDL lipoprotein metabolism indicators (VLDL E and VLDL H) improved cardiovascular risk prediction. We added these indicators to a multivariate model with the best performing conventional risk markers. Our method significantly improved both CVD prediction and risk reclassification. Two calculated VLDL metabolism indicators significantly improved cardiovascular risk prediction. These indicators may help to reduce prescription of unnecessary cholesterol-lowering medication, reducing costs and possible side effects. For clinical application, further validation is required.


Introduction
The Framingham Risk score predicts cardiovascular risk based on six variables: age, diabetes, smoking status, treated and untreated systolic blood pressure, total cholesterol, and HDL cholesterol.(186) Newer lipoprotein measurement methods have attempted to improve risk prediction by quantifying lipoprotein subclasses by size (58,187,188) or density(189) range.
We have developed a computational model to analyze measured lipoprotein subclass profiles in terms of the underlying metabolic activity. (125,126,139,149) The model can calculate ratios of lipoprotein production, lipolysis, and uptake processes from a single lipoprotein profile measurement; we call these ratios 'lipoprotein metabolism indicators'. Because metabolic disorders are at the basis of cardiovascular disease, we hypothesized that adding metabolic information in the form of lipoprotein metabolism indicators to conventional risk factors can improve cardiovascular risk prediction. We evaluated this hypothesis for subjects from the Framingham offspring cohort.


In this study we used measured information from subjects studied in the 4th examination of the Framingham Heart Study Offspring cohort, as recorded in dbGaP.(1) Subjects were included when they had no history of cardiovascular disease, gave written informed consent for general research use, had complete NMR lipoprotein profiles recorded, and had a complete record of conventional cardiovascular risk factors. Cardiovascular events were carefully recorded during the follow-up period for all subjects.


We applied the Particle Profiler computational model (125,149) to NMR lipoprotein profiles.(4) Profiles were based on the original NMR measurements, to which Liposcience's LP3 algorithm was applied. Slight modifications to the previously published Particle Profiler(139) fitting procedure can be found in the Supplemental Material (Methods). We calculated ratios of all modeled processes (lipoprotein production, total lipoprotein lipolysis, HL lipolysis, LPL lipolysis, liver lipoprotein attachment, liver lipoprotein uptake) in each of three sets of lipoprotein size ranges (VLDL through LDL, VLDL only, IDL through LDL).


Subjects who experienced a general cardiovascular event, as defined by the Framingham Heart Study,(4) within 10 years after the NMR measurements, were designated as 'cases', all others as 'controls'. The Framingham definition includes coronary death, myocardial infarction, coronary insufficiency, angina, ischemic stroke, hemorrhagic stroke, transient ischemic attack, peripheral artery disease, and heart failure.
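The case definition above can be illustrated with a minimal sketch. The function name, the use of None for subjects without a recorded event, and the inclusive treatment of an event at exactly 10 years are assumptions made for illustration; they are not taken from the study's code.

```python
def label_subject(years_to_first_event, horizon_years=10.0):
    """Return 'case' if a first general CVD event fell within the horizon.

    years_to_first_event is None when no event was recorded (assumption).
    An event at exactly the horizon is counted as a case (assumption).
    """
    if years_to_first_event is not None and years_to_first_event <= horizon_years:
        return "case"
    return "control"


# Hypothetical follow-up times in years since the NMR measurement.
labels = [label_subject(t) for t in (3.2, None, 12.5, 10.0)]
print(labels)
```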


We used a statistical learning algorithm (a nonlinear L2-norm support vector machine (190,191)) to correlate predictor variables with the CVD outcome. We grouped the predictor variables into three datasets: (1) conventional cardiovascular risk parameters, without cholesterol; (2) conventional cholesterol parameters (including NMR-derived LDLc, HDLc, and VLDLc); and (3) lipoprotein metabolism indicators. A complete overview of the variables in these sets and a detailed explanation of the procedure we used for constructing the multivariate model is provided in the Supplemental Material (Methods). In summary, in order to obtain a model similar to the Framingham Risk Score, we selected the six most predictive variables from dataset 1, the two most predictive markers from dataset 2, and further markers from dataset 3. In the first phase, using dataset 1, we included 'age' and 'gender' in the model. We then added in succession those variables that contributed most to improving the predictive performance of the model, measured as the area under the ROC curve.(128) Such a procedure is frequently referred to as "forward variable selection" (see e.g. (192)). Having selected the biomarkers from dataset 1, we proceeded in a similar manner with datasets 2 and 3, consecutively adding the most predictive variables to the model. We added markers from dataset 3 that gave a substantial improvement in ROC prediction and that were not correlated with markers already in the model (r² < 0.25); this procedure led to the inclusion of two additional markers from dataset 3. For comparison, we also included a dataset with the selected markers from dataset 1, plus total and HDL cholesterol. We used separate training and test sets for marker selection, but evaluated the final result using the complete dataset. All multivariate analyses were performed using Numerical Python.
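The forward variable selection described above can be sketched as a greedy loop. This is an illustrative sketch only, not the authors' implementation: score_fn and r2_fn are hypothetical callbacks standing in for "train the SVM on these variables and return its AUC" and "squared correlation between two variables", and the toy gains and correlations below are invented. For simplicity the r² < 0.25 filter is applied at every step, whereas the text applies it to the dataset 3 markers.

```python
def forward_select(candidates, score_fn, r2_fn, start=(), max_r2=0.25):
    """Greedy forward selection: repeatedly add the candidate that raises the score most.

    Candidates correlated with an already selected variable (r2 >= max_r2)
    are skipped; selection stops when no candidate improves the score.
    """
    selected = list(start)
    best = score_fn(selected)
    remaining = [c for c in candidates if c not in selected]
    while remaining:
        allowed = [c for c in remaining
                   if all(r2_fn(c, s) < max_r2 for s in selected)]
        if not allowed:
            break
        scores = {c: score_fn(selected + [c]) for c in allowed}
        winner = max(allowed, key=scores.get)
        if scores[winner] <= best:
            break
        selected.append(winner)
        remaining.remove(winner)
        best = scores[winner]
    return selected, best


# Invented toy data: additive AUC gains and one correlated pair of variables.
TOY_GAIN = {"age": 0.15, "sex": 0.05, "sbp": 0.04, "dbp": 0.03, "ldl": 0.02}
TOY_R2 = {frozenset(("sbp", "dbp")): 0.6}


def toy_score(names):
    return 0.5 + sum(TOY_GAIN[n] for n in names)


def toy_r2(a, b):
    return TOY_R2.get(frozenset((a, b)), 0.0)


selected, best = forward_select(["sbp", "dbp", "ldl"], toy_score, toy_r2,
                                start=["age", "sex"])
print(selected, round(best, 3))
```

In this toy run the second blood pressure variable is never added, because it is too strongly correlated with the first one already in the model.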
The multivariate predictions of CVD risk were compared using area under the ROC curve (AUC) statistics with the method of DeLong and a binomial exact test, calculated in MedCalc, version 11.5.1.0. We used Platt's algorithm to transform the predictions computed by the SVM into class probabilities for computing reclassification statistics. (193,194) Reclassification was quantified using the 'Net Reclassification Improvement' (NRI), with 6% and 20% risk cutoffs for the 'medium' and 'high' risk classes, and the 'Integrated Discrimination Improvement' (IDI, a risk cutoff-independent method), as suggested by Pencina.(195)
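As an illustration of the two reclassification statistics, the sketch below computes a categorical NRI with the 6% and 20% cutoffs and an IDI from predicted probabilities. It follows the standard definitions of these statistics and is not the code used in the study; the function names, the choice to assign a probability exactly on a cutoff to the higher class, and the four example subjects are assumptions.

```python
def risk_category(p, cutoffs=(0.06, 0.20)):
    """0 = low, 1 = medium, 2 = high; a value on a cutoff goes up (assumption)."""
    return sum(p >= c for c in cutoffs)


def nri(old_probs, new_probs, events, cutoffs=(0.06, 0.20)):
    """Categorical NRI: net upward moves among events plus net downward moves among non-events."""
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    net_events = net_nonevents = 0
    for p_old, p_new, event in zip(old_probs, new_probs, events):
        move = risk_category(p_new, cutoffs) - risk_category(p_old, cutoffs)
        direction = (move > 0) - (move < 0)  # +1 up, -1 down, 0 unchanged
        if event:
            net_events += direction
        else:
            net_nonevents -= direction
    return net_events / n_events + net_nonevents / n_nonevents


def idi(old_probs, new_probs, events):
    """IDI: change in mean predicted risk for events minus that for non-events."""
    def mean(values):
        return sum(values) / len(values)
    ev = [(o, n) for o, n, e in zip(old_probs, new_probs, events) if e]
    ne = [(o, n) for o, n, e in zip(old_probs, new_probs, events) if not e]
    gain_events = mean([n for _, n in ev]) - mean([o for o, _ in ev])
    gain_nonevents = mean([n for _, n in ne]) - mean([o for o, _ in ne])
    return gain_events - gain_nonevents


# Four hypothetical subjects: two with an event, two without.
old = [0.10, 0.25, 0.05, 0.10]
new = [0.25, 0.25, 0.03, 0.04]
had_event = [1, 1, 0, 0]
nri_value = nri(old, new, had_event)
idi_value = idi(old, new, had_event)
print(nri_value, round(idi_value, 3))
```

The NRI depends on the chosen cutoffs, whereas the IDI uses the predicted probabilities directly, which is why the text describes it as cutoff-independent.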


Of the 2142 selected subjects, 145 cases and 1836 controls were found to have a complete record of all relevant parameters and thus were included in the analysis. Baseline characteristics of the subjects are shown in Table 1. The indicator we call the 'VLDL Hepatic turnover indicator', or VLDL H, is the average of two ratios: that between hepatic VLDL lipolysis and VLDL production, and that between VLDL attachment to the liver and VLDL production. An explanation of the mathematical notation of these indicators can be found in the Supplemental Material (Methods). Tables 3 and 4 show the results of a Receiver Operating Characteristic (ROC) analysis for general cardiovascular disease. Table 3 displays the area under the curve, its improvement over a predictor drawn at random, and the percentage incremental improvement of the latter statistic. Results of the statistical analyses comparing the curves are shown in Table 4. Our method significantly improved CVD prediction over accepted risk markers, as measured by the Area Under the ROC Curve (AUC). The improvement of our model versus a model with classical Framingham risk markers, including total cholesterol and HDLc, was ΔAUC=0.0177 with p=0.0055. The improvement of our model versus a model including LDLc and HDLc was ΔAUC=0.0150 with p=0.0067. In comparison, the model including LDLc and HDLc did not significantly improve risk prediction over the model including total cholesterol and HDLc, with ΔAUC=0.00268 and p=0.6003. As expected, adding total and HDL cholesterol to the other classical Framingham risk factors did significantly improve risk prediction, with ΔAUC=0.0354 and p=0.0003. The statistical test thus showed that adding lipoprotein metabolism indicators to a model that includes existing cardiovascular risk factors significantly improved the area under the ROC curve for this population, with respect to conventional risk markers. Table 5 shows the results of the reclassification analysis.
Risk reclassification, using low, middle, and high risk classes, and also using the category-independent methods, was significantly improved when including LDLc, HDLc, and VLDL metabolism indicators. In addition, we calculated NRI reclassification statistics for subjects classified as at 'Intermediate risk' when using Framingham risk markers (

Discussion
This is the first study in which 'lipoprotein metabolism indicators' have been used for cardiovascular disease risk prediction. These diagnostics are ratios of lipoprotein production, lipolysis, and uptake processes derived from a single lipoprotein profile measurement using computational modelling. We demonstrate that incorporation of two lipoprotein metabolism indicators significantly improves CVD risk prediction as measured by the area under the ROC curve. Reclassification is also significantly improved over conventional risk markers. The most important predictor, the 'VLDL Extrahepatic lipolysis indicator' or VLDL E, is a ratio between the VLDL lipolysis rate related to lipoprotein lipase (LPL) and the influx of particles due to production in the liver and lipolysis of larger particles. As LPL mainly acts extrahepatically, this ratio gives information about the capacity of extrahepatic tissue to absorb triglycerides from VLDL particles in the fasting state. The second indicator, which we call the 'VLDL Hepatic turnover indicator' or VLDL H, is the average of two ratios: that between hepatic VLDL lipolysis and VLDL production, and that between VLDL attachment to the liver and VLDL production. This combined ratio relates to the capacity of the liver to process VLDL particles, both through lipolysis and particle attachment to the liver. Inspection of the risk model (see Supplemental Material, Results) shows that LDLc remains the most important lipoprotein-related predictor of CVD events. HDLc is an important risk modifier, especially when no blood pressure medication is used. When using blood pressure medication, VLDL E becomes important; the lower this indicator, the slower incoming VLDL particles are lipolysed extrahepatically, and the higher the risk.
VLDL H is most important for determining the border between low and medium risk, especially for men and when not using blood pressure medication; the lower VLDL H, the less hepatic VLDL turnover per produced particle, and the higher the risk. These interpretations show that the new risk prediction can be understood in relation to lipoprotein pathophysiology and genetic variation (in LPL and other genes pertinent to VLDL processes).
Examining the reclassification of subjects who were classified as at 'intermediate risk' by Framingham risk factors is of special clinical significance. The intermediate risk group consists of those individuals who should be treated according to international guidelines (195). Subjects who are reclassified move to either the high risk (more intensive treatment) or low risk (no treatment) group. Our results show that a net 25% of the subjects in this group who will not get cardiovascular disease within 10 years are moved to the low risk group, preventing them from taking unnecessary medication. The reclassification of people with events to the high risk group was not significant, probably due to the low number of cases in this group (n=48). Extrapolating these results directly to clinical practice is not straightforward, most importantly because treatment decisions are most often based on one or two parameters (such as LDLc and HDLc) and not on a complete set of risk markers. However, because our multivariate model for the classical Framingham markers is already an improvement over the two-variable approach used in practice, a 25% improvement using our final risk model will most likely be an underestimate for a comparison with a two-variable approach used in the same population. Future studies will need to establish whether the 25% improvement can be validated in other populations, and whether a population with more CVD cases will also yield significant reclassification improvement for cases in the intermediate risk category. Our methodology can be readily applied to any past studies in which NMR lipoprotein profiles have been measured. Possible subjects of further investigation include determining risk in younger or older persons, differences between ethnic groups, and the benefits for secondary prevention.
The Particle Profiler model can also derive lipoprotein metabolism indicators from other methods for measuring lipoprotein profiles.(58,187,188) Other future investigation can compare the results of modelling the data from these methods.
The current study has one technical limitation that deserves mention: the NMR spectra were recorded with an older version of the technology than is currently available. This limitation does not affect the method used to derive lipoprotein metabolism indicators. With newer NMR methodology, the accuracy of lipoprotein metabolism indicators is expected to increase in future studies.
The results of this study are timely, because the methodology to calculate lipoprotein metabolic indicators is recent (first application to NMR data published in June 2012 (139)).
In summary, in a sample of 1981 subjects from the Framingham offspring cohort, we found two lipoprotein metabolism indicators that together significantly improved general cardiovascular risk prediction, as quantified by the area under the ROC curve and by reclassification statistics. These indicators may help to reduce the number of people who unnecessarily take cholesterol-lowering medication, reducing costs and possible side effects. Clinical application will require further validation of these findings.


We would like to thank Michael J. Pencina, PhD and Kevin F. Kennedy, MS for making available the computer code that calculates reclassification statistics. We are indebted to all participants, staff, and investigators who made the Framingham Offspring Study possible. We would like to thank Jim Otvos and Liposcience for making available the full lipoprotein profiles necessary for modelling.   Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. The USDA is an equal opportunity provider and employer.


In a previous paper, we fitted the Particle Profiler model to NMR data measured in the GOLDN study (139). In the GOLDN study, the NMR data contained three VLDL fractions. In the current study, the NMR data contained six VLDL fractions. Accordingly, we assigned new weights for fitting the data to VLDL fractions VLDL1 through VLDL4. Because of the small number of particles in VLDL fractions 5 and 6, we combined them and fitted them with the original weight for the 'large VLDL' fraction. Supplemental Table 1 gives the weights for the new VLDL fractions; the other weights and the procedure are as previously described (139).

In the indicator expressions, the rate symbol indicates the particle-size-dependent rate of the process denoted by rateprocess, averaged per particle over the size range denoted by range. Here, rateprocess is one of the following processes: lpl (LPL-related lipolysis), hl (HL-related lipolysis), l (total lipolysis, the sum of LPL- and HL-related lipolysis), u,liver (liver uptake), or a,liver (liver attachment); and range is one of the following size ranges: ILDL (IDL and LDL size range, 5 to 30 nm in the model), VLDL (VLDL size range, 30 to 80 nm in the model), or TOT (complete modeled size range, 5 to 80 nm in the model). Please note that the model does not include HDL, so these particles are not included in the smaller size range. The flux symbol indicates the particle flux denoted by fluxprocess into the size range denoted by range. Here, fluxprocess is one of the following processes: prod (direct production from the liver) or in (total influx due to both production and lipolysis of larger particles); and range is one of the size ranges mentioned above.

   
The input dataset for constructing the multivariate prediction model consisted of all ratios indicated by expressions 1, 2, and 3. Because of the large number of zero entries, indicators with an lpl rate or a prod flux in the denominator were excluded.
In addition, several averages of indicators were constructed (expression 4), where rateprocessA is one of the following processes: lpl, hl, or l; and rateprocessB is either a,liver or u,liver. All other symbols are defined as described above. Indicators including a prod flux were excluded because of the large number of zero entries.
These expressions result in 124 lipoprotein metabolism indicators from expressions 1, 2, and 3 and 30 from expression 4, which total 154 lipoprotein metabolism indicators.


To construct a multivariate predictive model we used a state-of-the-art statistical machine learning algorithm called the 'support vector machine' (SVM). (190,191) The method belongs to the class of so-called regularized kernel-based approaches, which have been shown to outperform many standard classification methods. To conduct the experiment we scaled each variable to the range [0,1] (note that lipoprotein metabolism indicators had already been log-transformed at that point), shuffled the dataset randomly, and divided the data into two independent sets: a training set (70%) and a validation set (30%). When training the algorithm, a number of hyperparameters have to be estimated to ensure good generalization performance. In our experiments we used an SVM algorithm with a squared loss function and a Gaussian kernel. Furthermore, we estimated the regularization parameter to prevent overfitting on the training data. Optimal values for the kernel width and the regularization parameter were found via a cross-validation procedure on the training set. Once the parameters that led to the best predictive performance of the model were obtained, we retrained the algorithm on the complete training set and tested the performance on the separately reserved validation set. We evaluated the predictive performance of the model using area under the ROC curve (AUC) statistics (C-statistic). (128) We applied the above multivariate modeling approach in order to identify the best biomarkers. We used the validation set to estimate the biomarkers' predictive performance. For this purpose we used a "forward selection" procedure. (192) In order to obtain a model similar to the Framingham Risk Score, we selected the markers from three consecutive groups. The first group consists of the 'classical' markers mentioned in Supplemental Table 2; from this set we selected two markers (age and sex) and let the algorithm identify four more markers.
The second group consists of the cholesterol markers mentioned in Supplemental Table 2; from this set we let the algorithm identify two markers. The third group consists of the log-transformed lipoprotein metabolism indicators; we let the algorithm select several markers, and decided how many to include based on AUC performance improvement and on lack of correlation with the already included markers (r² < 0.25). After the addition of every marker we evaluated the area under the ROC curve on the validation set to assess how well the model performs with the set of markers selected so far.
This selection procedure led to the set of markers mentioned in Table 2 of the main article.
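The preprocessing and evaluation steps described above (scaling to [0,1], random shuffling, a 70%/30% split, and AUC as the performance measure) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's Numerical Python code: the function names, the fixed random seed, and the toy numbers are hypothetical, and the AUC is computed through its rank interpretation rather than by any particular library routine.

```python
import random


def min_max_scale(column):
    """Scale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in column]


def shuffle_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and validation sets."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(scores, labels):
    """AUC as the chance that a random case outscores a random control (ties count one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


scaled = min_max_scale([2.0, 4.0, 6.0])
train, valid = shuffle_split(list(range(10)))
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(scaled, len(train), len(valid), toy_auc)
```

As in the text, the scaling is applied before the split; scaling with training-set ranges only would be an alternative design that avoids using validation data during preprocessing.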
In order to test whether we could have performed biomarker identification with a simpler statistical method, we also used logistic regression with the same experimental setup as described above. Supplemental Table 3 shows the markers that were selected from group 1 using logistic regression. These results show that this second method selects four highly correlated blood pressure variables, very unlike the known Framingham Risk Score variables. Our SVM-based method does select variables corresponding to the Framingham Risk Score, indicating that, in this experimental setup, the SVM method produces more reliable results than logistic regression.

          
Supplemental Table 3. Markers selected from group 1 using logistic regression.
Age
Sex
Systolic blood pressure, physician 1
Diastolic blood pressure, physician 1
Systolic blood pressure, physician 2
Diastolic blood pressure, physician 2


In the body text, we note that inspection of the risk model shows four points. Below we mention these points and refer to the illustrating figure.
1. LDLc remains the most important lipoprotein-related predictor of CVD events.
a. Figure 6.1a shows how risk depends on LDLc for a 60-year-old male and female subject who do not use blood pressure medication and have overall average values for all other predictor variables.
b. This figure shows that a higher LDLc value leads to a higher risk in our model for the specified subjects.

2. HDLc is an important risk modifier, especially when no blood pressure medication is used.
a. Figure 6.1b shows how risk depends on HDLc for a 60-year-old male and female subject who do not use blood pressure medication, using an LDLc on the low-to-medium risk boundary (LDLc: 130 mg/dL) and on the medium-to-high risk boundary (LDLc: 190 mg/dL). All other predictor variables have the population averages mentioned above.
b. This figure shows that a lower HDLc leads to a higher risk in our model for the specified subjects. Lower LDLc lowers the overall risk.

3. When using blood pressure medication, the VLDL Extrahepatic lipolysis indicator (VLDL E) becomes important; the lower the VLDL E, the less relative LPL turnover, and the higher the risk.
a. Figure 6.1c shows how risk depends on VLDL E for a 60-year-old male and female subject who do use blood pressure medication. All other predictor variables have the population averages mentioned above.
b. The figure shows that in subjects on blood pressure medication, a low VLDL E dramatically increases the CVD risk, especially in men.

4. The VLDL Hepatic turnover indicator (VLDL H) is important for determining the border between low and medium risk, especially for men and when not using blood pressure medication; the lower the VLDL H, the less relative hepatic turnover, and the higher the risk.
a. Figure 6.1d shows how risk depends on VLDL H for a 60-year-old male and female subject who do not use blood pressure medication and have LDLc on the low-to-medium risk boundary. All other predictor variables have the population averages mentioned above.
b. The figure shows that a low VLDL H increases the CVD risk in these subjects.


Figure 6.1. Graphs are drawn for a 60-year-old male and female subject who have overall population average values for all risk factors except those specified below. A) CVD risk change with LDLc (mg/dL); subjects do not use blood pressure medication. B) CVD risk change with HDLc (mg/dL), with LDLc at the low-medium risk border and at the medium-high risk border; subjects do not use blood pressure medication. C) CVD risk change with the VLDL Extrahepatic lipolysis indicator (expressed as ln(fl/particle)); subjects use blood pressure medication. D) CVD risk change with the VLDL Hepatic turnover indicator (expressed as ln(fl/particle)); subjects do not use blood pressure medication and have LDLc at the low-medium risk border.