Data Extraction

One reviewer screened the citations identified by the electronic searches and excluded the majority of citations on the basis of information provided in the title and/or abstract. Citations that appeared to be relevant, or that could not be excluded unequivocally from the title and abstract, were identified, and two reviewers assessed the corresponding full-text reports. Any disagreement between them was resolved by consensus. From the included articles, the following data were extracted: patient demographics/characteristics; surgical characteristics (including preoperative symptoms, procedure performed, and number of levels); follow-up; and safety outcomes (►Table 1).

Study Quality

Case series: any case-series design.
a Outcome assessment is independent of the judgment of healthcare personnel. Reliable data are data such as mortality or reoperation.
b Authors must provide a description of robust baseline characteristics and control for those that are unequally distributed between treatment groups.
Evidence-Based Spine-Care Journal

Determination of Overall Strength of Evidence
After individual article evaluation, the overall body of evidence with respect to each outcome is determined based on precepts outlined by the GRADE Working Group1 and recommendations made by the Agency for Healthcare Research and Quality (AHRQ).5 Qualitative analysis is performed considering the following AHRQ requirements and additional domains.4 ►Table 3 provides an outline of the method used to determine the final strength of evidence (SoE).
• Risk of bias is evaluated during the individual study evaluation described earlier. After individual article review, the body of literature evidence is initially rated "HIGH" if the majority of the articles are Level I or II and "LOW" if the majority are Level III or lower. This is the "baseline" strength of evidence (►Table 4, Evidence Summary). Consistency, directness, precision, and subgroup effects are then considered for potentially "downgrading" the strength of the body of evidence by one or two levels, depending on the degree and number of domain violations.
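The baseline-then-downgrade logic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tool: the function names `baseline_strength` and `downgrade` are hypothetical, "majority" is read as strictly more than half of the articles, and one violated domain is mapped to a one-level downgrade while two or more are mapped to a two-level downgrade (the method also weighs the degree of each violation, which a simple count cannot capture). The level labels follow the four-level AHRQ-style scale.

```python
# Ordered AHRQ-style strength-of-evidence levels, lowest to highest.
LEVELS = ["INSUFFICIENT", "LOW", "MODERATE", "HIGH"]


def baseline_strength(study_levels: list[str]) -> str:
    """Return HIGH if most articles are Level I or II, otherwise LOW.

    Assumption: "majority" means strictly more than half of the articles.
    """
    high_quality = sum(1 for lvl in study_levels if lvl in ("I", "II"))
    return "HIGH" if high_quality > len(study_levels) / 2 else "LOW"


def downgrade(level: str, violated_domains: int) -> str:
    """Lower the rating by one level for one violated domain, two for two or more.

    Assumption: this count-based mapping stands in for the reviewers'
    judgment of the degree and number of domain violations.
    """
    if violated_domains <= 0:
        steps = 0
    elif violated_domains == 1:
        steps = 1
    else:
        steps = 2
    # Never drop below the lowest level on the scale.
    return LEVELS[max(LEVELS.index(level) - steps, 0)]


# Hypothetical evidence base: five articles, only one at Level II.
base = baseline_strength(["III", "III", "IV", "II", "III"])
print(base)                                  # LOW
print(downgrade(base, violated_domains=1))   # INSUFFICIENT
```

A Low baseline downgraded once (for example, for imprecision) lands at Insufficient, the same path recorded for the "other complications" outcome in the evidence summary.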

Criteria Evaluated for "Downgrading"
• Consistency refers to the degree of similarity in the effect sizes of different studies within an evidence base. If effect sizes indicate the same direction of effect and the range of effect sizes is narrow, the evidence base is judged to be consistent. If meta-analyses were conducted, consistency was evaluated with an "eyeball test," a visual appraisal of the forest plots by two independent reviewers. Single-study evidence bases were judged "consistency unknown (single study)" and downgraded.
• Directness concerns whether the evidence being assessed reflects a single, direct link between the interventions of interest and the ultimate health outcome; that is, whether the most clinically relevant outcome was measured or a surrogate outcome was assessed. Directness also applies to indirect comparisons of treatments when head-to-head comparisons of interest could not be made within individual studies.
• Precision pertains to the degree of certainty surrounding an estimate of effect for a specific outcome. It is based on whether the estimate of effect reached statistical significance and/or on inspection of the confidence intervals around effect estimates. When there are only two subgroups, the overlap of the confidence intervals of the two groups' summary estimates is considered. Nonoverlapping confidence intervals indicate statistical significance, but the intervals can overlap to a small degree and the difference can still be statistically significant. Sample size across studies is also evaluated.
• Subgroup effects (that is, heterogeneity of treatment effects) can lead to downgrading if the authors did not state a priori their plan to perform subgroup analyses and if there was no test for interaction.
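The point about overlapping confidence intervals can be checked numerically. The sketch below is illustrative only: the helper names and the two subgroup estimates are hypothetical, and it assumes independent estimates with symmetric 95% confidence intervals on a scale where a normal approximation holds (for ratio measures, that would be the log scale). It backs out each standard error from its interval and runs a two-sided z-test on the difference.

```python
import math

Z_95 = 1.96  # approximate two-sided 95% critical value of the standard normal


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def se_from_ci(lower: float, upper: float) -> float:
    """Back out a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * Z_95)


def cis_overlap(ci1: tuple[float, float], ci2: tuple[float, float]) -> bool:
    """True if the two intervals share at least one point."""
    return ci1[0] <= ci2[1] and ci2[0] <= ci1[1]


def difference_p_value(est1: float, ci1: tuple[float, float],
                       est2: float, ci2: tuple[float, float]) -> float:
    """Two-sided z-test p value for the difference between two independent estimates."""
    se_diff = math.hypot(se_from_ci(*ci1), se_from_ci(*ci2))
    z = (est1 - est2) / se_diff
    return 2.0 * (1.0 - normal_cdf(abs(z)))


# Hypothetical subgroup summary estimates whose 95% CIs overlap slightly.
est_a, ci_a = 0.0, (-1.96, 1.96)
est_b, ci_b = 3.0, (1.04, 4.96)
print(cis_overlap(ci_a, ci_b))                         # True
print(difference_p_value(est_a, ci_a, est_b, ci_b))    # roughly 0.03, below 0.05
```

Although the two intervals overlap, the difference is significant at the 0.05 level because the standard error of a difference is smaller than the sum of the two individual standard errors; this is why overlap alone is not treated as evidence of no difference.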

Criteria Used for "Upgrading"
• Finally, if the strength of evidence is less than "HIGH," we "upgrade" the evidence if there is a dose-response association or a strong magnitude of effect.
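The upgrading rule can be written as a small function. This is a hedged sketch: `upgrade` is a hypothetical name, and raising the rating by exactly one level is an assumption, since the text does not state how many levels an upgrade moves. Ratings already at HIGH are left unchanged, as the text specifies.

```python
# Ordered AHRQ-style strength-of-evidence levels, lowest to highest.
LEVELS = ["INSUFFICIENT", "LOW", "MODERATE", "HIGH"]


def upgrade(level: str, dose_response: bool, strong_effect: bool) -> str:
    """Raise a below-HIGH rating when either upgrading criterion is met.

    Assumption: an upgrade moves the rating up by exactly one level.
    """
    if level == "HIGH" or not (dose_response or strong_effect):
        return level
    return LEVELS[LEVELS.index(level) + 1]


print(upgrade("LOW", dose_response=True, strong_effect=False))   # MODERATE
print(upgrade("HIGH", dose_response=True, strong_effect=True))   # HIGH
print(upgrade("LOW", dose_response=False, strong_effect=False))  # LOW
```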

Strength of Evidence for Existing Systematic Reviews
CoE ratings for Cochrane and other systematic reviews are assigned a baseline score of HIGH if randomized controlled trials (RCTs) were used and LOW if observational studies were used. The rating can be upgraded or downgraded based on adherence to the core criteria for methods and for qualitative and quantitative analyses of systematic reviews, which are assessed with a reference/evaluation table. The following four possible levels and their definitions are reported:
• High: High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.

Other complications (strength of evidence: Insufficient)
All five CoE III studies reported on risks of other complications, including vocal paralysis, syncope, cerebrospinal fluid (CSF) leak, and airway swelling, but there was no overlap in these additional outcomes across studies. The complication risks were very low in both inpatient and outpatient groups, with the exception of hospital readmission, which in a single CoE III study was higher in the inpatient group (7%) than in the outpatient group (0%). All studies had small sample sizes (≤50 patients per treatment group), with the exception of one study,6 which included 97 outpatients and 578 inpatients.
Baseline: Low; Upgrade: No; Downgrade: (1) Imprecision b
Abbreviations: CoE, Class of Evidence; CSF, cerebrospinal fluid.
a Consistency of results is unknown as it is based on a single study.
b Downgraded for imprecision due to small sample sizes and low event rates.