A Controlled Trial of Arthroscopic Surgery for Osteoarthritis of the Knee

J. Bruce Moseley, M.D., Kimberly O'Malley, Ph.D., Nancy J. Petersen, Ph.D., Terri J. Menke, Ph.D., Baruch A. Brody, Ph.D., David H. Kuykendall, Ph.D., John C. Hollingsworth, Dr.P.H., Carol M. Ashton, M.D., M.P.H., and Nelda P. Wray, M.D., M.P.H.

N Engl J Med 2002; 347:81-88. July 11, 2002. DOI: 10.1056/NEJMoa013259

When medical therapy fails to relieve the pain of osteoarthritis of the knee, arthroscopic lavage or débridement is often recommended. More than 650,000 such procedures are performed each year1 at a cost of roughly $5,000 each. In uncontrolled studies of knee arthroscopy for osteoarthritis, about half the patients report relief from pain.2-16 However, the physiological basis for the pain relief is unclear. There is no evidence that arthroscopy cures or arrests the osteoarthritis. Therefore, we conducted a randomized, placebo-controlled trial to assess the efficacy of arthroscopic surgery of the knee in relieving pain and improving function in patients with osteoarthritis. Both patients and assessors of outcome were blinded to the treatment assignments.

METHODS

The college and hospital institutional review board approved the protocol. A data and safety monitoring board monitored the study.

Study Participants

Participants were recruited from the Houston Veterans Affairs Medical Center from October 1995 through September 1998. Patients were eligible if they were 75 years old or younger, had osteoarthritis of the knee as defined by the American College of Rheumatology,17 reported at least moderate knee pain on average (≥4 on a visual-analogue scale ranging from 0 to 10) despite maximal medical treatment for at least six months, and had not undergone arthroscopy of the knee during the previous two years.

The severity of osteoarthritis in the study knee (that with the greatest pain-induced limitation of function) was assessed radiographically and graded on a scale of zero to four.18 The scores for the three compartments were added together to generate a severity grade of 0 to 12. Criteria for exclusion were a severity grade of 9 or higher, severe deformity, and serious medical problems.

All patients provided informed consent, which included writing in their chart, “On entering this study, I realize that I may receive only placebo surgery. I further realize that this means that I will not have surgery on my knee joint. This placebo surgery will not benefit my knee arthritis.” Of the 324 consecutive patients who met the criteria for inclusion, 144 (44 percent) declined to participate. Participants were younger than those who declined to participate (52.3±11.3 years vs. 55.3±12.4 years, P=0.002), were more likely to be white (62.2 percent vs. 50.7 percent, P=0.03), and had more severe arthritis (25.0 percent vs. 12.5 percent with grade 7 or 8 arthritis, P<0.001).

Randomization Process and Treatment Groups

Participants were stratified into three groups according to the severity of osteoarthritis (grade 1, 2, or 3; grade 4, 5, or 6; and grade 7 or 8). A stratified randomization process with fixed blocks of six was used. Sealed, sequentially numbered, stratum-specific envelopes containing treatment assignments were prepared and given to the research assistant. After the patient was in the operating suite, the surgeon was handed the envelope. The treatment assignment was not revealed to the patient.
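The allocation scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the trial's actual procedure: the text specifies three strata and fixed blocks of six, but the equal split of two assignments per arm within each block, the seeds, the number of blocks, and all function names here are assumptions made for the example.

```python
import random

ARMS = ("placebo", "lavage", "debridement")
BLOCK_SIZE = 6  # fixed blocks of six, as stated in the text


def stratum_for(grade: int) -> str:
    """Map a radiographic severity grade to one of the three strata."""
    if 1 <= grade <= 3:
        return "grades 1-3"
    if 4 <= grade <= 6:
        return "grades 4-6"
    if 7 <= grade <= 8:
        return "grades 7-8"
    raise ValueError(f"grade {grade} is outside the strata described (1-8)")


def make_block(rng: random.Random) -> list[str]:
    """One permuted block. Assumption: each arm appears equally often."""
    block = list(ARMS) * (BLOCK_SIZE // len(ARMS))
    rng.shuffle(block)
    return block


def envelope_sequence(n_blocks: int, seed: int) -> list[str]:
    """Sequentially ordered assignments for a single stratum."""
    rng = random.Random(seed)
    sequence: list[str] = []
    for _ in range(n_blocks):
        sequence.extend(make_block(rng))
    return sequence


# Hypothetical setup: ten blocks per stratum, one seed per stratum.
envelopes = {
    name: envelope_sequence(n_blocks=10, seed=i)
    for i, name in enumerate(("grades 1-3", "grades 4-6", "grades 7-8"))
}
```

Because every block contains each arm equally often, the group sizes within a stratum can never differ by more than two at any point in enrollment, which is the usual reason for choosing fixed blocks.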

Participants were randomly assigned to arthroscopic débridement, arthroscopic lavage alone, or the placebo procedure. One orthopedist performed all the operations. Patients in the débridement group or the lavage group received standard general anesthesia with endotracheal intubation. Patients in the placebo group received a short-acting intravenous tranquilizer and an opioid and spontaneously breathed oxygen-enriched air.

Lavage

After diagnostic arthroscopy in patients in the lavage group, the joint was lavaged with at least 10 liters of fluid. Anything that could be flushed out through arthroscopic cannulas was removed. Normally, no instruments were used to mechanically débride or remove tissue. However, if a mechanically important, unstable tear in the meniscus (e.g., a displaced “bucket-handle” tear) was encountered, the torn portion was removed and the remaining meniscus was smoothed to a firm, stable rim. (There is general agreement that it is inappropriate to leave this type of meniscal tear untreated.11,13,19,20) No other débridement was performed.

Débridement

After diagnostic arthroscopy in patients in the débridement group, the joint was lavaged with at least 10 liters of fluid, rough articular cartilage was shaved (chondroplasty was performed), loose debris was removed, all torn or degenerated meniscal fragments were trimmed, and the remaining meniscus was smoothed to a firm and stable rim. No abrasion arthroplasty or microfracture was performed. Typically, bone spurs were not removed, but any spurs from the tibial spine area that blocked full extension were shaved smooth.

Placebo Procedure

To preserve blinding in the event that patients in the placebo group did not have total amnesia, a standard arthroscopic débridement procedure was simulated. After the knee was prepped and draped, three 1-cm incisions were made in the skin. The surgeon asked for all instruments and manipulated the knee as if arthroscopy were being performed. Saline was splashed to simulate the sounds of lavage. No instrument entered the portals for arthroscopy. The patient was kept in the operating room for the amount of time required for a débridement. Patients spent the night after the procedure in the hospital and were cared for by nurses who were unaware of the treatment-group assignment.

Postoperatively, there were two minor complications and no deaths. Incisional erythema developed in one patient, who was given antibiotics. In a second patient, calf swelling developed in the leg that had undergone surgery; venography was negative for thrombosis. In no case did a complication necessitate the breaking of the randomization code.

Postoperative care was delivered according to a protocol specifying that all patients should receive the same walking aids, graduated exercise program, and analgesics. The use of analgesics after surgery was monitored; during the two-year follow-up period, the amount used was similar in the three groups.

End Points

Study personnel who were unaware of the treatment-group assignments performed all postoperative outcome assessments; the operating surgeon did not participate in any way. Data on end points were collected 2 weeks, 6 weeks, 3 months, 6 months, 12 months, 18 months, and 24 months after the procedure. To assess whether patients remained unaware of their treatment-group assignment, they were asked at each follow-up visit to guess which procedure they had undergone. Patients in the placebo group were no more likely than patients in the other two groups to guess that they had undergone a placebo procedure. For example, at two weeks, 13.8 percent of the patients in the placebo group guessed that they had undergone a placebo procedure, and 13.2 percent of the patients in the lavage and débridement groups guessed that they had undergone a placebo procedure.

The primary end point was pain in the study knee 24 months after the intervention, as assessed by a 12-item self-reported Knee-Specific Pain Scale (KSPS) created for this study (see Supplementary Appendix 1, available with the full text of this article at http://www.nejm.org). Scores on this scale range from 0 to 100, with higher scores indicating more severe pain. In addition, to ensure our ability to detect any benefit, we used five secondary efficacy end points: two additional assessments of pain and three assessments of function at all time points. Arthritis pain in general (i.e., not specifically in the study knee) was assessed by means of the 4-item pain subscale of the Arthritis Impact Measurement Scales (AIMS2-P).21,22 Higher scores on this subscale indicate more severe pain. Body pain (i.e., not necessarily from arthritis and not necessarily in the knee) was assessed with the 2-item pain subscale of the Medical Outcomes Study 36-item Short-Form General Health Survey (SF-36-P).23,24 Higher scores on this subscale indicate less severe pain. The AIMS2-P and the SF-36-P scores were transformed into scores on a scale from 0 to 100.

Two self-reported measures of physical function were used: the 5-item walking–bending subscale from the AIMS2 (AIMS2-WB, transformed into scores on a scale from 0 to 100, with higher scores indicating more limited function21,22) and the 10-item physical-function subscale from the SF-36 (SF-36-PF, transformed into scores on a scale from 0 to 100, with higher scores indicating better function23,24). As an objective measure, we devised the Physical Functioning Scale (PFS) to record the amount of time in seconds that a patient required to walk 30 m (100 ft) and to climb up and down a flight of stairs as quickly as possible. Longer times indicate poorer functioning.

All six outcome scales had good reliability. The median Cronbach's alpha (according to analyses of data from eight time points for all scales) exceeded 0.80. Results for all the outcome measures at all the time points that are not reported here are summarized in Supplementary Appendix 2, available with the full text of this article at http://www.nejm.org.
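For readers unfamiliar with the reliability statistic, Cronbach's alpha can be computed from item-level responses as sketched below. The function name and the sample responses are hypothetical and are not trial data; the formula is the standard one, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha, where items[i][j] is respondent j's score on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score for each respondent, summed across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: three items answered by five respondents.
example_items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]
alpha = cronbach_alpha(example_items)
```

Values near 1 indicate that the items move together across respondents; the threshold of 0.80 mentioned above is a conventional mark of good internal consistency.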

Statistical Analysis

Our pilot study indicated that it would be feasible to recruit 60 patients per year. The trial was designed to have 90 percent power, with a two-sided type I error of 0.04, to detect a moderate effect size (0.55) between the placebo group and the combined arthroscopic-treatment groups in terms of body pain as measured by the SF-36-P at two years, with an enrollment of 180 patients and 16 or fewer lost to follow-up (i.e., 164 or more completing the two-year follow-up). The primary hypothesis was that the patients in the two arthroscopic-intervention groups combined would report the same amount of knee pain at two years as the patients assigned to the placebo group. All statistical tests compared the treatment groups in terms of the values at each visit rather than analyzing the changes from base line. (Scores for these changes [“change scores”] were analyzed, with results that did not differ from the results presented here.) The data and safety monitoring board reviewed interim data 15 months and 24 months after enrollment began, using the Haybittle–Peto group-sequential method, with stopping boundaries of P=0.001 for the two interim analyses.25,26 All reported P values are two-sided and have not been adjusted for multiple comparisons.
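The design parameters above can be checked with a rough calculation. The sketch below uses the textbook normal-approximation formula for comparing two means with unequal allocation; the article does not say which formula was used, so both the formula and the assumed 2:1 ratio (two arthroscopic groups combined against one placebo group) are assumptions of this example.

```python
from math import ceil
from statistics import NormalDist


def placebo_group_size(effect_size: float, alpha: float, power: float,
                       ratio: float) -> float:
    """Approximate size of the smaller group for a two-sided test of two means.

    ratio is the size of the comparison group relative to this group.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_beta = standard_normal.inv_cdf(power)
    return (1 + 1 / ratio) * (z_alpha + z_beta) ** 2 / effect_size ** 2


# Values from the text: effect size 0.55, two-sided alpha 0.04, power 90 percent.
n_placebo = ceil(placebo_group_size(0.55, alpha=0.04, power=0.90, ratio=2))
n_total = 3 * n_placebo  # assumed: placebo plus two equally sized arthroscopic arms
```

Under these assumptions the approximation gives a total in the same range as the 164 completers the design called for, which suggests the stated parameters are mutually consistent.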

Our prespecified analytic strategy was to test, at all time points, for the superiority of the arthroscopic procedures over the placebo procedure. Lacking evidence of superiority, we tested for evidence that the arthroscopic procedures were equivalent27-29 to the placebo procedure by determining the extent to which the study was powered to reject the hypothesis that the arthroscopic treatments caused a small but clinically important improvement (the “minimal important difference”). The minimal important difference for a scale is the smallest change score associated with a patient's perception of a change in health status,30 but it can vary somewhat according to the method of calculation and the study sample.31,32 Minimal important differences for each of the six study scales were calculated on the basis of the trial data by two different methods: the change ratings of patients (their scores on a single-item scale that asked patients if their condition was the same, somewhat better [or worse], or much better [or worse] than before surgery) and the standard error of measurement (the SD of the instrument multiplied by the square root of one minus its reliability coefficient).30-32 Estimates were also obtained from the literature.31-33 For each scale, we tested the hypothesis that the placebo procedure was equivalent to the arthroscopic procedures, using as the minimal important difference the midpoint of the range of the minimal important differences reported in the literature or calculated on the basis of our data. If the 95 percent confidence interval around the estimated size of the effect does not include the minimal important difference, one can reject the hypothesis that the arthroscopic procedures have a small but clinically important benefit.27
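The two calculations in this paragraph, the standard error of measurement and the confidence-interval check against the minimal important difference, can be sketched as follows. All input numbers are hypothetical and are not trial data. The interval uses a normal approximation (z = 1.96), and the benefit is expressed as placebo minus arthroscopy on a scale where higher scores are worse; both conventions are assumptions of this example. The check also requires the whole interval to lie below the minimal important difference, which is slightly stricter than "does not include".

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SD of the instrument times the square root of one minus its reliability."""
    return sd * sqrt(1 - reliability)


def difference_ci(mean_a: float, sd_a: float, n_a: int,
                  mean_b: float, sd_b: float, n_b: int,
                  z: float = 1.96) -> tuple[float, float]:
    """Approximate 95 percent confidence interval for mean_a minus mean_b."""
    diff = mean_a - mean_b
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff - z * se, diff + z * se


def rules_out_important_benefit(ci: tuple[float, float], mid: float) -> bool:
    """True when the entire interval lies below the minimal important difference."""
    _, upper = ci
    return upper < mid


# Hypothetical instrument: SD of 20 points, reliability coefficient of 0.80.
sem = standard_error_of_measurement(sd=20.0, reliability=0.80)

# Hypothetical pain scores: placebo 50 (SD 22, n 60) vs. arthroscopy 52 (SD 22, n 60).
interval = difference_ci(50.0, 22.0, 60, 52.0, 22.0, 60)
benefit_excluded = rules_out_important_benefit(interval, mid=12.0)
```

With a reliable instrument the standard error of measurement is well below the raw SD, which is why it serves as one anchor for the smallest change a scale can meaningfully register.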

RESULTS

A total of 180 patients underwent randomization; 60 were assigned to the placebo group, 61 to the lavage group, and 59 to the débridement group. Base-line characteristics were similar in the three study groups (Table 1, Base-Line Characteristics of the Randomized Patients).

At no point did either arthroscopic-intervention group have greater pain relief than the placebo group (Figure 1 [Mean Values (and 95 Percent Confidence Intervals) on the Knee-Specific Pain Scale], Table 2 [Scores on the Pain Subscale of the Arthritis Impact Measurement Scales], and Supplementary Appendix 2). For example, there was no difference in knee pain between the placebo group and either the lavage group or the débridement group at one year (mean [±SD] KSPS scores, 48.9±21.9, 54.8±19.8, and 51.7±22.4, respectively; P=0.14 for the comparison with the lavage group, and P=0.51 for the comparison with the débridement group) or at two years (mean KSPS scores, 51.6±23.7, 53.7±23.7, and 51.4±23.2, respectively; P=0.64 and P=0.96, respectively). Similarly, there was no significant difference in arthritis pain between the placebo group and the lavage group or the débridement group at one or two years (Table 2).

Furthermore, at no time point did either arthroscopic-intervention group have significantly greater improvement in function than the placebo group (Figure 2 [Mean Values (and 95 Percent Confidence Intervals) on the Walking–Bending Subscale of the Arthritis Impact Measurement Scales (AIMS2)], Table 3 [Scores on the Physical Functioning Scale], and Supplementary Appendix 2). For example, there was no significant difference between the placebo group and either the lavage group or the débridement group in the self-reported ability to walk and bend at one year (mean AIMS2-WB scores, 49.4±25.5, 49.6±29.1, and 56.4±28.4, respectively; P=0.98 for the comparison with the lavage group, and P=0.19 for the comparison with the débridement group) or at two years (mean AIMS2-WB scores, 53.8±27.5, 51.1±28.3, and 56.4±29.4, respectively; P=0.61 and P=0.64, respectively). Indeed, objectively measured walking and stair climbing were poorer in the débridement group than in the placebo group at two weeks (mean PFS score, 56.0±21.8 vs. 48.3±13.4; P=0.02) and one year (mean PFS score, 52.5±20.3 vs. 45.6±10.2; P=0.04) and showed a trend toward worse functioning at two years (mean PFS score, 52.6±16.4 vs. 47.7±12.0; P=0.11) (Table 3).

Lacking evidence of the superiority of the arthroscopic treatments over the placebo procedure in relieving pain or improving function, we considered whether the 95 percent confidence intervals for the differences in outcome between each arthroscopic procedure and the placebo procedure included clinically important differences. The minimal important differences used for this evaluation were as follows: a difference of 13.5 points on the KSPS, 10.0 on the AIMS2-P, 11.8 on the SF-36-P, 12.8 on the AIMS2-WB, 11.3 on the SF-36-PF, and 4.5 on the PFS. At almost all time points during follow-up (72 of 84 comparisons), the confidence intervals excluded these minimal important differences.

DISCUSSION

This study provides strong evidence that arthroscopic lavage with or without débridement is not better than and appears to be equivalent to a placebo procedure in improving knee pain and self-reported function. Indeed, at some points during follow-up, objective function was significantly worse in the débridement group than in the placebo group.

Arthroscopy is the most commonly performed type of orthopedic surgery, and the knee is by far the most common joint on which it is performed.1 Numerous uncontrolled, retrospective case series have reported substantial pain relief after arthroscopic lavage or arthroscopic débridement for osteoarthritis of the knee.2-16 In the only previous double-blind, randomized, controlled trial of knee arthroscopy of which we are aware,34 patients with minimal osteoarthritis as assessed by radiography were assigned to undergo arthroscopic lavage with either 3000 ml of fluid (treatment) or 250 ml of fluid (control) and were followed for one year. Both the treatment and the control groups reported improvement in function at 12 months, and although the report interprets the study as having proved the efficacy of lavage, there was no statistically significant difference between the groups in terms of the primary outcome at any point during follow-up.

To explain the improvement that has been reported after these procedures, some have proposed that the fluid that is flushed through the knee during arthroscopy cleanses the knee of painful debris and inflammatory enzymes.4,6,9,15,16,34 Others have suggested that the improvement is due to the removal of flaps of articular cartilage, torn meniscal fragments, hypertrophied synovium, and loose debris.2-14 However, our study found that outcomes after arthroscopic treatment are no better than those after a placebo procedure. This lack of difference suggests that the improvement is not due to any intrinsic efficacy of the procedures. Although patients in the placebo groups of randomized trials frequently have improvement, it may be attributable to either the natural history of the condition or some independent effect of the placebo.

Because we found no evidence that lavage or débridement is superior to a placebo procedure, the question arises whether these arthroscopic procedures could have small but clinically important benefits that we missed because of our limited sample size. To evaluate this possibility, we determined the size of the clinical benefit that the trial was able to rule out, using the minimal important difference for each of our scales. Because estimates of minimal important differences based on different samples and different methods do not yield the same values, we used the midpoint of the range of available minimal important differences in order to test our hypothesis about the equivalence of the three procedures. For the great majority of comparisons, the 95 percent confidence intervals did not contain the minimal important difference, indicating that there was not a clinically important improvement that the study had simply failed to detect.

One surgeon performed all the procedures in this study. Consequently, his technical proficiency is critical to the generalizability of our findings. Our study surgeon is board-certified, is fellowship-trained in arthroscopy and sports medicine, and has been in practice for 10 years in an academic medical center. He is currently the orthopedic surgeon for a National Basketball Association team and was the physician for the men's and women's U.S. Olympic basketball teams in 1996.

The principal limitation of this study is that our participants may not be representative of all candidates for arthroscopic treatment of osteoarthritis of the knee. Almost all participants were men, because the study was conducted at a Veterans Affairs medical center. We do not know whether our findings may be generalized to women, although uncontrolled studies do not indicate that there are differences between the sexes in responses to arthroscopic procedures.8,10,13 A selection bias might have been introduced by the fact that 44 percent of the eligible patients declined to participate in the study. We believe this high rate of refusal to participate resulted from the fact that all patients knew they had a one-in-three chance of undergoing a placebo procedure. Patients who agreed to participate might have been so sure that an arthroscopic procedure would help that they were willing to take a one-in-three chance of undergoing the placebo procedure. Such patients might have had higher expectations of benefit or been more susceptible to a placebo effect than those who chose not to participate.

If the efficacy of arthroscopic lavage or débridement in patients with osteoarthritis of the knee is no greater than that of placebo surgery, the billions of dollars spent on such procedures annually might be put to better use. This study has also shown the great potential for a placebo effect with surgery, although it is unclear whether this effect is due solely to the natural history of the condition or whether there is some independent effect. Researchers should reconsider the best ways of testing the efficacy of surgical procedures performed purely for the improvement of symptoms. In the debate about placebo-controlled trials of surgery, the critical ethical considerations surround the choice of the placebo. Finally, health care researchers should not underestimate the placebo effect, regardless of its mechanism.35

Supported by a grant from the Department of Veterans Affairs.

SOURCE INFORMATION

From the Houston Veterans Affairs Medical Center (J.B.M., K.O., N.J.P., T.J.M., D.H.K., C.M.A., N.P.W.); the Department of Orthopedic Surgery (J.B.M.), the Department of Medicine, Section of Health Services Research (K.O., N.J.P., T.J.M., C.M.A., N.P.W.), and the Center for Medical Ethics and Health Policy (B.A.B.), Baylor College of Medicine; and International Survey Research (D.H.K.) — all in Houston; and the Laguna Honda Hospital, San Francisco (J.C.H.).

Address reprint requests to Dr. Wray at the Section of Health Services Research, Baylor College of Medicine, 2002 Holcombe Blvd. (M.R. 152), Houston, TX 77030, or at nwray@bcm.tmc.edu.