At TheNNT.com, we believe that the traditional peer review process, while effective in some ways, has led to important problems in the way evidence is presented, interpreted, and implemented.
In particular, the methods of results reporting in peer-reviewed medical journals have not been consistent, clear, or understandable.
TheNNT.com is designed to address this gap.
Our editorial board consists of a group of practicing clinicians and one patient advocate. The editor-in-chief (EIC) is Dr. Shahriar Zehtabchi.
Our process for generating NNT reviews is as follows:
- Question. We begin with a patient-oriented question about a medical intervention, e.g. "Do blood thinners save lives for patients with blood clots?"
- Author assignment. Once the question has been formatted properly and established as relevant and valuable through editorial board review, an author is chosen. Authorship is by invitation, though we also consider authorship applications submitted through the website's Contact Us page.
- Literature search. Once assigned, the author conducts a literature search. Our preferences in source material are described below. Briefly, we search pre-appraised evidence-based medicine (EBM) resources first, and in their absence we perform original literature searches. Systematic reviews are preferred, though in some cases we accept randomized trial data when no higher level of evidence exists. Once the source material is established, it is reviewed by the EIC.
- Evidence review. If the source material is felt to be of adequate quality, or if the clinical question is broadly applicable and important, the data are reviewed. Authors are expected to scrutinize systematic reviews, though they are not always expected to recheck transcription or extraction against the original data sources. In questionable cases the original data are reviewed by hand. NNTs are generally derived from binary outcome data. Outcome measures that are not binary are not amenable to NNT calculations, except in select cases where the data can be dichotomized (this will always be stated explicitly).
- Statistical review. NNT calculations are reviewed by the editorial board, and all final mathematical statements of benefit or harm, as well as all color designations (green, yellow, red), are reached by consensus.
- Final review. The EIC performs final review and editing for all NNTs, and all disagreements are resolved by consensus.
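The dichotomization mentioned in the evidence-review step can be sketched as follows: each patient is classified as a responder or not against a pre-specified threshold, and the NNT is then computed from the resulting binary counts. This is a minimal illustration only, assuming individual-level data, invented pain scores, and a hypothetical 50% improvement cutoff; it is not a documented TheNNT.com procedure. Rounding the NNT up to a whole patient is a common convention, assumed here.

```python
import math
from fractions import Fraction

def count_responders(improvements, threshold):
    """Number of patients whose improvement meets the (hypothetical) cutoff."""
    return sum(1 for x in improvements if x >= threshold)

def nnt_for_benefit(responders_treated, n_treated, responders_control, n_control):
    """NNT when the binary outcome is a good one (responding).

    Exact fractions avoid floating-point error before rounding up.
    """
    difference = Fraction(responders_treated, n_treated) - Fraction(responders_control, n_control)
    if difference <= 0:
        raise ValueError("no benefit observed; NNT is undefined")
    return math.ceil(1 / difference)

# Invented percent pain reduction for 10 patients per arm; hypothetical 50% cutoff.
treated = [60, 55, 20, 70, 10, 50, 80, 30, 65, 40]
control = [10, 55, 20, 30, 5, 60, 40, 15, 25, 35]
cutoff = 50

r_t = count_responders(treated, cutoff)  # 6 responders
r_c = count_responders(control, cutoff)  # 2 responders
print(nnt_for_benefit(r_t, len(treated), r_c, len(control)))  # 3
```

Here 60% of treated and 20% of control patients cross the cutoff, a 40-point difference, so the NNT is 1/0.4 = 2.5, rounded up to 3. A different cutoff would give a different NNT, which is why any dichotomization needs to be stated explicitly.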
Authorship is voluntary, and authors must have no financial conflicts of interest for their subject matter.
At TheNNT.com, our preference for levels of evidence is consistent with most EBM reference materials. We regard systematic reviews (SRs) as the highest level of evidence. We will accept randomized controlled trials (RCTs) in some circumstances, but rarely unless they have been reviewed by DARE (the Database of Abstracts of Reviews of Effects) or ACP Journal Club, two established evidence-based medicine resources that methodically and regularly filter and review high-quality research.
A note on systematic reviews: we prefer Cochrane Collaboration reviews because they are more comprehensive and multidisciplinary. When we accept and use data from a Cochrane (or any other) systematic review, we do so on the basis of our own appraisal of that review. We have found some SRs, including some Cochrane SRs, to be flawed and/or inapplicable, and in these cases we do not accept their data as adequate for conveying patient-oriented, accurate, reliable estimates of an intervention's impact. Occasionally, therefore, we choose to report data from a non-Cochrane SR or from RCTs even when a Cochrane review is available. Whenever we do so, we describe the reasons for this choice in detail.
A related note on meta-analyses: the term ‘meta-analysis’ simply means the analysis of combined data from multiple studies. SRs generally include a meta-analysis where appropriate. The difference is that an SR systematically reviews the literature on a given topic and ostensibly finds and uses all of the available literature for its meta-analysis. A major potential problem with meta-analyses is that, without a systematic attempt to include all relevant literature and data, it is possible to cherry-pick the studies included. When only studies that appear to show benefit for a treatment, or only studies that appear to show no benefit, are selected for a meta-analysis, the reader gets only a partial picture of the available data. Therefore SRs are the only form of topic review that we will accept: at TheNNT.com we will not accept a meta-analysis as proof of effect without a demonstrated systematic literature search and unbiased selection process.
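To illustrate why selective inclusion matters, here is a minimal sketch of a standard fixed-effect, inverse-variance meta-analysis of risk ratios, run once on a full set of trials and once on only the trials that appeared to favor treatment. The four trials are invented for illustration; this is not a description of how any particular review was conducted.

```python
import math

def log_rr_and_variance(a, n1, c, n2):
    """Log risk ratio and its large-sample variance for one trial.

    a/n1: events/patients in the treatment arm; c/n2: events/patients in control.
    """
    log_rr = math.log((a / n1) / (c / n2))
    variance = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return log_rr, variance

def pooled_risk_ratio(trials):
    """Fixed-effect, inverse-variance pooled risk ratio across trials."""
    weighted_sum = 0.0
    total_weight = 0.0
    for a, n1, c, n2 in trials:
        log_rr, variance = log_rr_and_variance(a, n1, c, n2)
        weight = 1 / variance
        weighted_sum += weight * log_rr
        total_weight += weight
    return math.exp(weighted_sum / total_weight)

# Four invented trials: (treated events, treated n, control events, control n).
all_trials = [
    (30, 200, 40, 200),  # favors treatment (RR 0.75)
    (45, 200, 44, 200),  # roughly null
    (50, 250, 48, 250),  # roughly null
    (20, 150, 30, 150),  # favors treatment (RR 0.67)
]
# A cherry-picked subset: only the trials that appeared to show benefit.
favorable_only = [all_trials[0], all_trials[3]]

print(f"all trials:     RR {pooled_risk_ratio(all_trials):.2f}")
print(f"favorable only: RR {pooled_risk_ratio(favorable_only):.2f}")
```

With these made-up numbers, all four trials pool to a risk ratio of about 0.90, while the two favorable trials alone pool to about 0.71: an apparently much larger benefit drawn from the same body of evidence.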
We will not accept lesser designs (cohort studies, case-control studies, case series, etc.) as proof of cause and effect for the impact of an intervention, and therefore will not report the NNT of an intervention from these study designs. The single exception is our general acceptance and use of observational and post-marketing surveillance data for adverse effects, which we explain below in the section ‘Adverse Effects Data’.
Industry Sponsored Trial Data
Industry-sponsored data may be robust and valid. However, at TheNNT.com we are increasingly wary of the validity of industry-sponsored data. The use of ghostwriting and medical writing companies, and of contract research organizations (CROs), contributes to our wariness. Those who draw their income from industry (including medical writing organizations and CROs) are beholden to industry in ways that may easily affect their work (Blumenthal, 2004). This influence may extend to the tone, presentation, and interpretation of published data, and has the potential to mislead.
In addition, it has become clear that selective publication of data from industry has the potential to significantly affect the perception of an intervention’s impact, and may also alter the mathematical estimate of effect to which physicians and patients are exposed in topic reviews and meta-analyses (Turner, 2008).
Finally, there are known and prominent examples, brought to light largely through litigation and documents made public via subpoena, of what appears to be fraudulent and openly misleading reporting of industry-sponsored trial data (Psaty, 2008; Curfman, 2005).
Together, these considerations lead us to be generally suspicious of industry-sponsored data, particularly data that appear to demonstrate a benefit of a potentially lucrative intervention.
Generally speaking, there is no need to use composite endpoints. In many cases the use of composite endpoints (e.g. ‘death, MI, or revascularization’ in studies of coronary treatments) obscures the patient-oriented utility of an intervention (Tomlinson, 2010). In addition, composite endpoints are very often made up of components whose importance to patients varies considerably. At TheNNT.com, our sense is that separating these endpoints into individual outcomes helps to clarify the degree to which an intervention may be beneficial based on the value system of an individual patient. We therefore separate all composite endpoints into their constituent parts for NNT calculations.
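A minimal sketch of why splitting a composite matters, using an invented trial of 1,000 patients per arm. The composite count is the number of patients with at least one component event, so it is smaller than the sum of the components. All counts are hypothetical.

```python
import math
from fractions import Fraction

def nnt(events_control, events_treated, n_per_arm):
    """NNT for a bad outcome the treatment aims to prevent (equal arm sizes assumed).

    Returns None when no benefit is observed for that outcome.
    """
    arr = Fraction(events_control - events_treated, n_per_arm)
    if arr <= 0:
        return None
    return math.ceil(1 / arr)  # rounded up to a whole patient

# Hypothetical trial, 1,000 patients per arm: (control events, treated events).
outcomes = {
    "death": (30, 28),
    "myocardial infarction": (40, 30),
    "revascularization": (80, 50),
    "composite (any of the above)": (130, 95),
}

for name, (control, treated) in outcomes.items():
    print(f"{name}: NNT {nnt(control, treated, 1000)}")
```

In this made-up example the composite NNT of 29 looks impressive, but it is driven mostly by revascularization (NNT 34), while the NNT for death is 500. Reporting each component separately lets a patient weigh the outcomes that matter most to them.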
Adverse Effects Data
When interventions are evaluated in RCTs they are compared to a control group. Very often that control group receives a placebo or sham treatment. When an intervention is no better than placebo, we consider the intervention to have demonstrated no benefit. This means that we discount the impact of placebo, even in cases where the data clearly show an apparent benefit of the placebo or sham treatment. Similarly, we do not accept benefits suggested by uncontrolled observational data, partly because we presume that placebo effects, rather than effects attributable solely to the intervention, may be at work.
We consider this decision appropriate and patient-oriented: patients would generally not want to be subjected to interventions if they knew that the intervention was no better than a placebo or sham. This is not true, however, of adverse effects (i.e. side effects or harms) of interventions. For side effects, which are also known to occur in many patients taking placebos, we believe that observational data are adequate to detect important effects. This is a conscious decision on our part, based on our presumption that while patients would not want interventions that are no better than placebo, they would want to avoid potential adverse effects or harms whether these occur in placebo groups, intervention groups, or both. Any adverse effect, we presume, regardless of its etiology, is an effect that patients would prefer to avoid. This also helps to explain why uncontrolled, observational post-marketing surveillance data, often designed and specifically requested by the FDA after it approves a new medication, are the primary method used to determine the rate and nature of major side effects for that medication.
Therefore we accept uncontrolled observational data as potentially establishing adverse event rates, while we do not accept uncontrolled observational data as establishing potential benefits of interventions.
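The asymmetry described above, in which benefit is credited only beyond placebo while harms are counted in full, can be sketched with two small calculations. All counts are hypothetical, and expressing the crude harm rate as "1 in N" rounded to the nearest whole patient is an assumption of this sketch rather than a documented TheNNT.com rule.

```python
import math
from fractions import Fraction

def benefit_beyond_placebo(improved_active, n_active, improved_placebo, n_placebo):
    """Benefit credited to an intervention: active-arm rate minus placebo-arm rate."""
    return Fraction(improved_active, n_active) - Fraction(improved_placebo, n_placebo)

def crude_harm_rate(events, exposed):
    """Harm rate from uncontrolled data: every observed event counts, whatever its cause."""
    return Fraction(events, exposed)

# Hypothetical placebo-controlled trial: 120/200 improve on drug, 110/200 on placebo.
benefit = benefit_beyond_placebo(120, 200, 110, 200)
print(f"improved on drug: {120 / 200:.0%}; credited benefit: {float(benefit):.0%}")
print(f"NNT: {math.ceil(1 / benefit)}")

# Hypothetical post-marketing data: 45 serious events among 30,000 exposed patients.
harm = crude_harm_rate(45, 30_000)
print(f"harm rate: {float(harm):.2%} (about 1 in {round(1 / harm)})")
```

An uncontrolled series would report that 60% improved, but only the 5-point difference from placebo (NNT 20) is credited as benefit. The harm rate, by contrast, is not reduced by any placebo-arm rate.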
NNT and the Thorny Issue of Control Event Rates
The NNT is calculated from the absolute risk reduction (ARR): the raw difference between the rate of primary outcomes in the control group and the rate of primary outcomes in the treatment group. The NNT is the reciprocal of this difference (NNT = 1/ARR). As described above, we believe in the utility and importance of this measure because it offers information about everyone in the study groups, both those who did not experience a primary outcome such as death or heart attack and those who did. It is therefore a patient-oriented measure: patients want to know the overall likelihood of benefit from an intervention, and both of these pieces of information are necessary to estimate it.
However, this approach also presents a problem for projecting the likelihood of benefit for an individual patient. Each individual patient has a specific risk of incurring an outcome (e.g. death or heart attack) that is targeted for prevention, and this baseline risk varies considerably. For example, the risk of death from cardiovascular causes is very small among those who have not previously had a heart attack or other heart problems. It is much higher among those with heart disease. When statins are used in the former population there is no identifiable mortality benefit, though there is a small, identifiable mortality benefit in the latter group. It is often argued that the difference in baseline risk between these two populations is the primary reason for this difference in benefit. After all, when very few people in a group die of heart disease, it is difficult for any intervention to further reduce deaths from heart disease.
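A minimal sketch of the calculation described above, with hypothetical counts. When the treatment group has more events than the control group, the same reciprocal gives a number needed to harm (NNH). Rounding both up to a whole patient is an assumed convention in this sketch.

```python
import math
from fractions import Fraction

def nnt_or_nnh(events_control, n_control, events_treated, n_treated):
    """Classify a trial result for an outcome the treatment aims to prevent.

    CER = control event rate, EER = treatment (experimental) event rate,
    ARR = CER - EER. Returns ("NNT", 1/ARR) if ARR > 0, ("NNH", 1/|ARR|) if
    ARR < 0, rounded up to a whole patient, or ("none", None) if the rates match.
    """
    cer = Fraction(events_control, n_control)
    eer = Fraction(events_treated, n_treated)
    arr = cer - eer
    if arr == 0:
        return ("none", None)  # no difference: the NNT is effectively infinite
    label = "NNT" if arr > 0 else "NNH"
    return (label, math.ceil(1 / abs(arr)))

# Hypothetical trial: 100/1000 deaths in the control group, 80/1000 with treatment.
print(nnt_or_nnh(100, 1000, 80, 1000))  # ('NNT', 50)
```

Exact fractions are used so that an ARR such as 2% yields exactly 50 rather than a floating-point value that rounds up to 51.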
This variability in individual risk will lead to some degree of error when data from multiple studies are pooled, because each study will have enrolled a sample of subjects whose rate of outcome events in the control group (the Control Event Rate, or CER) is different. If one were to take this theory very seriously, studies of statins among those with heart disease and studies of statins among those without heart disease would not be pooled together in a single review or meta-analysis. In practice this is done fairly regularly. It is often addressed in these reviews and meta-analyses by reporting only relative risk reductions rather than absolute risk reductions, because relative risk reductions are believed to be portable and consistent across different CERs. In this way, variation in CER is theoretically neutralized as a source of potential error and variation.
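A minimal sketch of the pooling problem, using two invented trials that share the same 25% relative risk reduction (RRR) but have very different CERs. The pooling here simply sums events and patients across trials as if they were one trial; real meta-analyses weight trials differently, so this only illustrates what mixing CERs does to an absolute estimate.

```python
from fractions import Fraction

# Hypothetical trials with the same 25% RRR but very different control event
# rates: (control events, control n, treated events, treated n).
low_risk = (20, 2000, 15, 2000)     # CER 1%
high_risk = (200, 1000, 150, 1000)  # CER 20%

def summarize(trial):
    """Return (RRR, ARR, unrounded NNT) for one trial as exact fractions."""
    control_events, n_control, treated_events, n_treated = trial
    cer = Fraction(control_events, n_control)
    eer = Fraction(treated_events, n_treated)
    arr = cer - eer
    return arr / cer, arr, 1 / arr

# Naive pooling: sum events and patients across trials as if they were one trial.
pooled = tuple(a + b for a, b in zip(low_risk, high_risk))

for name, trial in [("low risk", low_risk), ("high risk", high_risk), ("pooled", pooled)]:
    rrr, arr, nnt = summarize(trial)
    print(f"{name}: RRR {float(rrr):.0%}, ARR {float(arr):.2%}, NNT {float(nnt):.0f}")
```

Every row shows the same 25% RRR, but the NNTs are 400, 20, and about 55. The pooled NNT describes neither population well, which is the source of error discussed above; the RRR, by contrast, is unchanged in this idealized example because both trials were constructed to share it.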
There are problems with this approach as well, however. First, it is not always true that populations with different CERs will experience identical or even similar relative risk reductions. This assumption is a leap of faith that depends upon a consistent spectrum of disease, and there are instances in which it has not reliably or accurately predicted outcomes from one CER to the next. Second, the philosophical reason for using relative risk reductions rather than absolute risk reductions is that pooling of data introduces variability that may challenge the validity of results. After all, pooling presumes a degree of similarity between datasets (enough similarity to make combining the data valid to begin with), and the degree of similarity required remains controversial. Removing variation in the size of the CER may indeed remove some variation in the resulting data; however, the data are still being pooled in virtually every other way. Variations in sex, comorbidities, baseline state of health, and severity of disease may be the true drivers of a given patient's potential for benefit, and pooling patients with these tremendous variations in risk factors may be significantly more important, and therefore more detrimental to validity, than pooling their observed CERs.
Therefore there is uncertainty in the results of pooling data in any fashion. To reject one form of pooling while accepting many others is questionable, and it is not clear to us that this approach is more valid. In current practice, pooling of data essentially averages out risk in almost every dimension, and removing CERs from this averaging is of questionable utility in mitigating that risk. Taking large groups of humans and representing their responses to therapy with a single, averaged number is fraught with potential error, but it is a necessary reality of most research. We at TheNNT.com believe that this potential error is not substantively neutralized by removing the pooling of CERs while leaving intact the pooled results of all other variables. We therefore accept NNTs as a measure that is likely to be as valid as any other, and we recognize that variations in CER will also be pooled within these NNT estimates. In our estimation, the error introduced by pooling CERs to generate an NNT is no greater than the error of using relative risk reductions, which by their nature remove the great majority of research subjects from their estimate of effect.
However, we also recognize that in an ideal world an individualized risk assessment would be available to all patients in all scenarios, and a relative risk reduction would be applied to this presumed level of risk (i.e. the forecasted CER). In other words, to some degree we agree with an approach that uses relative risk reductions rather than absolute risk reductions, and we believe that relative risk reductions are likely to be somewhat more accurate measures of response to therapy. But they will only be reliably more accurate when validated individualized risk assessments are available to patients trying to make best guesses about the potential benefits of a therapy. When reliable individualized risk assessments are available at the bedside, these may, in select situations, be coupled with relative risk numbers to generate a more accurate and useful estimate of the likelihood of benefit for a given patient. Indeed, we hope to build interactive risk calculators for just such possibilities in the few scenarios where individualized risk assessments are valid and possible.
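The bedside approach described above, applying a trial's relative risk reduction to one patient's forecast CER, can be sketched as follows. This is a hypothetical illustration, not one of TheNNT.com's calculators: the forecast risks are invented and assumed to come from some validated individualized risk assessment, and the 25% RRR is likewise made up.

```python
import math
from fractions import Fraction

def individualized_nnt(baseline_risk, relative_risk_reduction):
    """Apply a trial's RRR to one patient's forecast risk (their expected CER).

    baseline_risk: forecast probability of the outcome without treatment,
        assumed to come from a validated individualized risk assessment.
    relative_risk_reduction: e.g. 0.25 for a 25% relative reduction.
    Values are converted via str() so decimals like 0.3 become exact fractions.
    """
    risk = Fraction(str(baseline_risk))
    rrr = Fraction(str(relative_risk_reduction))
    if not (0 < risk <= 1 and 0 < rrr <= 1):
        raise ValueError("risk and RRR must lie in (0, 1]")
    return math.ceil(1 / (risk * rrr))  # individualized ARR = risk * RRR

# Hypothetical patients with different forecast risks, same 25% RRR.
for risk in (0.02, 0.10, 0.30):
    print(f"forecast risk {risk:.0%}: NNT about {individualized_nnt(risk, 0.25)}")
```

The same relative effect yields NNTs of 200, 40, and 14 for these three hypothetical patients. The estimate is only as good as the forecast risk and the assumption that the RRR holds across risk levels, which is why it depends on validated risk assessments.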
Finally, in the overwhelming majority of trials, CERs are much smaller than Control Non-event Rates (CNRs). In other words, in the great majority of disease states the number of subjects who do not experience the outcome of interest is far higher than the number who do. In practice this means that when we find that a given intervention's NNT is 25, for instance, the absolute effect of the intervention was 4%. If we found that this was an error and the actual NNT was 1000, the effect would be 0.1%. The absolute difference between these two effects is 3.9%. While this may seem like a substantial difference, it also means that for this intervention the potential range of effects is under 4%. At the same time, at least 96% of subjects in the studies experienced no identifiable effect due to the intervention. Therefore an NNT of 25 and an NNT of 1000 both allow the patient to understand that, as we report in the ‘In Summary’ portion that leads every review, the great majority of patients will see no effect: in these two examples, either 96% or 99.9% will see no effect.
The importance of this observation is that the absolute change in an intervention's efficacy caused by shifting CERs virtually never even begins to approach the magnitude of the CNRs for the same intervention. Once we understand this, we understand what the NNT is particularly focused on: communicating the true chances of seeing benefit from an intervention. In terms of communicating truth, shifting to relative risks would mean, by definition, ignoring the great majority of trial subjects by focusing on those who had an event (i.e. by focusing on CERs). Therefore, omitting the NNT as a communication device because it may not be perfectly accurate and may lead to errors in the perception of effect (which is a real risk) ignores the much larger error in perception that is guaranteed when the NNT is not used. On a comparative scale, then, even a potentially flawed NNT is far more accurate than communication based on relative risks. The error of dropping or not integrating CNRs is, by its very nature, a much larger error than any that could be generated by an inaccurate NNT.
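The arithmetic behind the last two paragraphs can be sketched with two hypothetical interventions that share the same 50% relative risk reduction, chosen so that their NNTs are the 25 and 1000 used in the example above. The sketch assumes the treatment only prevents events (no one is harmed by it), so 1 - ARR is the share of patients whose outcome is the same with or without treatment.

```python
from fractions import Fraction

def framings(cer, eer):
    """Express one result relatively and absolutely.

    cer, eer: control and experimental (treatment) event rates as Fractions.
    Assumes the treatment only prevents events, so 1 - ARR is the share of
    patients whose outcome is the same either way.
    """
    arr = cer - eer
    return {
        "RRR": arr / cer,          # relative risk reduction
        "ARR": arr,                # absolute risk reduction
        "NNT": 1 / arr,            # number needed to treat
        "no difference": 1 - arr,  # outcome unchanged by treatment
    }

# Two hypothetical interventions with the same 50% relative risk reduction.
scenarios = {
    "common outcome": framings(Fraction(8, 100), Fraction(4, 100)),
    "rare outcome": framings(Fraction(2, 1000), Fraction(1, 1000)),
}
for name, f in scenarios.items():
    print(f"{name}: RRR {float(f['RRR']):.0%}, ARR {float(f['ARR']):.1%}, "
          f"NNT {f['NNT']}, no difference for {float(f['no difference']):.1%}")
```

Both scenarios would be described as a "50% reduction," yet their absolute effects differ by 3.9 percentage points, and in both the overwhelming majority of patients (96.0% and 99.9%) see no difference.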
That said, we believe that in most cases the NNTs we report are reasonable averages that will apply well to the majority of patients, though because patients enrolled in trials are often sicker than others, we may be somewhat overestimating effects. For the time being, we therefore accept that there is controversy over the use of absolute risk reductions and NNTs when pooling data; we believe that validated individualized risks and accurate relative risks, when available, are optimal (and we will add them in cases where they exist); and we continue to stand by the ‘disclaimer’ tab on all review pages, which explains the existence of variability and of an expected degree of error. Ultimately, we believe that the advantages of these measures for patients and doctors far outweigh their disadvantages.
Resources and Footnotes
Blumenthal D. Doctors and drug companies. N Engl J Med 2004;351:1885-1890.
Turner EH, et al. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 2008;358:252-260.
Psaty BM, Kronmal RA. Reporting mortality findings in trials of rofecoxib for Alzheimer disease or cognitive impairment: a case study based on documents from rofecoxib litigation. JAMA 2008;299:1813-1817.
Curfman GD, et al. Expression of concern: Bombardier et al., "Comparison of Upper Gastrointestinal Toxicity of Rofecoxib and Naproxen in Patients with Rheumatoid Arthritis," N Engl J Med 2000;343:1520-8. N Engl J Med 2005;353:2813-2814.