Inside Story

Making medicine count

Working out whether a treatment works, and for how many people, is trickier than it sounds, writes Frank Bowden. Here’s how you should go about doing it

Frank Bowden 13 April 2016

Doctors may deny that they unconsciously bias the results of studies, but the evidence shows they do. pixdeluxe/iStockphoto

When I had been a doctor for ten years, I enrolled in a postgraduate course in epidemiology and biostatistics. It was a time when my brain could still absorb information like a damp Wettex, and I found the subject matter engrossing.

The course was based on the work of a doctor at Canada’s McMaster University, David Sackett, who created the discipline known as clinical epidemiology in the 1970s. Sackett had moved to Oxford University and renamed his creation evidence-based medicine, or EBM. Back in clinical practice he was disturbed to find that so much medical decision-making was unstructured and not based on a proper analysis of research findings.

Like all champions of new movements, some of the proponents of EBM had a zealous air about them. It was not surprising that their born-again enthusiasm annoyed the proletariat of “humble” practising clinicians, but I was bemused to find that some parts of the academic medical world were also miffed. Many thought EBM was a tautology: how could the practice of medicine be anything but based on evidence? The Lancet published a snooty editorial in 1995 that grudgingly acknowledged the emergence of the movement. It was titled “Evidence-Based Medicine – In Its Place.”

During my studies I discovered that EBM held answers to many of the problems I had identified in my work. I began to see that I didn’t really know how to judge if the drugs I was prescribing were effective or how to interpret the tests I ordered. Previously I had relied on the wisdom of my elders, but now I saw that there was a relatively simple way to transform my half-baked medical intuitions into something more concrete and scientific. EBM was about turning qualitative notions into semi-quantitative conclusions, I learnt, so you had to be prepared to work with a few simple formulae.

One of those formulae would allow me to calculate a number that I believe should be known to every doctor and every patient. Indeed, it would become my favourite of all numbers.

The gold standard of EBM is the randomised clinical trial. These studies might be expensive, time-consuming or simply impractical, but they are the only way we can confidently determine the effectiveness and safety of a medical intervention. To show how these trials work, I am going to take you through the steps involved in designing a study to assess the effect of a new beta-blocker on the rate of deaths in the years after a heart attack.

In 1964, Sir James Black, working at the pharmaceutical company ICI in England, developed the first beta-blocker, propranolol. Beta-blockers can be used to treat high blood pressure, heart failure, irregular cardiac rhythms and anxiety. Within a few years an etymologically fitting landmark was reached when propranolol became ICI’s first blockbuster drug. Over the next decade, dozens of analogues of propranolol appeared on the market with names all ending in -olol: practolol, metoprolol, timolol, atenolol and pindolol to name just a few. The flood of new examples prompted one of my friends to suggest that the next new beta-blocker would be called olololol.

Olololol, I am disappointed to say, has never appeared on the market, but I am going to use it in this demonstration. I will compare olololol’s effectiveness in preventing death in the years after a heart attack with that of a placebo (an inactive substance). (It is important to note that the numbers I present here are invented to illustrate a point – they should not be seen to reflect the true benefit of beta-blockers in any setting.)

Let’s pretend it is 1980 – a time when only sailors and Volvo drivers grew beards and the benefit of using beta-blockers after a heart attack had not been established. It would therefore be ethical to test olololol against a placebo rather than against another beta-blocker. The first step is to create two groups – one that will receive olololol, known as the treatment group, and another that will receive the placebo, known as the control group.

The people in the control group must be similar to those in the olololol treatment group according to a wide range of characteristics, such as sex, age, family history of heart disease, previous heart attack, smoking, blood fat levels and diabetes. The people in the control group must also have suffered a similar severity of heart attack as those who will receive olololol.

The control group acts as a benchmark for comparison with the treatment group. Without a control group you are simply performing an observational study. But even if you do have a control group, you can still render the results of a trial invalid if you do not consider a phenomenon known as selection bias. Doctors are, by and large, a hopeful lot. They want to believe in new treatments for serious conditions, and if they think a new drug will help their patients, they will be keen to enrol those patients in a study that offers access to the drug. Conversely, if they don’t believe in the treatment on offer, they may keep their sickest patients out of the trial and enrol only the less sick ones.

Doctors may deny that they unconsciously bias the results of studies, but the evidence shows they do. This can make a new drug or operation look better or worse than it really is, depending on the attitudes of the doctors enrolling patients in the study. The only way to minimise selection bias is to allocate patients randomly to the treatment or control group and for the researchers and/or referring doctors to have no control over the process. At the end of the study, if the randomisation process has been undertaken correctly, the control and treatment groups will be, on average, almost exactly the same.
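The mechanics of random allocation are simple; what matters is that the sequence is generated out of everyone’s hands. A minimal sketch in Python — the fixed seed and the straight 50/50 split are illustrative assumptions, not a real trial protocol:

```python
import random

def randomise(patient_ids, seed=2016):
    """Allocate patients to treatment or control at random,
    independently of researchers and referring doctors."""
    rng = random.Random(seed)      # seed fixed only so the sketch is reproducible
    ids = list(patient_ids)
    rng.shuffle(ids)               # random permutation of the enrolled patients
    half = len(ids) // 2
    return {"treatment": ids[:half], "control": ids[half:]}

groups = randomise(range(2000))    # 1000 patients in each arm
```

In a real trial the allocation list would be held centrally, concealed from the enrolling doctors, so that no one can predict or influence which group the next patient joins.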

The statistician involved in the olololol study has calculated that we need about 2000 patients in the study to prove with 95 per cent confidence that any difference we find between the two groups will be a real difference and not one that arose by chance. That means we have 1000 patients who will receive olololol after their heart attack and 1000 who will receive a placebo pill. If our randomisation process works, the only difference between the two groups will be that one received olololol and the other didn’t.
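For the curious, the statistician’s figure comes from a standard sample-size formula for comparing two proportions. The sketch below uses only Python’s standard library and assumes, purely for illustration, event rates of 10 per cent versus 5 per cent, a 5 per cent significance level and 80 per cent power; more demanding assumptions — higher power, or a smaller expected difference between the groups — push the answer up towards the 1000-per-group figure quoted above:

```python
import math
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing
    two proportions (p1 = control event rate, p2 = treatment)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # 0.84 for 80 per cent power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)

n = sample_size_per_group(0.10, 0.05)   # roughly 430 per group under these assumptions
```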

The next problem to overcome is error in the measurement of the trial outcomes. During the trial, staff will assess the participants’ progress and interpret the results of blood tests, ECGs and X-rays. You would think that something as obvious as a heart attack would be easy to diagnose, but it is actually very complex: different doctors in different hospitals will give different interpretations of the same tests. We reduce this variation by setting clear criteria for what is, or isn’t, a heart attack, and by training those conducting the trial so that a nurse in Perth makes much the same interpretation of an ECG as a doctor in Dapto would.

But another bias may come into play here. The doctor who believes in the new treatment being offered in the randomised clinical trial may unconsciously bias their adjudication of the data in favour of fewer heart attacks in the olololol group and more in the placebo group. The opposite may occur in the doctors who don’t believe that olololol is going to work. To minimise this bias, you have to blind the group allocation so that the person who decides if the participants have had a heart attack does not know which group they belong to. This is relatively easy in some studies, but almost impossible in others.

In our olololol study you would think that it would be quite simple to blind the observers to the allocation group. It is easy to make two pills look and taste the same, but one of the predictable effects of beta-blockers is that they slow the heart rate. It is possible that the observers would assume that patients with slower than normal heart rates were on the olololol and bias their measurements according to their unconscious prejudice about the drug.

The patients should also be blind to the group to which they have been allocated, to account for the placebo effect, which can be responsible for up to 30 per cent of the benefit of a treatment. It may be diminished or magnified depending on the attitude of the patient and/or their medical attendants; blinding equalises the placebo effect in the treatment and control groups. Similarly, the side effects of the drug must be assessed. Without blinding, the patients on the drug may overcall or downplay their symptoms. While randomisation is a prerequisite, you can perform a randomised clinical trial without blinding, but there will always be a nagging doubt about the results.

Enrolling patients in randomised clinical trials is difficult. Many doctors are less than interested in clinical research, and only a small number of institutions have the necessary resources to conduct one properly. Even the most sophisticated centres can struggle to enrol patients in studies, and it often takes several years to accrue adequate numbers.

The results of a properly designed and conducted randomised clinical trial apply to the people who participated in the trial, but do they apply in the real world? This depends on what is known as the generalisability, or external validity, of the study. Many studies exclude children, women of childbearing age and the elderly. Ethnic minorities are often underrepresented in large studies undertaken by several teams in different places. Further, randomised clinical trials are attractive to patients with better “health-seeking behaviour,” and these people are known to have better health outcomes than those without that approach.

The problem of external validity is sometimes cited as a reason for ignoring the findings of a randomised clinical trial. Again, it works both ways – some doctors ignore the positive findings of a study, saying it wouldn’t apply to their particular patient population; others ignore negative findings, saying their kind of patient wasn’t included.

So, with all these caveats in mind, let’s assume that we have properly randomised subjects in the olololol and control groups and that we have blinded the allocation from both the patients and the observers. We can now measure the effect of the drug by counting the number of people in each group who had a heart attack by the end of the year following their initial attack.

At this point we have to perform a statistical analysis of the data to reassure ourselves that the differences, if we find any, are unlikely to have occurred by chance. Many people go weak at the knees when the s-word is uttered. Indeed, you may be a little dismayed to discover how useless most doctors are at statistics. It is quite easy to intimidate even the grey-haired ones with a few t-tests or chi squares or a multiple logistic regression model. Many will be satisfied just with the knowledge that there is a statistically significant result. In reality, though, the statistics are one of the easier parts of a study to get right. More trials are invalidated because of fundamental design problems, such as flawed randomisation or inadequate blinding, than because the wrong statistical tests were employed.

Let’s say the results show that the proportion of participants who had a subsequent heart attack was 10 per cent in the control group but only 5 per cent in the olololol group, and that the difference is statistically significant. (Note: this is a much better result than would be expected in real life.) One way to express the difference is to calculate what is known as the relative risk reduction, or RRR. This is the difference in rate of heart attack between the two groups, divided by the rate in the control group: 10 per cent minus 5 per cent, divided by 10 per cent equals 50 per cent. In other words, the risk of having a heart attack in the participants who took olololol is half that of the patients who did not.

A reduction in the risk of a heart attack by 50 per cent sounds very impressive, but you have to be very clear about what it actually means. Whenever you are faced with a relative risk reduction, you have to ask a simple question: a reduction compared to what? To answer it, you have to calculate the absolute reduction in the risk of a heart attack. This is easy, because it is just subtraction: 10 per cent minus 5 per cent. In other words, the absolute reduction in the risk of a second heart attack is 5 per cent, a much smaller number than the 50 per cent relative risk reduction.
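The two risk reductions are simple arithmetic, and easy to check. A minimal sketch in Python, using the invented trial rates from above:

```python
def risk_reductions(control_rate, treatment_rate):
    """Return (absolute, relative) risk reduction as fractions."""
    arr = control_rate - treatment_rate   # absolute: plain subtraction
    rrr = arr / control_rate              # relative: scaled to the control rate
    return arr, rrr

# Invented rates: 10 per cent in the control group, 5 per cent on olololol.
arr, rrr = risk_reductions(0.10, 0.05)
# arr = 0.05 (5 percentage points), rrr = 0.5 (a 50 per cent relative reduction)
```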

The relative risk reduction is always larger than the absolute risk reduction, so it is the best way to make your results sound good. It can also mislead patients (and their doctors). Many think it means they have a 50 per cent chance of a second heart attack if they don’t take olololol, even though the real risk for those who don’t receive the drug is 10 per cent.

The advances in the treatment of heart attacks over the past few decades have improved the mortality rate following a heart attack treated in hospital from about 80 per cent in 1980 to 3–5 per cent today. As a result, most contemporary researchers haven’t got as much mortality to play with.

Modern multicentre, international randomised clinical trials that look at the effect of a new drug or procedure on heart-related events will typically try to demonstrate a difference in mortality of around 1 per cent, so let’s make the olololol numbers a bit more realistic. If the death rate is 3 per cent in the placebo group and 1.5 per cent in the olololol group, the relative risk reduction is still 50 per cent (3 per cent minus 1.5 per cent, divided by 3 per cent), but the absolute risk reduction is now only 1.5 per cent, less than a third of the 5 per cent we looked at above.

The number crunching may or may not have left you a bit breathless at this point. The challenge is to simply and effectively convey the meaning of the numbers. Now that we have completed the crash course on randomised clinical trials, we can consider how the results of such trials are used by doctors in real life.

It is not an exaggeration to say that my medical career can be divided into the time before I studied EBM and the time after. But one part of the course in particular caught my attention. Learning this helped me turn my medical intuition into something more tangible. For it was here that I discovered my favourite number – the number that makes it obvious to my patients that they don’t have a 50 per cent chance of dying if they don’t get olololol; the number that helps me to determine the clinical significance, as opposed to the statistical significance, of the olololol results and those of any other randomised clinical trial. My favourite number gives a human dimension to an epidemiological raw figure. When I first saw how to calculate it, I literally said, “Wow!”

It is the number needed to treat.

The number needed to treat, or NNT, is the number of people who have to receive a new drug in order to cure one person, or the number of operations you have to perform to save one life. The NNT is beautiful because it is so easy to calculate: it is simply the inverse of the absolute risk reduction (one divided by the absolute risk reduction). For the example of olololol, where the absolute risk reduction was 5 per cent (5/100), the NNT is 20 (100/5); for the absolute risk reduction of 1.5 per cent (15/1000), the NNT is 67 (1000/15). So although the relative risk reduction in both olololol examples was the same at 50 per cent, the NNT gives me a clear indication of the benefit my intervention offers the patient sitting in front of me.
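The inverse relationship is easy to verify for both olololol scenarios. A sketch, rounding to the nearest whole person:

```python
def nnt(absolute_risk_reduction):
    """Number needed to treat: the inverse of the absolute risk reduction."""
    return round(1 / absolute_risk_reduction)

n_optimistic = nnt(0.05)    # 20  (the 10 v 5 per cent scenario)
n_realistic = nnt(0.015)    # 67  (the 3 v 1.5 per cent scenario)
```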

I am often asked what a “good” NNT is. The answer is: it depends. If the outcome you are trying to prevent is death, then an NNT of 67 could be acceptable, but you would also have to consider the cost and side effects of the drug in question. If the outcome was faster treatment of your groin rash, then 67 would not be acceptable and even 20 might not be enough if it was an expensive drug. A drug costing $10,000 a treatment may be acceptable if it saves a life. The cost per life saved can be calculated, in simplest terms, by multiplying the cost of the treatment by the NNT. In the olololol example of an NNT of 67, it would cost $670,000 per life saved, which is hardly a bargain.
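That back-of-envelope calculation, in the same terms — the $10,000 price tag is the hypothetical figure from above, not a real drug cost:

```python
def cost_per_life_saved(cost_per_treatment, nnt):
    """Simplest estimate: treating NNT people saves one life."""
    return cost_per_treatment * nnt

# Hypothetical: $10,000 per treatment, NNT of 67.
cost = cost_per_life_saved(10_000, 67)   # $670,000 per life saved
```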

You can predict what the NNT is likely to be if you know how common the disease or complication is that you are trying to prevent. If the intervention or drug works, the more common conditions will have a lower NNT than the rarer ones. For people with a low risk of heart disease, for example, 1667 would have to take an aspirin each day to prevent either one heart attack or one stroke. On the other hand, among people who are actually having a heart attack, only forty-two need to take an aspirin to get the benefit of one life saved within a month.

Statin drugs, which are used to lower cholesterol, are relatively cheap and their few serious side effects are uncommon. If you haven’t already had a heart attack, the NNT with a statin over a five-year period to prevent one heart attack is around 34. This means that just over 97 per cent (that is, one over 34 subtracted from 100 per cent) of the people who have high cholesterol but who are otherwise well will derive no benefit from the statin they take each morning. But if you multiply the small benefit that an individual derives by the size of the population at risk, then you find that there is a benefit of public health significance. It is this fact that has made it worthwhile for the Australian Pharmaceutical Benefits Scheme to subsidise the costs of statins to the tune of a billion dollars per year. If the outcome you want to prevent is more common, it is easier to show a benefit.
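The “no benefit” figure follows directly from the NNT — one person in every NNT-sized group benefits, and the rest do not:

```python
def fraction_without_benefit(nnt):
    """Share of treated people who derive no benefit: 1 - 1/NNT."""
    return 1 - 1 / nnt

# NNT of 34 for five years of statin treatment, as quoted above.
pct = round(100 * fraction_without_benefit(34), 1)   # 97.1 per cent
```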

Remember, if a drug works, it must have side effects. Randomised clinical trials also provide an opportunity to calculate the number needed to harm, or NNH, which is the number of people you need to treat to harm one of them. The NNH is calculated in the same way as the NNT – it is the inverse of the absolute difference in the harm seen between the treatment and control groups. The balance between NNT and NNH is complex – the significance of an isolated NNH depends on the severity of the side effect in question and the condition you are treating. The side effect of death would score highly in any calculation of harm; severe nausea and vomiting, on the other hand, would be an acceptable side effect if the drug was given to reduce the risk of death but less so if it was for treating tinea.
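The NNH mirrors the NNT calculation exactly; the side-effect rates below are hypothetical numbers chosen purely for illustration:

```python
def nnh(harm_rate_treatment, harm_rate_control):
    """Number needed to harm: inverse of the absolute increase in harm."""
    return round(1 / (harm_rate_treatment - harm_rate_control))

# Hypothetical: severe nausea in 6 per cent on the drug v 2 per cent on placebo.
harm = nnh(0.06, 0.02)   # one extra case of nausea for every 25 people treated
```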

The NNT allows me to visualise the likely benefit my patient is going to derive from treatment, and I can balance that against its inconvenience, cost and risk of side effects. As you can see, there is an enormous amount of work to be done before you can calculate it and, when all is said and done, it is only a number. But the next time you visit a healthcare provider and they want to operate, manipulate or medicate you, keep the NNT in the back of your mind.

It isn’t enough for your doctor, pharmacist or physiotherapist to use unqualified words such as “evidence” or “statistically significant” or “clinical trial” to justify their actions. They may pay lip service to EBM but, as we have seen, achieving the highest level of evidence is extremely hard. Only a rigorously conducted randomised clinical trial provides results we can rely on and, even then, the studies have to be reproducible in other settings.

But at least if you can extract the absolute risk reduction from your doctor, you can calculate the NNT and you will be able to quickly assess the magnitude of the potential benefit for you as an individual. When you ask the question, some doctors will look blankly at you, a few will be annoyed but most, especially the younger ones, will smile and take the conversation to a new level. And that strikes me as a pretty good reason to brush up on something as simple as subtraction and division before your next appointment. •