Author: Bevan Roodenburg
Peer reviewers: Aidan Burrell, Chris Nickson
Nuts and Bolts of EBM Part 2
In Part 1, Chris Nickson’s ‘Nuts and Bolts of EBM’ touched on the PICO question. When we have a clinical question, we look to see whether it has already been answered, focusing on specific patients, interventions, comparisons and outcomes.
This time round we focus on analysing randomised controlled trials (RCTs).
Q1. What’s an RCT? Try using PICO in your explanation.
A randomised controlled trial is:
- A study where a sample of similar people (population/patients) is randomly allocated to 2 (or more) groups
- Each group then receives, or is exposed to, a different intervention. One group is always the comparison group (control), which might be a placebo, no intervention, or “standard” care. The other group(s) receive the tested intervention(s).
- The groups are followed up, measuring pre-specified outcomes. Differences in the outcomes between groups are assessed statistically.
Q2. “All that glitters is not….” Fill in the blanks with the most ridiculous or most accurate answer:
“Randomized, controlled trials, when ______________ ____________ , ___________ and __________ , represent the _________ _________ in evaluating health care interventions”.
This is the ‘gold standard’ answer:
“Randomized, controlled trials, when appropriately designed, conducted and reported represent the gold standard in evaluating health care interventions”.
— CONSORT 2010 Statement
The flip side of the gold standard is “fool’s gold!” The gold standard is quickly tarnished if studies lack methodological rigor, yielding biased results. RCTs are only the gold standard when they are performed correctly; otherwise they may lack internal validity.
As readers, we need complete, clear and transparent explanations of methods and findings. We must understand how to analyse an RCT, before we can interpret answers to our clinical questions.
Q3. What is the “null hypothesis”?
a. Name of a nerdy cover band full of biochemists. (TRUE or FALSE?)
b. The assumption that “there will be no difference, between groups, in the outcome.” (TRUE or FALSE?)
The correct answers are:
a. TRUE. They have a Facebook page and YouTube clips.
b. TRUE. If the data allow us to reject this, we accept the alternative to the null, “there is a difference”.
There MUST be a clear null and alternative hypothesis in a randomised controlled trial. This is often likened to the presumption of innocence in a criminal trial. We are all innocent (the null hypothesis) until PROVEN guilty.
How the outcome is measured, the type of data collected (e.g. continuous or discrete) and its distribution will determine which statistical tests are used (beyond the scope of this ‘Nuts and Bolts’).
Q4. Juries must believe BEYOND REASONABLE DOUBT that someone is guilty (the alternative to the null), or they are declared not guilty (the null). What is the measure that we use to “prove” the alternative hypothesis?
Most commonly (for better or worse), we use the p-value.
The p-value is a widely used statistical measure of “reasonable doubt”. It is the probability of observing a result at least as extreme as the one found, assuming the null hypothesis is true. We set in advance the cutoff we are willing to accept, often <0.05 (5%) in clinical trials. If the p-value falls below the cutoff, we reject the null; if not, we don’t reject the null.
(Editor’s note: Confidence intervals are better! Look out for a discussion of significance in a future ‘Nuts and Bolts’)
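To make the definition of the p-value concrete, here is a minimal Python sketch of an exact permutation test. It computes a p-value directly from the idea “how often would a difference this large turn up if the group labels meant nothing (the null)?”. It is an illustration only: trials usually use pre-specified tests such as the t-test or chi-squared test, the happiness scores are hypothetical, and the function name `permutation_p_value` is our own.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels are interchangeable, so the
    observed difference is compared with the difference produced by every
    possible way of relabelling the pooled data.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    as_extreme = 0
    relabellings = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        in_a = set(idx)
        a = [x for i, x in enumerate(pooled) if i in in_a]
        b = [x for i, x in enumerate(pooled) if i not in in_a]
        # small tolerance so floating-point noise does not miss exact ties
        if abs(mean(a) - mean(b)) >= observed - 1e-9:
            as_extreme += 1
        relabellings += 1
    return as_extreme / relabellings

coffee = [7, 8, 6, 9, 8]  # hypothetical happiness scores (0-10)
decaf = [5, 4, 6, 5, 3]
print(round(permutation_p_value(coffee, decaf), 3))
```

With these made-up scores, 4 of the 252 possible relabellings give a difference at least as large as the one observed, so p ≈ 0.016. Enumerating every relabelling is only practical for small samples; larger analyses use random resampling or a parametric test.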
Q5. In order to test a theory that caffeine consumption causes ICU doctors to be happier during their 13-hour shifts, I want all junior registrars (JRs) to STOP drinking coffee and all senior registrars (SRs) to drink at least 2 coffees per day. I will then measure their happiness at midday and 6pm.
Is this an RCT?
This is a prospective trial. There is clearly a control group and an intervention group, with an outcome. However, the groups are not randomly allocated. There may be inherent differences between the two groups (for example, in seniority and experience), and this will introduce bias.
The key purpose of randomisation is to eliminate selection bias: it ensures that any differences in baseline characteristics between the groups (including unknown confounders) arise by chance alone.
Computer-generated random numbers are usually used to determine the group to which an individual is assigned. There are different ways of performing randomisation, such as:
- simple – no restriction on allocation (groups may be unequally sized)
- block – allocation is performed in blocks, so that groups are equally sized within each block.
- stratified – participants are first grouped (stratified) by factors such as age or sex, and randomisation is performed separately within each stratum, so that these factors are evenly distributed between the groups.
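The three approaches can be sketched in Python as below. This is a minimal illustration under stated assumptions, not how any particular trial generates its sequence: the arm labels, the block size of 4 and the function names are hypothetical, and real trials typically use dedicated, audited randomisation software (often with varying block sizes so the sequence is harder to guess). For simplicity, `stratified_randomise` allocates a known list of participants, whereas a real trial pre-generates a sequence per stratum and uses it as patients are enrolled one at a time.

```python
import random

ARMS = ("control", "intervention")  # hypothetical two-arm trial

def simple_randomise(n, seed=None):
    """Each participant allocated independently, like a coin toss."""
    rng = random.Random(seed)
    return [rng.choice(ARMS) for _ in range(n)]

def block_randomise(n, block_size=4, seed=None):
    """Allocate in shuffled blocks so the arms are equal after every block."""
    if block_size % len(ARMS):
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n:
        block = list(ARMS) * (block_size // len(ARMS))  # e.g. 2 control + 2 intervention
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n]

def stratified_randomise(participants, stratum_of, block_size=4, seed=None):
    """Run a separate block-randomised sequence within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for p in participants:
        strata.setdefault(stratum_of(p), []).append(p)
    allocation = {}
    for members in strata.values():
        arms = block_randomise(len(members), block_size, seed=rng.randrange(2**32))
        for p, arm in zip(members, arms):
            allocation[p["id"]] = arm
    return allocation

# Hypothetical participants, stratified by sex
patients = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(16)]
allocation = stratified_randomise(patients, lambda p: p["sex"], seed=2016)
print(allocation)
```

Simple randomisation can leave the arms unequal by chance in small trials; blocking guarantees balance after every completed block, and stratifying keeps each stratum (here, sex) balanced as well.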
Q6. How do we know, when reading a paper, that randomization was a success?
Look at “Table 1”
Were the groups similar at baseline, before the intervention? Papers traditionally report this data as “Table 1” (baseline characteristics). If the randomisation process achieved comparable groups, the groups should be similar. The more similar the groups the better the ‘baseline balance’.
There may be some indication of whether differences between groups are statistically significant (i.e. p-values). Just remember that if you perform enough statistical tests, something will end up being statistically significant by chance alone, and in a properly randomised trial any baseline difference is, by definition, due to chance. This is why the CONSORT statement advises that statistical tests of baseline characteristics should not be performed in RCTs.
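The arithmetic behind “something will end up being statistically significant” is short. Assuming (a simplification) that the baseline comparisons are independent and every null hypothesis is true, the chance of at least one p < α among k tests is 1 − (1 − α)^k. A minimal Python sketch, with a function name of our own choosing:

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests is 'significant'
    at level alpha when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 10, 20):
    print(f"{k:>2} baseline variables: {chance_of_false_positive(k):.0%}")
```

With 20 baseline variables tested at 0.05, the chance of at least one spurious “significant” difference is roughly 64%, even when randomisation has worked perfectly. Real baseline variables are often correlated, so this is an approximation.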
Q7. I am considering whether or not to enrol a patient in POLAR, a study comparing therapeutic hypothermia with standard temperature management in TBI. They fit the inclusion criteria. It’s late at night. The patient looks like a lot of hard work because of the severity of their TBI and their BMI. I know the nurse who will look after the patient hates using the cooling machine. To make the decision easy, I check what the next allocation envelope says. Relief: it says “CONTROL”. So I enrol the patient, knowing that it won’t make everyone’s life more difficult.
What is the missing principle here?
Lack of allocation concealment is a MAJOR issue affecting the validity of a study. It introduces systematic errors in sampling/allocation before patients are entered, which cannot be corrected for afterwards.
The CONSORT statement defines allocation concealment as:
“a technique used to prevent selection bias by concealing the allocation sequence from those assigning participants to intervention groups, until the moment of assignment”.
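One way to picture allocation concealment is as a central allocation service (for example, web- or phone-based) that reveals the next arm only after the patient has been irrevocably enrolled, unlike the envelope in Q7, which could be opened in advance. The Python class below is a hypothetical toy model of that principle, not a real system; the class and method names are our own, and Python cannot truly hide attributes (the leading underscore only signals “do not touch”).

```python
class ConcealedAllocator:
    """Hypothetical central allocation service holding the random sequence.

    The enrolling clinician never sees the sequence. An arm is revealed only
    by calling enrol(), which records the patient as randomised first.
    """

    def __init__(self, sequence):
        self._sequence = list(sequence)  # pre-generated random allocation list
        self._log = []                   # (patient_id, arm) pairs, in enrolment order

    def enrol(self, patient_id):
        if any(pid == patient_id for pid, _ in self._log):
            raise ValueError(f"{patient_id} has already been randomised")
        if len(self._log) >= len(self._sequence):
            raise RuntimeError("allocation sequence exhausted")
        arm = self._sequence[len(self._log)]
        self._log.append((patient_id, arm))  # committed before the arm is returned
        return arm

    @property
    def log(self):
        return list(self._log)  # a copy, so callers cannot edit the record

allocator = ConcealedAllocator(["intervention", "control", "control", "intervention"])
arm = allocator.enrol("patient-001")  # the arm is revealed only now
print(arm, allocator.log)
```

The design choice is that `enrol()` is the only way to see an allocation, and it records the patient before returning the arm, so there is no “peek, then decide” step.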
Q8. Back to the coffees… Each day I give the SR group two single origin coffees purchased (at great expense) from the fancy coffee shop down the road, and the JR group — the “control” group — get decaf instant coffees from the tea room… I make them myself. I also measure everyone’s happiness myself.
How could I do this better?
Blinding… as much as possible, as many people as possible.
Blinding is the process where people involved in a study are kept from knowing which treatment group a person is in. It reduces bias. For example, clinicians may treat patients differently if they know a patient has had placebo, patients may report their experiences differently if they know they have had treatment, or observers may report outcomes differently if they know the patient’s treatment.
Blinding is sometimes referred to as single (patient), double (patient and treating clinicians), triple blinding (patient, clinicians and observers). CONSORT 2010 discourages these terms, and prefers studies to report who was blinded after assignment to interventions (for example, participants, care providers, those assessing outcomes) and how.
Q9. Suppose I asked you to volunteer for a study into the effects on happiness of ceasing all coffee drinking, then randomised you to either no coffee or heaps of coffee. How could these inclusion criteria introduce bias?
Volunteer bias (a type of selection bias)
Realistically, if the groups are randomised correctly, they should be balanced at baseline. However, recruiting volunteers may self-select specific types of people into the study, limiting the generalisability of the results. I may miss out on the people who actually need to cut back on coffee, as they won’t volunteer…
Selection bias occurs whenever the population sample included in the trial is not representative of the population that the trial seeks to study.
The CONSORT flow diagram is the fastest way to visualize who was included in or excluded from a study.
Inclusion and exclusion criteria are important to understand, especially when assessing external validity. If the sample group is too narrow, a study may struggle to enrol enough cases, and its results may not be generalisable to wider clinical practice. If it is too wide, the study risks excessive heterogeneity and a negative result, or limited applicability to more specific situations.
Q10. The study presented today in journal club included “per protocol” analysis, not intention to treat analysis. What is the difference?
Per protocol analysis includes results only for participants who adhered to the protocol and received the intervention they were allocated. Intention to treat analysis includes all participants in the groups to which they were randomised, regardless of the treatment they actually received.
Another approach is “per treatment” (as-treated) analysis, which groups participants by the treatment they actually received, rather than the one they were randomised to. With ‘per protocol’ analysis, cases may be excluded after they were initially randomised; with ‘per treatment’ analysis, they may be moved to the other group. Either way, the balance created by randomisation can be lost.
All approaches have pros and cons. Intention to treat is often more appropriate, because it preserves the comparison between randomised groups, but where there are many protocol violations, late exclusions or loss to follow-up, results may be less reliable. In such cases it is important to think about the reasons for large numbers of violations and exclusions when assessing the internal and external validity of the trial!
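A toy example shows how the three approaches can disagree on the same data. In this hypothetical trial of 8 patients, one patient in each arm crosses over to the other treatment. The data and the function names are invented for illustration only.

```python
from fractions import Fraction

# Hypothetical trial: (arm randomised to, arm actually received, recovered?)
TRIAL = [
    dict(randomised=r, received=t, recovered=y)
    for r, t, y in [
        ("intervention", "intervention", True),
        ("intervention", "intervention", True),
        ("intervention", "control", False),       # crossed over to control
        ("intervention", "intervention", False),
        ("control", "control", False),
        ("control", "control", True),
        ("control", "intervention", True),        # crossed over to intervention
        ("control", "control", False),
    ]
]

def recovery_rate(records, arm, key):
    """Proportion recovered among records whose `key` field equals `arm`."""
    group = [rec for rec in records if rec[key] == arm]
    return Fraction(sum(rec["recovered"] for rec in group), len(group))

def compare_analyses(records, arms=("intervention", "control")):
    adherent = [rec for rec in records if rec["randomised"] == rec["received"]]
    return {
        # everyone, analysed in the arm they were randomised to
        "intention_to_treat": {a: recovery_rate(records, a, "randomised") for a in arms},
        # only those who received their allocated treatment
        "per_protocol": {a: recovery_rate(adherent, a, "randomised") for a in arms},
        # everyone, analysed by the treatment they actually received
        "as_treated": {a: recovery_rate(records, a, "received") for a in arms},
    }

for analysis, rates in compare_analyses(TRIAL).items():
    print(analysis, {arm: str(rate) for arm, rate in rates.items()})
```

Here intention to treat shows no difference (1/2 vs 1/2), whereas per protocol (2/3 vs 1/3) and as-treated (3/4 vs 1/4) analyses suggest a benefit, purely because of who crossed over. If crossover is related to prognosis (for example, sicker patients abandoning the intervention), the latter two analyses no longer compare balanced, randomised groups.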
Q11. What are the determinants of required sample size?
We decide how much risk of a false-positive or false-negative result we are willing to accept. The less chance of error we want, the larger the required sample size.
It is determined by:
- α: the lower the acceptable chance of a false positive, the larger the sample size required. This is typically set at 0.05 (corresponding to a p-value cutoff of 0.05, or a 95% confidence interval).
- Power (1-β): the higher the desired probability of detecting a difference (if there really is one), the larger the sample size required. In clinical trials a power of 80% is widely used.
- Effect size Δ: the smaller the effect expected (or considered worth detecting), the larger the sample size required. This depends on what the study investigators consider clinically important and may be based on previous studies.
- Variance in the population: the higher the variance, the larger the sample size needed.
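For a continuous outcome compared between two equal-sized groups, one common normal-approximation formula links these four determinants: n per group = 2 × (z(1−α/2) + z(1−β))² × σ² / Δ², where σ is the standard deviation (the square root of the variance). The Python sketch below assumes that formula and a two-sided test; binary outcomes (e.g. mortality) use a different formula, and real trials typically use dedicated software and inflate the result for expected dropouts. The happiness-score numbers and the function name are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per arm to compare two means.

    Assumes a two-sided test, equal group sizes and a normally distributed
    outcome with the same standard deviation (sd) in both arms.
    """
    std_normal = NormalDist()                    # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Hypothetical: happiness score SD of 1 point; smallest worthwhile difference 0.5 points
print(n_per_group(delta=0.5, sd=1))
print(n_per_group(delta=0.5, sd=1, alpha=0.01))   # lower alpha -> larger n
print(n_per_group(delta=0.5, sd=1, power=0.90))   # higher power -> larger n
print(n_per_group(delta=0.25, sd=1))              # smaller effect -> larger n
print(n_per_group(delta=0.5, sd=2))               # more variance -> larger n
```

Detecting a difference of half a standard deviation needs about 63 patients per group; halving the target difference roughly quadruples that, because Δ enters the formula squared.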
The required sample size must be determined before starting a study. It is ethically unsound to spend excess money and resources on a study larger than required (not to mention exposing extra patients to potentially harmful or non-beneficial therapies). Conversely, if we neglect this and underpower the study, we waste time and money and may miss a real difference, failing to reach an answer.
This comes up in CICM fellowship exams frequently enough!
Q12. Let’s say, hypothetically, I decide to run a study into the effects of thrombolysis in stroke. After randomisation, all stroke patients receiving standard care are admitted to a medical ward. All thrombolysed patients are initially admitted to ICU for 24 hours of observation, due to the workload required to assess for complications.
Can this study conclusively report the results of the intervention?
Yes, they can, but they shouldn’t!
Clearly, the treatment of the groups needs to be exactly the same EXCEPT for the intervention, or the study is not really focused on the intervention, but a series of different interventions. The literature is littered with these sorts of inaccuracies.
Q13. What is the primary outcome and how does it differ from the secondary outcomes?
The primary outcome is the specific outcome that the study is designed to assess. The sample size calculation is based on the primary outcome, so the study is powered to test it. The study is usually not powered to test secondary outcomes.
Secondary outcomes are often of interest, but sometimes they are all people remember about a study. The results of secondary outcomes should be interpreted with caution. If one looks for enough secondary outcomes, then by chance, a difference will be found eventually. The same caution is needed when interpreting sub-group analyses, where primary or secondary endpoints are reported in smaller sub-groups of the whole study population. Such results should be considered hypothesis generating rather than practice changing.
To be considered credible, secondary outcomes and sub-group analyses should:
- be planned before randomization
- be biologically plausible
- have a large(r) treatment effect
Q14. What are the pros and cons of RCTs?
Pros:
- The ONLY study design that can reliably establish causation
- Precise, controlled treatments
- Precise measurements/data
- Blinding is easier
- Decreases patient/observer bias
- Groups more likely to have similar baseline features
- Trial size can be planned; larger trials can detect small differences in outcome
- Subgroup analysis can enhance usefulness for clinical practice
- More likely to get published
Cons:
- Logistics – sites/locations/teams etc
- Not “real life”
- Practice misalignment
There are many unanswered questions where RCTs are impossible, either through practicalities, ethical limitations, community expectations etc etc.
Much of the greatest work in public health is based on observational data, rather than RCTs. Arguably, these have had the greatest impact on human health across the globe. Imagine trying to randomize people to 3 decades of healthy versus unhealthy lifestyles!
Key practical tips for analyzing RCTs
Know the study question (PICO) and use critical appraisal checklists, such as those from the Oxford University Centre for Evidence-Based Medicine listed below.
References and links
- CONSORT Statement Website [accessed 6 June 2016] (the CONSORT statement is a must read!)
- Nickson CP. Blinding and allocation concealment. Lifeinthefastlane.com. 2016 [cited 6 June 2016]. Available from: http://lifeinthefastlane.com/ccc/blinding-and-allocation-concealment/
- Nickson CP. How to analyse a clinical trial. Lifeinthefastlane.com. 2016 [cited 6 June 2016]. Available from: http://lifeinthefastlane.com/ccc/how-to-analyse-a-clinical-trial/
- Nickson CP. How to conduct a clinical trial. Lifeinthefastlane.com. 2016 [cited 6 June 2016]. Available from: http://lifeinthefastlane.com/ccc/how-to-conduct-a-clinical-trial/
- Nickson CP. Intention to treat analysis. Lifeinthefastlane.com. 2016 [cited 6 June 2016]. Available from: http://lifeinthefastlane.com/ccc/intention-to-treat-analysis/
- Nickson CP. Power and sample size calculation. Lifeinthefastlane.com. 2016 [cited 6 June 2016]. Available from: http://lifeinthefastlane.com/ccc/power-and-sample-size-calculation/
- Nickson CP. Randomised Controlled Trials (RCT). Lifeinthefastlane.com. 2016 [cited 6 June 2016]. Available from: http://lifeinthefastlane.com/ccc/randomised-control-trials/
- Nickson CP. Randomisation. Lifeinthefastlane.com. 2016 [cited 6 June 2016]. Available from: http://lifeinthefastlane.com/ccc/randomisation/
- Oxford University Centre for Evidence-Based Medicine. 5 steps of EBM, with useful, practical checklists and explanations. Available from: http://www.cebm.net/category/ebm-resources/tools/