Drug control is one of the messier fields where science and policy intersect, and is certainly up there with climate change and the mechanics of Boris Johnson’s hair. Yesterday’s post demonstrated how complex the science-policy interface can be, with respect to David Nutt’s dismissal from the Advisory Council on the Misuse of Drugs after pushing for his evidence to be used to guide policy reform in drug control. Throughout, I made the assumption that his science was rigorous enough, not to avoid challenge, but at least to be of value to policy and decision-making processes. My current commute to and from Leicester is mind-numbingly boring. To offset this, on the way home I read his infamous co-authored paper, published in The Lancet in 2010, reporting an analysis of the relative harms of drugs in the UK. What I found was a series of poorly conducted analyses, and statements that didn’t seem to fit the results, or that were vague, meaningless, and unsupported.
Now, I may be being over-critical, a tendency with which I’m afflicted fairly often, but frankly it was just bad science. I would be far more supportive of Nutt et al.’s claims if they were grounded in firm science, but they didn’t seem to be. The value of scientific evidence is huge. When used properly, it can provide a strong basis for policy decisions (see Mark Henderson’s recent book, The Geek Manifesto, amongst others, for insight into this). But what if the science is flawed, or simply wrong? Discourse dictates that science must be open to refutation, modification, and extension. So what happens when science is used to inform policy decisions, but is later overturned? What if Nutt’s research had been taken at face value to dictate drug control policy reforms, but had later been revealed to be insubstantial in the face of new or better evidence? The following is an attempt to critique the methods and conclusions of his paper. Disclaimer: I am a palaeontologist, so most of the following could be complete BS.
My first issue was with the assumption that the group of scientific experts he had gathered to conduct the analysis were impartial, independent, and unbiased in their approach to the study. We would always like to think that this is the case, but with such an analysis, is it even possible to remain politically and ethically detached?
The second issue was with the methods used to score the various evaluation criteria for each drug. “The group scored each drug on each harm criterion in an open discussion and then assessed the relative importance of the criteria within each cluster and across clusters”. Scientists, of course, are known for their tendencies to agree completely with other scientists in their field. There is no indication of how the final score was produced – was it a mean or a majority value? Either way, why did they not report the range of scores, which would be a far more informative statistic? Furthermore, “scores often changed from those originally suggested as participants shared their different experiences and revised their views”. While open discussion is obviously a good thing, there is no indication of how, why, or the extent to which scores were changed, or for which criteria. Was the basis for the changes ethical, scientific, or arbitrary? Either way, this method appears to ignore the implications and potential impact of any changes, and is in practice an entirely closed process.
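To make the point concrete, here’s a minimal Python sketch (all numbers invented, nothing from the paper) of how a single aggregated score hides the disagreement that the range would reveal:

```python
# Hypothetical expert scores (0-100) for one drug on one harm criterion.
# These values are invented for illustration; they are not from the paper.
scores = [72, 65, 80, 58, 90, 75, 70]

mean_score = sum(scores) / len(scores)       # single point estimate, hides the spread
score_range = (min(scores), max(scores))     # reveals how much the experts disagreed

print(f"mean = {mean_score:.1f}, range = {score_range}")  # → mean = 72.9, range = (58, 90)
```

A panel reporting only the 72.9 looks far more unanimous than one admitting its scores ran from 58 to 90.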
The third issue was with the weighting method used to rank criteria against each other. The example given is that drug-related mortality was judged to be the highest-impact criterion within the ‘Physical harm’ cluster (for obvious reasons), with all other criteria judged as a proportion of it. Next in rank for this category was drug-specific mortality, “which was 80% as great as drug-related mortality”, and weighted accordingly. Wait, what? Did they seriously just compare those two, and then do the same for specific and related damage? I would love to see them justify that reasoning to the families of people who have died from drug-related activities. It’s an arbitrary decision system, and even if there is a data basis, I’m not sure it’s morally right to rank them in such a way. For their other categories, I’m clueless as to how they relatively weighted some criteria (see Figure 1). How can they possibly say that Crime should be weighted as xx of Environmental Damage, in terms of social harm? Confusedasaurus rex.
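For reference, the scheme they describe is close to what decision analysts call swing weighting: the top criterion in a cluster is pegged at 100, the rest are judged as a percentage of it, and the results are normalised into weights. A sketch, using the one proportion the paper gives (the 80%) plus an invented value for a third criterion:

```python
# Swing-weighting sketch. Only the 100 and the 80 reflect the paper's example;
# the 50 for drug-related damage is an invented placeholder.
raw_swings = {
    "drug-related mortality": 100,
    "drug-specific mortality": 80,
    "drug-related damage": 50,
}

total = sum(raw_swings.values())
weights = {criterion: swing / total for criterion, swing in raw_swings.items()}
# weights sum to 1; each criterion's 0-100 score is multiplied by its weight
```

The arithmetic itself is trivial; the complaint above lands at the step before it, where a human decides that one kind of death “swings” 80% as hard as another.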
Figure 2 shows their overall ranking based on their calculated harm scores. Above, they state that drug-related and drug-specific mortality are the worst criteria (I’m pretty sure most people would agree that death is the worst outcome of drug abuse). They then identify five of their analysed drugs as having the highest drug-specific mortality rates. In Figure 2, these five (alcohol, heroin, GHB, methadone, and butane) do not occupy the top five positions (alcohol and heroin are at the top, however), which means that, according to the scoring system used, a combination of other harm factors is enough to outweigh the mortality rates. As I’ve said above, I hope they would be able to justify this to drug-related victims. Without even thinking about it, I would state that the drugs that caused the most deaths should be those ranked as the most harmful. I don’t see any other morally justifiable way of ranking them with this criterion in mind.
Fifth issue (you should probably go and make coffee) is with how Nutt et al. place Class thresholds on their data. See Figures 2 and 3 – where would you draw the boundaries between different Classes of drug, and what would be your justification? Ethics, neurological effect, social impact? This isn’t necessarily a problem with Nutt’s analysis, but it highlights the difficulty of placing evidence-based quantitative thresholds on data of this nature, assuming that any gaps your eyes are trained to identify aren’t just artefacts of low sampling of continuous data. The same can be said for almost any instance in which Class thresholds have to be established. There are methods to statistically identify thresholds, but they’re a bit beyond this, er, discussion, rant?
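One of the simplest such methods, for the curious: sort the scores and cut at the largest gaps (a crude cousin of Jenks natural breaks). A sketch with invented harm scores – and note that it inherits exactly the low-sampling problem above, since a large gap may be nothing but an artefact:

```python
# Propose k classes by cutting a sorted 1-D list of scores at its k-1 largest gaps.
# The scores below are invented for illustration.
def gap_thresholds(scores, k):
    s = sorted(scores)
    # indices of the gaps between neighbours, largest gap first
    gaps = sorted(range(len(s) - 1), key=lambda i: s[i + 1] - s[i], reverse=True)
    cuts = sorted(gaps[:k - 1])
    return [(s[i] + s[i + 1]) / 2 for i in cuts]  # boundary = midpoint of each gap

print(gap_thresholds([72, 55, 54, 33, 30, 27, 26, 10, 9], k=3))  # → [43.5, 63.5]
```

A purely statistical cut like this still begs the real question: whether a Class boundary should sit at a gap in the numbers at all, rather than at an ethical or neurological line.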
Correlation is a pretty simple concept to understand. When comparing two data sets, you can calculate a correlation coefficient (Pearson’s is the one used here). A value close to 1 means the data sets are strongly correlated, and a value close to zero means there’s no correlation (this is massively over-simplified). Nutt et al. compare their results with several other studies, and get correlation values of 0.7, 0.8, and 0.84 for the total data sets, a “less-than-perfect correlation”. The reasoning offered for these values is over-simplistic, attributing the differences to the amount of information available to the different scoring processes. In fact, the difference could simply be that the data sets don’t agree! Furthermore, other factors, such as different participants with different ethics and scientific judgements, were ignored. To dismiss these low correlation values without further consideration of their implications is pretty poor. A correlation value of 0.66 with another data set, relating specifically to drug-specific mortality, “provides some evidence of validity”. Nutt et al. are clearly ‘cup half-full’ researchers, then.
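For anyone who wants the non-simplified version, Pearson’s r is easy to compute from scratch. The two “studies” below are invented scores for six drugs, purely to show the mechanics:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented harm scores for the same six drugs from two hypothetical studies.
study_a = [72, 55, 33, 27, 20, 15]
study_b = [68, 40, 45, 22, 25, 10]
print(round(pearson(study_a, study_b), 2))  # → 0.91
```

Notice that even these made-up data sets, which visibly disagree on a couple of drugs, come out “strongly correlated” – which is rather the point: a headline r of 0.7–0.84 can coexist with substantive disagreement.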
There are several data issues, not least of which is that data simply isn’t available for many of the criteria analysed. Excusing this by stating that “the expert group is the best judgement we can provide” is a cheap way out – if you need more data, go and collect it! This would make their attributed scores far less questionable. In fact, if the data doesn’t exist, then what was the basis for many of the scores? Their evidence-based arguments seem to be lacking in actual, well-supported evidence. Furthermore, they state that “although the assessed weights can be made public, they cannot be cross-validated with objective data”. I can’t actually find their data anyway, but isn’t this side-stepping the fact that one of the defining aspects of science is repeatability? How can anyone legitimately analyse the data to confirm or reject the conclusions drawn here, if it is not available, or is entirely subjective? The authors did run “extensive sensitivity analyses” on their data, showing that their model is stable, but as one of my old supervisors used to say, “crap in, crap out”. The large changes needed to modify the ranking don’t really mean anything, as the meaning of the relative scores is unknown, as is the process of their calculation.
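For context, a sensitivity analysis of the kind they describe can be sketched in a few lines: jitter the weights, recompute the ranking, and count how often it changes. Everything below – the drugs, scores, weights, and perturbation size – is invented; it shows the mechanics, not their model:

```python
import random

# Toy two-criterion model: drug -> (score on criterion A, score on criterion B).
# All values invented for illustration.
scores = {
    "drug_x": (90, 30),
    "drug_y": (60, 80),
    "drug_z": (20, 40),
}
base_weights = (0.7, 0.3)  # invented weights summing to 1

def ranking(weights):
    return sorted(scores, reverse=True,
                  key=lambda d: scores[d][0] * weights[0] + scores[d][1] * weights[1])

random.seed(0)
baseline = ranking(base_weights)
flips = 0
for _ in range(1000):
    # perturb the first weight, renormalise the second
    w0 = min(max(base_weights[0] + random.uniform(-0.2, 0.2), 0.0), 1.0)
    if ranking((w0, 1.0 - w0)) != baseline:
        flips += 1
print(f"ranking changed in {flips}/1000 weight perturbations")
```

A “stable” result here only tells you the arithmetic is robust to the weights – it says nothing about whether the scores fed into it meant anything, which is exactly the “crap in, crap out” problem.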
Finally, I just want to point out that the authors state a “low score in our assessment does not mean the drug is not harmful”, to their credit. But they kind of undo this with a statement above: “Some drugs such as alcohol and tobacco have commercial benefits to society in terms of providing work and tax, which to some extent offset the harms”. So because they form a huge part of our economy, the deaths they cause (as cited in their paper) are “to some extent” justified, or offset. Now that, that is a moral standard I hope we can all achieve one day…