In the process of catching up on stuff from the World Bank, I came across another ill-conceived paper, published in January: The challenge of measuring hunger. Different methods of measurement, the authors complain, yield radically different results: “In our survey experiment, we calculate hunger to range between 19 and 68 percent”. Perhaps they might have worked out from this that there is something fundamentally wrong with the approach they’re taking. It’s more than fifty years since the indicators movement first argued that we had to stop thinking about social issues in terms of single, accurate, precise measures. What we need are indicators, pointers or signposts – multiple sources of evidence where we look for direction, reinforcement and corroboration, rather than authoritative answers in tablets of stone. Anything else is doomed to failure.
I’ve referred earlier this week to the work of Ioannidis, who argues that most published medical statistics are wrong. The British Medical Journal regularly uses its Xmas issue to publish some disconcerting, off-beat papers. In a previous issue, they produced the findings of a randomised controlled trial which showed an apparently impossible result: praying for people whose outcomes were already decided several years ago seemed to work. The message: don’t trust randomised controlled trials, because they’re randomised. This year, an article, “Like a virgin”, identifies 45 women in a sample of nearly 5,500 who claim to have had a virgin birth. The message: don’t believe everything people tell you in surveys. If only medical journals applied the same rigour to some of their ‘serious’ results.
The reports of another supposed breakthrough in genetic research are, like so many before them, rather exaggerated. Last week, a New Scientist editorial commented that neuroscience
“is plagued by false positives and other problems. … Scientists are under immense pressure to make discoveries, so negative findings often go unreported, experiments are rarely replicated and data is often ‘tortured until it confesses’. … Genetics went through a similar ‘crisis’ about a decade ago and has since matured into one of the most reliable sciences of all.”
Yesterday the newspapers were stuffed with reports from that most reliable and mature of sciences, concerning the discovery of 11 genes newly implicated in the causation of Alzheimer’s. This is from the Independent:
The role of the immune system in defending the brain against Alzheimer’s disease has been revealed in a study identifying 11 new genes that could help to trigger the most common form of senile dementia.
There’s more than enough there to be able to tell that the report is confused. In the first place, Alzheimer’s disease is not a single disease entity; it’s a syndrome. The term is used as a residual category for any form of dementia where there isn’t as yet a clear understanding of the process. Over the years, the size of that residuum has gradually been reduced as various specific disease entities have been identified – Pick’s, Huntington’s, Parkinsonian dementia, Lewy body, CJD and so on. The process of refinement still has a long way to go. Second, there is no evidence that Alzheimer’s is genetically determined or ‘triggered’ by particular genes. The study does not actually claim to show that the immune system defends against Alzheimer’s. All it does is to identify a group of SNPs or snips (single nucleotide polymorphisms to their friends) associated with the immune system which show some association with the diagnosis of dementia. That’s an interesting finding, because it suggests that it may be worthwhile to examine immune systems to see what connections emerge. It’s not the same thing as showing that genes cause Alzheimer’s.
However, it’s not possible to exonerate the authors of the paper altogether of blame for the misrepresentation. The title of the article, published in Nature Genetics, is: “Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease”. This does assume that the associations show ‘susceptibility loci’, and it emphasises that it’s a big study, which implies that it has greater authority as a result. The conclusion suggests that what needs investigating is the potential association with the risk of Alzheimer’s.
There are three common errors here: the paper commits some of the cardinal sins of statistics.
- Confusing association with causation. An association doesn’t in itself tell us what the influence of genes is or what the direction of causation is. It follows that association with certain genes doesn’t reveal susceptibility.
- Confusing significance with risk factors. A relationship can be highly statistically significant although its effects are very limited. (On a graph, it’s the slope of the regression line that really matters rather than the closeness of fit of the observations). It’s possible that some small part of the response is attributable to the associated factor, and in medical terms that’s potentially important – it could relate to a particular condition – but that’s not equivalent to a risk factor, and in any case the work done doesn’t identify that.
- Fishing, or data mining. In any very large body of data, there will be some unusual associations – it’s in the nature of the exercise. It doesn’t follow that those associations can be invested with meaning. This study fishes for the data in a massive pool – over 17,000 people with Alzheimer’s, over 37,000 controls and more than 7 million SNPs. Then in stage 2 there were 8,572 people with dementia, 11,312 controls and 11,632 SNPs. The significance levels were strict (p < 5 × 10⁻⁸), but the sheer size of the data sample makes the statistics more problematic, not less so. The method can’t do more than suggest that some patterns merit further investigation.
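The second point in the list above can be illustrated with a minimal sketch. The carrier frequencies used here (32% among cases, 30% among controls) are invented for illustration and are not taken from the study; only the rounded group sizes come from the figures quoted above. The two-proportion z-test is chosen here for simplicity and is not necessarily the method the authors used, and the function names are hypothetical.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two sample proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def odds_ratio(p1, p2):
    """Odds of carrying the variant among cases relative to controls."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))


# Hypothetical carrier frequencies; group sizes rounded from the text.
CASE_FREQ, N_CASES = 0.32, 17_000
CONTROL_FREQ, N_CONTROLS = 0.30, 37_000

z, p_value = two_proportion_z_test(CASE_FREQ, N_CASES, CONTROL_FREQ, N_CONTROLS)
effect = odds_ratio(CASE_FREQ, CONTROL_FREQ)
print(f"z = {z:.2f}, p = {p_value:.2g}, odds ratio = {effect:.2f}")
```

With samples this large, a two-point difference in frequency is very unlikely to be chance, yet the odds ratio stays close to 1: the relationship is ‘significant’ and small at the same time.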
An editorial in Friday’s Scotsman complains:
“People are classified as being poor if their income is less than 60 per cent of the UK median. Given this is a relative, as opposed to absolute, measure, then we can say with mathematical certainty that the poor will always be with us.”
I gave some examples of similar muddles in a paper I wrote last year (Why refer to poverty as a proportion of median income?, Journal of Poverty and Social Justice, Volume 20, Number 2, June 2012, pp. 163-175.) The researchers who introduced the measure explained that the test “does not mean that there will always be poverty when there is inequality: only if the inequality implies an economic distance beyond the critical level.” However, people don’t understand averages or distributions – and journalists usually get where they are by studying words rather than numbers.
There are problems with the use of 60% of the median, but the supposition that it invents poverty isn’t one of them. The main problems are that it compares poor people with incomes that are not much better, that it assumes it’s always impossible for more than half the population to be poor, and that it’s not well understood. The main defence is that it works, more or less, for Europe and for the OECD countries. 60% of the median is primarily a test of very low income, and in countries where income distributions are more equal, poverty is much lower.
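A small sketch may make the point concrete. The incomes below are invented, and the function name is hypothetical; the only thing taken from the text is the rule itself, that a person counts as poor if their income is below 60% of the median.

```python
from statistics import median


def relative_poverty_rate(incomes, fraction=0.6):
    """Share of incomes strictly below `fraction` of the median income."""
    threshold = fraction * median(incomes)
    below = sum(1 for income in incomes if income < threshold)
    return below / len(incomes)


# Invented incomes: very unequal at the top, but nobody far below the middle.
unequal_without_poverty = [70, 80, 90, 100, 110, 200, 1000]
# Invented incomes: two households fall well short of the median.
unequal_with_poverty = [20, 30, 100, 110, 120]

print(relative_poverty_rate(unequal_without_poverty))
print(relative_poverty_rate(unequal_with_poverty))
```

The first list is highly unequal and still has no one below the threshold, so the measure does not manufacture poverty out of inequality. Because anyone below 60% of the median is also below the median, the rate can never exceed one half, which is the limitation noted above.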
A new version of the Benefit Expenditure Tables has been released, including information about Child Benefit and Tax Credits, which for the last few years have been treated as a matter for HMRC. The nominal cost of all benefit expenditure for 2012/13 was £201.9 billion, of which £107.7 billion (53%) went on pensioners.
There has been a couple of weeks’ delay while the presentation of the figures was being rejigged after the Budget, but it doesn’t look as if it has been done too carefully. For example, the series for Child Benefit in table 1 stops in 2002/03 and hasn’t been resumed despite the inclusion of all the data needed for it in a different table. And there is no obvious reason why the nominal outturns on costs for older people in the “Summary Table: GB Benefits and Tax Credits” should be lower than the cost for DWP benefits alone in the “Benefit Summary Table”.
It has been announced that one in ten people referred to the Work Programme, 73,260 up to last April, have been subject to sanctions for failing to avail themselves of the opportunities. Or it has not been announced, depending on your point of view, despite the very specific figures and the ministerial comment: the Telegraph explains that this is what “the Department for Work and Pensions (DWP) is expected to confirm next week when it publishes the first official statistics on the overall success of the programme.” If this is an official announcement, it would be another clear breach of the UK Statistics Authority’s Code of Practice.
We know what the Minister Mark Hoban thinks of the figures; he thinks it shows that people are scrounging. “Sadly some people are clearly very determined to avoid having to get job at all.” There are other possible explanations. It might be, for example, that people think they are better able to find work if they’re not on the programme. It might be that the tens of thousands of people who have been forced to claim JSA instead of incapacity benefits are too sick to work, and now they are being cut off benefits altogether. It might be that people are being sanctioned for not replying to letters. It might be that some have found work – because, despite the propaganda, that’s what most unemployed people do. It might be that people who are being cut off from benefit are being forced into crime or prostitution instead – it’s happened before. We just don’t know, which is why we need the detailed evidence and statistics.
I have written today to the UK Statistics Authority to raise some questions about the government’s figures on “troubled families”. In December the Prime Minister explained:
Today, I want to talk about troubled families. Let me be clear what I mean by this phrase. Officialdom might call them ‘families with multiple disadvantages’. Some in the press might call them ‘neighbours from hell’. … We’ve always known that these families cost an extraordinary amount of money, but now we’ve come up with the actual figures. Last year the state spent an estimated £9 billion on just 120,000 families – that is around £75,000 per family.
The UK Statistics Authority exists to guarantee the integrity of official statistics in the UK. They have established a range of criteria for integrity, transparency and quality, but among other requirements they state that departments should
- “Ensure that official statistics are produced according to scientific principles”
- “Publish details of the methods adopted, including explanations of why particular choices were made.”
- “Issue statistical reports separately from any other statement or comment about the figures and ensure that no statement or comment – based on prior knowledge – is issued to the press or published ahead of the publication of the statistics.”
That is not what’s happened here. “We’ve come up with the actual figures”, the PM’s statement says, and policy has been rolled out from that starting point. Some explanation of where the figure of 120,000 families comes from appeared in a note from the Department for Education, though it was not publicised; there have been trenchant criticisms from Jonathan Portes and Ruth Levitas, on the basis that there is no connection between the indicators used to identify troubled families and the problems of crime and anti-social behaviour. The basis of the costings is still not publicly available. I’ve asked the Statistics Authority to consider whether there has been a breach of their Code of Practice.
In February, I wrote to the UK Statistics Authority to express concern about some uncheckable claims being made about the benefits of work experience. The Minister for Employment, Chris Grayling MP, had published an open letter to Polly Toynbee on Politics Home, claiming that “a significant number of placements turn into jobs, with the employer getting to like the young person and keeping them on. … so far around half those doing placements have come off benefits very quickly afterwards.” In the Times on 24th February, he also claimed that “half those young people stop claiming benefits after taking part.” (p.32) This was referred to in BBC’s Question Time on 23rd February as evidence that the scheme was working well. The only evidence, however, was based on a first cohort of 1300 people on placement from January 2011 to March 2011, when by the time of the statement the scheme had been extended to more than 34,000 people.
The DWP has now published more data, this time covering 3490 people in the scheme from January to May 2011. It shows an increase in employment, by comparison with a group of non-participants, from 27% to 35%. There are two main reservations to make about the figure: that it still relates only to an early cohort, who may (or may not) have been easier to place than later cohorts, and that there is no explanation of what being “in employment” might mean in terms of hours or duration (the only test seems to be that the employer has sent a return to HMRC). It is also a lot less than the 50% originally claimed.
The press reports, again, that patients are being denied life-enhancing drugs to save money. In this case, the issue stems partly from the draft guidance prepared by NICE on Abiraterone, and partly from the impression in Scotland that the drug in question may be partly responsible for the unexpectedly long survival of a convicted murderer.
NICE gets a terrible press, but the work they do is exemplary. The consideration given by the committee is, as ever, consistently careful, thorough and balanced. Their brief was to review
- Overall survival
- Progression-free survival
- Response rate
- Prostate specific antigen (PSA) response
- Adverse effects of treatment, and
- Health-related quality of life.
There is a case for Abiraterone. It does extend survival by about four months – roughly a third more than without the drug – and it seems to have fewer side effects than the existing drugs. However, the benefits are still limited, and the drug is hugely expensive.
This specific example seems to fall into a category discussed in a debate in the British Medical Journal in 2009 (31st January). Adrian Towse, the director of the Office of Health Economics, argued that the public were generally willing to support payments that were double what NICE was allowing for. The NICE thresholds were typically a cost of £20-30,000 for each QALY (a year of valued life), a figure that has been raised for end of life treatments; the public would support £30-70,000. Against that, James Raftery argued that the thresholds should be lower, because they force health trusts to take resources away from other, more effective treatments. The cost of Abiraterone falls in the region of £53,800 to £63,200 for each QALY.
There is beyond that a common problem: the evidence in this case is almost entirely supplied by the drug’s manufacturer. Manufacturers have only a limited window during which they can market a drug before patents expire; spending time to run all the tests, and in particular to identify the groups best able to benefit, is not always consistent with their financial interests. It is not clear whether Abiraterone does extend survival more than all the alternatives, because the manufacturer has not yet made all the necessary comparisons. If the gaps could be closed, the case for approving the drug would be stronger.
This is drawn from arguments posted on the Radical Statistics mailing list.
Genes are not a blueprint for the way we live. Biologists distinguish between genotype – the underlying pattern – and phenotype, the observable outcomes stemming from the interaction of genes, environment and the combined process of development. The argument has been made that environmental factors can make genes more important. For example, myopia, a condition rooted in genetic makeup, has been exacerbated by the development of reading. Variation in height, which is clearly governed by genotype, is nevertheless largely produced by environmental factors (which is why height has increased in succeeding modern generations). To illustrate the point, we know that two centuries ago, even if they were drawn from the same genetic pool, people were much smaller and lighter than we are now. One French study records that 79% of male recruits in 1792-9 were below 1.5 metres tall. The difference between that range and the range of heights in contemporary society is large enough to move people with a similar genetic endowment from a relatively low position to a relatively high one, depending on the developmental environment (primarily, in the case of height, on nutrition). A similar comment can be made about obesity. Estimates for the heritability of obesity vary between 40% and 70%; but anyone who imagines that recent increases in obesity are due to changes in genetics isn’t living in the real world.
Despite nearly 150 years of trying, no-one has produced any good evidence that genes affect developed social behaviour in humans. With about 42,000 genes, it is easy to find statistical associations – at the conventional level where p<.05, there will probably be 2100 genes associated with any given character trait – but that does not demonstrate any causal link. Beyond that, however, most studies making claims about genetic origins of behaviour do not even try to show that there is a general association between the gene and the behaviour. They have simply relied on the occurrence of behaviour in specific families (1), and families have shared environments as well as shared genes. To the best of my knowledge, no study has ever shown that any social competence, personality trait or pattern of behaviour, of any kind, is shared by people with a common genotype or combination of genes while it is not present in others without that genotype. This is the minimum data that would be required to show that genes determine such issues.
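The arithmetic behind the figure of 2100 can be sketched as follows. The sketch assumes that each gene is tested separately and that none has any real effect; the Bonferroni correction shown alongside is a standard adjustment for multiple tests that is not mentioned in the text, and the function names are hypothetical.

```python
def expected_chance_associations(n_tests, alpha):
    """Expected number of 'significant' results if no real effect exists."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test threshold keeping the chance of any false positive at or below family_alpha."""
    return family_alpha / n_tests


N_GENES = 42_000  # gene count used in the text
ALPHA = 0.05      # conventional significance level

print(round(expected_chance_associations(N_GENES, ALPHA)))
print(bonferroni_threshold(N_GENES))
```

The first figure is the number of associations that chance alone would be expected to throw up; the second shows how much stricter the threshold for each individual test has to become before a ‘significant’ result carries much weight.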
Many studies rely, instead, on twin studies, in the belief that the similarity between identical twins must be genetic. This has three obvious problems. First, any similarities within families may well reflect similar environmental factors. Second, identical twins generally have social environments which are very similar, and certainly more similar than those of fraternal twins. That’s why past studies tried to concentrate on identical twins reared apart – the problem being that (a) not enough twins are reared apart to make for a valid study, and (b) even when twins are reared apart, social services agencies try to match their environments to the greatest possible extent. Third, identical twins are only relevant if one begins from the proposition that their genetic endowment is crucial. In other words, the studies assume the phenomenon they set out to prove.
The argument is not just bad science. It was used at the end of the 19th Century to justify the isolation of “degenerates” from the rest of the community. It was the basis for eugenics. It was closely associated with fascism, because it is an argument that was made by fascists for political reasons and offered in justification of the extermination of inferior humans. (2) The argument is sinister, and it deserves to be treated with deep scepticism.
Update, 24th November 2012. New Scientist reports this week about Mendelian randomisation, and that serves as a reminder to me that this criticism is beginning to be dated. The genetic linkage studies that were just being developed when I wrote this (e.g. Lancet, 2005 Sep 17-23;366(9490):1036-44) have started to bear fruit. A new epidemiology, described in Palmer et al’s Introduction to genetic epidemiology, has moved away from the old fallacy that behaviour is simply determined by genes; it begins, instead, with the proposition that different environments affect people with different genetic endowments differently. That makes it possible to distinguish the circumstances of people with certain genetic patterns from others – which is just what I was complaining here that studies hadn’t done to date.