A little more on why we can't trust the statistics in published articles

I’ve referred earlier this week to the work of Ioannidis, who argues that most published medical statistics are wrong. The British Medical Journal regularly uses its Xmas issue to publish some disconcerting, off-beat papers. In a previous issue, it reported the findings of a randomised controlled trial which showed an apparently impossible result: praying for people whose outcomes had already been decided several years earlier seemed to work. The message: don’t trust a result just because it comes from a randomised controlled trial. This year, an article, “Like a virgin”, identifies 45 women in a sample of nearly 5,500 who claim to have had a virgin birth. The message: don’t believe everything people tell you in surveys. If only medical journals applied the same rigour to some of their ‘serious’ results.

Two years on the blog

It’s two years since I decided to put the blog on WordPress.  Since then I’ve made about 320 entries, a little over three a week.  The blog gets about 1200 visits a month, which might sound good until you realise that my website on social policy gets more than  1200 visits a day.  My initial plan was to use the material to spark off new ideas for my written work, and I’m still hopeful it will help with that.  Looking up sources has been particularly useful in filling out my knowledge of the field.  I also owe particular thanks to those of you who’ve commented on blogs, because that’s one of the key ways that I learn  things I may have missed.

One of my teachers may have disapproved. In Richard Crossman’s Diaries, he records Brian Abel-Smith’s reaction when Crossman was invited to become editor of the New Statesman: “all that ephemeral journalism!” Brian was hardly an academic purist: a distaste for ephemera didn’t keep him away from all that politics. He did tell me once, in a good way, that I ought to be a politician. I’d have loved it, but unfortunately I suffer from three impediments: a contrary disposition, a tendency to put my foot in my mouth for the sake of a good line, and the absence of anyone who’d want to vote for me. At least I can vent on the blog, where it does no harm.

More nonsense about our genetic destiny

Yet another paper seems to show that our educational attainment is written in our genes.   It claims that “individual differences in educational achievement are substantially due to genetic differences (heritability) and only modestly due to differences between schools and other environmental differences”.   It’s been widely reported as a claim that exam grades are down to nature, not nurture.

This is based on comparisons of the figures for identical and non-identical twins. People use twin studies because they believe that our personal characteristics are determined genetically, and they expect studies of twins to confirm it. That is bad science. You’re supposed to design research so that it can disprove the proposition under test, and twin studies can’t do that. What a twin study could show, in principle, is that where monozygotic (genetically identical) twins are different, that difference cannot be genetic. That is not, however, what any of them try to do.

Genes are not blueprints for later development; the genetic structure (the genotype) has to interact with the environment to produce observable characteristics (the phenotype). Your height depends on your genes, but it is not determined at birth; if you are starved you may be stunted. (The increase in height across successive generations has to be largely environmental, because the gene pool changes only very slowly.) Even if there is an association between shared genes and attainment, it does not necessarily mean that the level of attainment is determined by genes – it only predicts similar patterns of attainment within a given environment. So it is not possible to show that any level of GCSE scores is down to genes – what the figures are about is whether people from the same family, with the same home background, the same school (and often the same teacher), the same experience in early years and the same age will achieve similar results. Put that way, it would be surprising if the results weren’t very similar – the more so because the sample has been selected to exclude twins where one of them is disabled.

Heritability is supposed to measure the extent to which variability in the phenotype is attributable to the genotype. As there is no direct examination of genes or genetics in most of these studies, what they actually look at is the way that similar characteristics occur in families, and that similarity is then attributed to the underlying genetic structure. There are lots of reasons besides genes why educational attainment might run in families – among them culture, lifestyle, language, common experiences and so forth. The authors suppose that identical and non-identical twins all have equally similar home backgrounds, so that any difference between the two groups must be down to whether they are identical or not. That, however, depends on the proposition that identical twins are not treated more like each other than non-identical twins are, and that seems implausible. For example, non-identical twins may be of different genders, and children of different genders are liable to be treated differently.
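
For what it’s worth, here is a minimal sketch of the arithmetic that the classical twin design typically relies on (Falconer’s formula), which I am assuming is broadly the approach behind estimates of this kind; the correlations are invented for illustration, not taken from the paper:

    # A rough sketch of Falconer's formula, the arithmetic behind classical
    # twin estimates of 'heritability'. The correlations are hypothetical,
    # not figures from the study discussed above.
    r_mz = 0.75   # assumed similarity of identical twins' exam results
    r_dz = 0.50   # assumed similarity of non-identical twins' exam results

    h2 = 2 * (r_mz - r_dz)   # 'heritability': whatever makes MZ pairs closer than DZ pairs
    c2 = 2 * r_dz - r_mz     # 'shared environment'
    e2 = 1 - r_mz            # everything else, including measurement error

    print(h2, c2, e2)        # -> 0.5 0.25 0.25

The point to notice is that anything which makes identical twins more alike than non-identical twins – including being dressed, taught and talked to in the same way – ends up being counted as ‘genetic’ by this arithmetic.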

While the study attributes the differences in performance to DNA, DNA was not usually examined. The results are supposed to be about the differences between monozygotic and dizygotic twins, but the study makes no serious attempt to determine which of the two the twins it is studying actually are. It states instead that “Zygosity was assessed through a parent questionnaire of physical similarity, which has been shown to be over 95% accurate when compared to DNA testing.” So what the study actually finds is that if parents think their children are really like each other, those children get more similar educational results than they do if their parents think they are different. The authors assume that the explanation for those similarities between twins must be their DNA – and not, for example, whether parents talk to them and treat them in the same way.

Having said that, there is one finding in this paper that brought me up short, and I think it does reflect on policy.  The argument is that as the curriculum has become more standardised, less and less variation between results is attributable to the school, and more and more to ‘heritability’ – which really, in this case, means the home background and early years.  That has deep implications for educational equality.

Whatever happened to the planning process?

I’ve been sitting today in a conference where planners were explaining how they managed applications relating to renewable energy.  One senior planner explained, repeating the point with emphasis, that it wasn’t possible to take account of the interests at stake or the question of ownership.  Another suggested that it was not feasible for the Scottish Government to look for any social advantage from planning decisions, and difficult even to require developers to compensate people who lose out in the process.  The best option communities could hope for was a chance to buy shares in development.

If they’re right, something rather strange has happened to the planning process in the last few years. There was a time when the texts on planning – such as Eversley’s The Planner in Society, or Pahl’s Whose City? – made great play of the role of the planner in allocating scarce resources. There was little doubt that planning created value – granting permission can have a massive effect on the value of land – and people believed that where value was created by the community, there was a strong case for the community to realise some of that value, rather than gifting it all to developers. No more, it seems.

11 more genes for Alzheimer's? Hardly

The reports of another supposed breakthrough in genetic research are, like so many before them, rather exaggerated. Last week, a New Scientist editorial commented that neuroscience

“is plagued by false positives and other problems. … Scientists are under immense pressure to make discoveries, so negative findings often go unreported, experiments are rarely replicated and data is often “tortured until it confesses”. … Genetics went through a similar “crisis” about a decade ago and has since matured into one of the most reliable sciences of all.”

Yesterday the newspapers were stuffed with reports from that most reliable and mature of sciences, concerning the discovery of 11 genes newly implicated in the causation of Alzheimer’s. This is from the Independent:

The role of the immune system in defending the brain against Alzheimer’s disease has been revealed in a study identifying 11 new genes that could help to trigger the most common form of senile dementia.

There’s more than enough there to tell that the report is confused. In the first place, Alzheimer’s disease is not a single disease entity; it’s a syndrome. The term is used as a residual category for any form of dementia where there isn’t as yet a clear understanding of the process. Over the years, the size of that residuum has gradually been reduced as various specific disease entities have been identified – Pick’s, Huntington’s, Parkinsonian dementia, Lewy body, CJD and so on. The process of refinement still has a long way to go. Second, there is no evidence that Alzheimer’s is genetically determined or ‘triggered’ by particular genes. The study does not actually claim to show that the immune system defends against Alzheimer’s. All it does is to identify a group of SNPs, or ‘snips’ (single nucleotide polymorphisms to their friends), associated with the immune system which show some association with the diagnosis of dementia. That’s an interesting finding, because it suggests that it may be worthwhile to examine immune systems to see what connections emerge. It’s not the same thing as showing that genes cause Alzheimer’s.

However, it’s not possible to exonerate the authors of the paper altogether from blame for the misrepresentation. The title of the article, published in Nature Genetics, is “Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease”. This assumes that the associations show ‘susceptibility loci’, and it emphasises that this is a big study, implying that it has greater authority as a result. The conclusion suggests that what needs investigating is the potential association with the risk of Alzheimer’s.

There are three common errors here: the paper commits some of the cardinal sins of statistics.

  • Confusing association with causation.  An association doesn’t in itself tell us what the influence of genes is, or what the direction of causation is.  It follows that association with certain genes doesn’t reveal susceptibility.
  • Confusing significance with risk factors.  A relationship can be highly statistically significant although its effects are very limited.  (On a graph, it’s the slope of the regression line that really matters, rather than the closeness of fit of the observations.)  It’s possible that some small part of the response is attributable to the associated factor, and in medical terms that’s potentially important – it could relate to a particular condition – but that’s not equivalent to a risk factor, and in any case the work done doesn’t identify that.
  • Fishing, or data mining.  In any very large body of data, there will be some unusual associations – it’s in the nature of the exercise.  It doesn’t follow that those associations can be invested with meaning.  This study fishes in a massive pool of data – over 17,000 people with Alzheimer’s, over 37,000 controls and more than 7 million SNPs.  Then in stage 2 there were 8,572 people with dementia, 11,312 controls and 11,632 SNPs.  The significance levels were strict (p < 5×10⁻⁸), but the sheer size of the data sample makes the statistics more problematic, not less so.  The method can’t do more than suggest that some patterns merit further investigation.  There’s a short illustrative sketch of the last two points after this list.
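
To illustrate the last two points, here is a minimal sketch in Python (using numpy and scipy, with simulated data rather than anything from the study): a trivially small effect becomes ‘highly significant’ once the sample is large enough, and testing enough pure-noise variables against a single outcome will always throw up apparent hits.

    # Illustrative only: simulated data, nothing from the Nature Genetics paper.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # (1) Significance is not effect size: a slope explaining ~0.04% of the
    #     variance is still overwhelmingly 'significant' when n = 100,000.
    n = 100_000
    x = rng.normal(size=n)
    y = 0.02 * x + rng.normal(size=n)            # real, but trivially small, effect
    res = stats.linregress(x, y)
    print(f"slope={res.slope:.3f}  r^2={res.rvalue**2:.5f}  p={res.pvalue:.1e}")

    # (2) Fishing: 10,000 pure-noise 'markers' tested against one outcome.
    people, m = 1_000, 10_000
    markers = rng.normal(size=(people, m))
    outcome = rng.normal(size=people)            # unrelated to every marker by construction
    r = (markers - markers.mean(0)).T @ (outcome - outcome.mean())
    r /= people * markers.std(0) * outcome.std()
    t = r * np.sqrt((people - 2) / (1 - r**2))
    p = 2 * stats.t.sf(np.abs(t), df=people - 2)
    print((p < 0.05).sum(), "of", m, "noise markers pass p < 0.05")     # around 500
    print((p < 5e-8).sum(), "of", m, "pass the genome-wide 5e-8 cut")   # usually none

The very strict genome-wide threshold exists precisely because of the second problem; even so, an association that clears it is no more than a pattern that merits further investigation, not a demonstrated cause.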

More open access material

I’ve added some more links to my open access page. CROP, the Comparative Research Programme on Poverty, has posted an online version of Poverty: An International Glossary, which I co-edited with Sonia Alvarez Leguizamon and David Gordon; so now there are four books available. OpenAir, the Robert Gordon University’s Open Access Repository, has added several more refereed articles.

The list has more than fifty items now.  It’s only a selection of my work – if you want my best stuff, you’ll still have to buy it – but I can see that the page is starting to get unwieldy.  I may need to reorganise it.

Open Access material

Visitors to the blog will see there is a new page, headed ‘Open Access material’. I had a section in the previous page on ‘Publications’ which listed some of my work that was still available online. I have just added to that very substantially. First, I have managed to obtain the reversion of rights on two more of my books, and now there are three freely available for download under a Creative Commons licence:

  • Stigma and social welfare, originally published by Croom Helm, 1984
  • Principles of social welfare, originally published by Routledge, 1988
  • Poverty and social security: concepts and principles, originally published by Routledge, 1993.

The books are available in PDF and ebook formats.

Second, the Robert Gordon University’s Open Access repository, OpenAir, has asked me for pre-print versions of some of my published articles (their selection is based on the permissions they have from publishers, rather than the merits of the article!). As of today there are nine papers posted; ten more should be added over the next two weeks. Do, please, tell me if there are problems with any of the links.

I’m a firm believer in open access, and I have hopes of adding to this as time goes on, particularly for items which are out of print. It doesn’t, however, include most of my work or the things I’ve done that I’d personally say were the best. You’d have to pay for them, or get them from a university library.

The Curse of Wikipedia

I’ve not finished reading the Leveson report yet – Lord Justice Leveson is not a man to use one word when fifteen will do, and I have two volumes still to go. I was amused to read that the report was led astray by Wikipedia, which it treated as a reliable source without any attribution. This is the sort of thing I tell my students off about. The names of the founders of the Independent had been tampered with by someone from California, and Leveson used the adulterated list.

I contributed to Wikipedia myself a few years ago, adding to articles on the welfare state, social security, the Poor Law and such like, but I haven’t touched it for some time. The sticking point was the article on “Socialism”, which took it for granted that socialism was equivalent to Marxism. I put in five alternative definitions of socialism, with appropriate academic references; it was all deleted. (There is a short version of this on my website.) So I put it up again, puzzled, and it was deleted again, by people who were not prepared to accept that anything apart from their belief should be included. Then I put up a flag to say “this article is disputed”, and that was taken down too. There was no effective system for moderation, and I gave up. Wikipedia’s article on socialism is still desperately misleading. I have no idea whether this happens very widely, but it says something about ‘the wisdom of crowds’ – and the reliability of Wikipedia as a source.

For what it’s worth, the alternative definitions of the ‘welfare state’ have been taken down, too. US contributors find it difficult to understand that in much of Europe, the “welfare state” is not simply run by government.

How copyright threatens academic communication

Wikipedia has announced that it will shut for a day, in protest at threatened restrictions in the USA which would enable rights-holders to shut down sites that breach copyright. My website, like the rest of my writing, is scrupulously referenced, but I have considerable sympathy for Wikipedia’s position. The laws on copyright present a serious obstacle to learning, communication and intellectual development.

As a writer, I often find my work used by other people without any form of recognition. Students, journalists and some academics routinely borrow from, copy or plagiarise what I have written. This may rankle, because it’s rude and incompetent, but for the most part I have to put up with it. To publish a work is to place it before the public. I expect – or hope – that my work will be read, discussed, and disseminated. The most disappointing experience is not when my work is used without payment or acknowledgement, but when it sinks under the waterline. I’d much rather that people read and used my ideas than that they didn’t, and I’ve never encountered an academic who thinks differently.

The laws do not work in the interests of people like me; they work against us. The main effect of current rules about copyright, for any teacher, researcher or writer of non-fiction, is to restrict the ability to cite, illustrate points and argue with positions. I can’t use extended quotes from historic figures like Keynes or Beveridge. I’m barred from duplicating some texts first published in the sixteenth century. I can’t afford to use any photos in my books – the standard fees for two or three photos will consume all the royalties for eighteen months’ work. Many respectable peer-reviewed journals insist on full assignment of copyright, without payment. The primary function of the copyright laws is to defend, not the creators of intellectual property, but the interests of the businesses who have secured the rights.

Laws have to be developed to permit the free flow of information. The main rights that need to be protected are the rights of commerce – that people cannot present themselves as someone else, and that people should know what they are buying. The laws go much further than they need to in order to make that possible. The current rules on copyright, which turn on the date of the author’s death and some bizarre provisions about the assertion of rights, make it fiendishly difficult to decipher what is available for duplication, what isn’t, and who owns the rights. There’s only one kind of restriction that stands a chance of being understood and respected – that is, as we have with patents, a right to exclusive production for a limited, fixed period of time following publication. And that’s not what the law says or does.