Academic freedom: the problems with a contentious report aren’t mainly about statistics

A report published by Policy Exchange seeks to defend right-wing academics against the suppression of their academic freedoms.  Their cause is open to question, and I’m not sure that I should be bothering with a report that has been described as ‘methodologically abysmal’, but I’m intrigued that there’s so little understanding of basic research methods on both sides of the argument.  On one hand, we have this somewhat inept explanation in the report itself:

The sample consists of 820 respondents (484 currently employed and 336 retired; average age of current academics is 49 and of those retired is 70). Given the approximately 217,000 academic staff working in British universities in 2018-19, our sample is proportionately many times larger than a conventional opinion survey (typically a sample of 1,500 across a national population of 60m). As such our data has a good claim to being representative of the wider academic population even though, as with all opinion surveys, there is a margin of error in the results.

A survey isn’t made more representative simply by being larger.  There are potential biases in the inclusion of a hefty proportion of retired academics, and in the assumption that non-responses (from page 51, 24% to 39% of the totals) don’t skew the results.
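To illustrate the point (with invented numbers, not data from the survey): a quick simulation shows that when the sampling frame over-represents one group, a bigger sample simply narrows the margin of error around the wrong answer.

```python
import random

random.seed(1)

# Invented figures for illustration: 20% of working academics and 50% of
# retired academics hold some view; retired academics are 10% of the
# real population, so the true overall share is 0.23.
TRUE_SHARE = 0.9 * 0.2 + 0.1 * 0.5

def draw(p_retired):
    """Draw one respondent from a frame with the given share of retired."""
    retired = random.random() < p_retired
    return random.random() < (0.5 if retired else 0.2)

estimates = []
for n in (2_000, 80_000):
    # Biased frame: retired academics make up 40% of respondents, not 10%.
    sample = [draw(0.4) for _ in range(n)]
    estimates.append(sum(sample) / n)
    print(n, round(estimates[-1], 3))  # hovers near 0.32 at any sample size
```

Enlarging the sample forty-fold makes the estimate more precise, but no less wrong: it converges on the biased frame’s answer, not the population’s.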

On the other hand, we have the combative response of Jonathan Portes, who comments that this would fail any basic undergraduate course on statistics.  Well, he’s right that their argument is based on bad statistics.  The reporting of the methodology and the questions isn’t systematic or complete.  The size of a sample does not make it representative, and making it bigger does not make it more representative; it only magnifies the bias.  But this sort of thing wouldn’t actually fail a project, because undergraduate projects are judged by what they do, not just by how sound they are.  I’m also troubled by Portes’s dismissal of ‘dubious anecdotes’, the common complaint of those who believe in the inherent superiority of numbers.  What is the difference between ‘anecdotes’ and responses that can be counted?  Why is richer, fuller evidential material less credible than ticked boxes?  Qualitative research studies do the same kind of thing that is done in the courts: they look for evidence, and they look for corroboration of that evidence.  The ‘anecdotes’ in most research studies, including this report, are the bits that really matter.  Additional note, 13th August: Jonathan Portes has written to me to clarify that he was intending to challenge accounts that he thought were ‘fabricated’, rather than the validity of using anecdotes.

In the course of my career, I’ve taught research methods for about twenty years.  I’ve often found that neophyte students come to the subject with preconceptions about what research evidence ought to look like: ideally there should be numbers, and clear categories of response, and statistics, and statements about representativeness.  That seems to be the attitude that has prevailed here.  The basic questions we need to ask, however, are not about statistics.  They are, rather, about what makes for evidence, and what we should make of the evidence when we have it.  The Policy Exchange report tells us openly that it was looking for corroboration of problems experienced in a small number of widely reported incidents – that’s the background to their report, in Part 1.  Their sample consisted of academics and retired academics registered as respondents with YouGov.  There may have been some statistical biases in that process, and it’s possible that the retired academics may have answered differently from the others; we do not have enough information to tell.

Their respondents pointed to a range of issues.  The question they ought to have asked about their data, then, was not ‘is the sample big enough?’, or even ‘how representative is this sample?’, but ‘what does the evidence tell us about the issue we are looking at?’  The first thing you can get from a survey like this is a sense of whether there’s an issue at all.  The second is whether there is corroboration – whether different people, in different places, have had related experiences.  There’s some limited evidence to back that up: there are contributions from a handful of right-wing academics, but the report also indicates that there is a small but identifiable element of political discrimination across the spectrum.  (I’ve encountered that myself: I have been rejected more than once for jobs because the external assessor at interview objected to something I’d written about poverty.)  Interestingly, there is little in the survey relating to more extreme examples, and ‘no platforming’ hardly appears as a problem.  The third is whether we can discern patterns of behaviour.  That’s more difficult to judge, and it’s where information about extents might have been helpful; the main pattern the report claims to identify is a ‘chilling effect’ – people who are fearful of consequences tend to alter their behaviour to avoid the potential harm.  That’s plausible but not conclusive.

The two main weaknesses in this report, in my view, are not about statistics at all.  The first rests in the bias of the design.  The questions asked people tendentiously about politically charged topics – multiculturalism, diversity, family values – framed from a right-wing standpoint.  An illustrative question:

If a staff member in your institution did research showing that greater ethnic diversity leads to increased societal tension and poorer social outcomes, would you support or oppose efforts by students/the administration to let the staff member know that they should find work elsewhere? [Support, oppose, neither support or oppose, don’t know]

I suppose my immediate reaction would be that anyone who claims to ‘show’ a clear causal link between complex and unstable categories of behaviour, rather than ‘argue’ for an interpretation, hasn’t quite grasped the nature of social science.  (The same criticism would apply to someone claiming to prove the opposite.)  But the questions that people ask often reveal something about the position of the team that’s asking, and this is the point at which, if I’d been asked, I’d probably have stopped filling in the questionnaire.  (I wasn’t asked.  I was removed some years ago from the YouGov panel after I objected to the classification of racial groups I was being asked to respond to.  I got a formal letter from Peter Kellner telling me my participation was no longer required.)

The report’s other main weakness lies in its political recommendations, centred on the appointment of a national Director for Academic Freedom.  I couldn’t see any clear relationship between the proposals for reform and the evidence presented.




An old-fashioned approach to evidence? Guilty as charged.

I’m old-fashioned, and I’ve just been upbraided for it.  An article by Brian Monteith in the Scotsman made a number of claims which I thought rather far-fetched, so I looked at some other evidence.  Monteith had written, at some length, that “the Euro currency project has been an economic catastrophe”, that since 1994 the growth of the US economy had far outstripped the Eurozone’s, and that if only the UK had not been within the EU we would all have been much richer.  I checked some basic figures against the World Bank’s data and wrote this:

The idea that the Euro has been an ‘economic catastrophe’ is wishful thinking. Mr Monteith chose to start the clock in 1994. On the World Bank’s figures  income per capita in the Eurozone started in 1994 at $19516 and by 2017 had reached $43834, an increase of 125%. Income per capita in the USA started at $27350 in 1994 and finished at $60200 in 2017, an increase of 122%. It’s not a huge difference, but growth over time in the Eurozone more than kept pace with growth in the USA.

Growth in the UK, by contrast, was only 110% over the same period. If only our economic performance had been as good as the Eurozone’s.

This, I now know, was totally misguided, because it attracted this as a response:

You’re living in the past !….”Paul Spicker”
Any fool can quote PAST statistics !
Nothing to do with future prospects !

So there we are.  In the course of the last few years on the blog, I’ve tried to back up everything I say. The mistake I’ve been making all this time is to take statistics and evidence from the past, when they should have come from the future instead.  What I should have used is the crystal ball – I’m working on it.

Perhaps I should add that “Paul Spicker”, given inverted commas in the rebuke, is not an invented personality. I obviously lack the imagination that I need to contribute to social media.
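For anyone who wants to check the arithmetic in the passage I quoted, the growth figures can be recomputed directly from the rounded per-capita numbers given there (small differences from the quoted percentages reflect rounding in the inputs – the originals were computed from the unrounded World Bank series):

```python
def growth(start, end):
    """Percentage growth from start to end."""
    return (end - start) / start * 100

# Rounded per-capita figures quoted above (US$, 1994 -> 2017)
print(f"Eurozone: {growth(19516, 43834):.0f}%")  # ~125%
print(f"USA:      {growth(27350, 60200):.0f}%")  # ~120%
```

Past statistics, of course, but they are the only kind we have.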

Measuring hunger

In the process of catching up on stuff from the World Bank, I came across another ill-conceived paper, published in January:  The challenge of measuring hunger.  Different methods of measurement, the authors complain, yield radically different results:  “In our survey experiment, we calculate hunger to range between 19 and 68 percent”.  Perhaps they might have worked out from this that there is something fundamentally wrong with the approach they’re taking.  It’s more than fifty years since the indicators movement first argued that we had to stop thinking about social issues in terms of single, accurate, precise measures.  What we need are indicators, pointers or signposts – multiple sources of evidence where we look for direction, reinforcement and corroboration, rather than authoritative answers in tablets of stone.  Anything else is doomed to failure.
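The point about indicators can be illustrated with a toy example (the figures below are invented): measures that disagree wildly about the level of hunger can still corroborate one another about its direction.

```python
# Three hypothetical hunger indicators over four years.  The levels
# diverge as sharply as in the World Bank experiment (19% to 68%),
# but all three point the same way - which is what indicators are for.
indicators = {
    "calorie shortfall": [0.19, 0.18, 0.16, 0.15],
    "self-reported hunger": [0.68, 0.64, 0.61, 0.57],
    "meals skipped": [0.40, 0.38, 0.35, 0.33],
}

def direction(series):
    """Crude trend: compare the last observation with the first."""
    return "falling" if series[-1] < series[0] else "rising"

for name, series in indicators.items():
    print(name, direction(series))
```

No single series here gives an authoritative measure of hunger; taken together, they give direction, reinforcement and corroboration.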

A little more on why we can't trust the statistics in published articles

I’ve referred earlier this week to the work of Ioannidis, who argues that most published medical statistics are wrong.  The British Medical Journal regularly uses its Xmas issue to publish some disconcerting, off-beat papers.  In a previous issue, it carried the findings of a randomised controlled trial which showed an apparently impossible result: praying for people whose outcomes had already been decided several years earlier seemed to work.  The message: don’t trust randomised controlled trials just because they’re randomised.  This year, an article, “Like a virgin”, identifies 45 women in a sample of nearly 5,500 who claim to have had a virgin birth.  The message: don’t believe everything people tell you in surveys.  If only medical journals applied the same rigour to some of their ‘serious’ results.

11 more genes for Alzheimer's? Hardly

The reports of another supposed breakthrough in genetic research are, like so many before them, rather exaggerated.  Last week, a New Scientist editorial commented that neuroscience

“is plagued by false positives and other problems. … Scientists are under immense pressure to make discoveries, so negative findings often go unreported, experiments are rarely replicated and data is often ‘tortured until it confesses’. … Genetics went through a similar ‘crisis’ about a decade ago and has since matured into one of the most reliable sciences of all.”

Yesterday the newspapers were stuffed with reports from that most reliable and mature of sciences, concerning the discovery of 11 genes newly implicated in the causation of Alzheimer’s.  This is from the Independent:

The role of the immune system in defending the brain against Alzheimer’s disease has been revealed in a study identifying 11 new genes that could help to trigger the most common form of senile dementia.

There’s more than enough there to be able to tell that the report is confused.  In the first place, Alzheimer’s disease is not a single disease entity; it’s a syndrome.  The term is used as a residual category for any form of dementia where there isn’t as yet a clear understanding of the process.  Over the years, the size of that residuum has gradually been reduced as various specific disease entities have been identified – Pick’s, Huntington’s, Parkinsonian dementia, Lewy body, CJD and so on.  The process of refinement still has a long way to go.  Second, there is no evidence that Alzheimer’s is genetically determined or ‘triggered’ by particular genes.  The study does not actually claim to show that the immune system defends against Alzheimer’s.  All it does is to identify a group of SNPs or ‘snips’ (single nucleotide polymorphisms, to their friends) associated with the immune system which show some association with the diagnosis of dementia.  That’s an interesting finding, because it suggests that it may be worthwhile to examine immune systems to see what connections emerge.  It’s not the same thing as showing that genes cause Alzheimer’s.

However, it’s not possible to exonerate the authors of the paper of all blame for the misrepresentation.  The title of the article, published in Nature Genetics, is: “Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease”.  This does assume that the associations show ‘susceptibility loci’, and it emphasises that it’s a big study, which implies that it has greater authority as a result.  The conclusion suggests that what needs investigating is the potential association with the risk of Alzheimer’s.

There are three common errors here: the paper commits some of the cardinal sins of statistics.

  • Confusing association with causation.  An association doesn’t in itself tell us what the influence of genes is or what the direction of causation is.  It follows that association with certain genes doesn’t reveal susceptibility.
  • Confusing significance with risk factors.  A relationship can be highly statistically significant although its effects are very limited.   (On a graph, it’s the slope of the regression line that really matters rather than the closeness of fit of the observations).   It’s possible that some small part of the response is attributable to the associated factor, and in medical terms that’s potentially important – it could relate to a particular condition – but that’s not equivalent to a risk factor, and in any case the work done doesn’t identify that.
  • Fishing, or data mining.  In any very large body of data, there will be some unusual associations – it’s in the nature of the exercise.  It doesn’t follow that those associations can be invested with meaning.  This study fishes for the data in a massive pool – over 17,000 people with Alzheimer’s, over 37,000 controls and more than 7 million SNPs.  Then in stage 2 there were 8,572 people with dementia, 11,312 controls and 11,632 SNPs.  The significance levels were strict (p < 5 × 10⁻⁸), but the sheer size of the data sample makes the statistics more problematic, not less so.  The method can’t do more than suggest that some patterns merit further investigation.
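The second and third of these sins can be made concrete with a small simulation (a toy illustration with invented numbers, not a reanalysis of the study): a huge sample renders a trivial difference wildly ‘significant’, and enough tests on pure noise will always produce ‘discoveries’.

```python
import math
import random

random.seed(42)

# Sin 2 - significance without substance: with a huge sample, a trivial
# difference (0.02 standard deviations, an invented figure) comes out
# highly 'significant' even though its practical effect is negligible.
n = 500_000
a = [random.gauss(0.00, 1) for _ in range(n)]
b = [random.gauss(0.02, 1) for _ in range(n)]
mean_a, mean_b = sum(a) / n, sum(b) / n
se = math.sqrt(2 / n)                  # standard error of the difference
z = (mean_b - mean_a) / se
print(f"z = {z:.1f}; effect = {mean_b - mean_a:.3f} sd")

# Sin 3 - fishing: run enough tests on pure noise and 'discoveries' appear.
tests = 100_000
hits = sum(random.random() < 0.001 for _ in range(tests))
print(f"{hits} 'significant' results at p < 0.001 from noise alone")
```

The z-statistic sails past any conventional threshold while the effect stays negligible, and roughly a hundred of the hundred thousand null tests come up ‘significant’ by chance alone.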

Confusion about poverty

An editorial in Friday’s Scotsman complains:

“People are classified as being poor if their income is less than 60 per cent of the UK median. Given this is a relative, as opposed to absolute, measure, then we can say with mathematical certainty that the poor will always be with us.”

I gave some examples of similar muddles in a paper I wrote last year (Why refer to poverty as a proportion of median income?, Journal of Poverty and Social Justice, Volume 20, Number 2, June 2012, pp. 163-175.)  The researchers who introduced the measure explained that the test “does not mean that there will always be poverty when there is inequality: only if the inequality implies an economic distance beyond the critical level.”  However, people don’t understand averages or distributions – and journalists usually get where they are by studying words rather than numbers.

There are problems with the use of 60% of the median, but the supposition that it invents poverty isn’t one of them. The main problems are that it compares poor people with incomes that are not much better, that it assumes it’s always impossible for more than half the population to be poor, and that it’s not well understood. The main defence is that it works, more or less, for Europe and for the OECD countries. 60% of the median is primarily a test of very low income, and in countries where income distributions are more equal, poverty is much lower.
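The claim that a relative measure makes poverty a mathematical certainty is easy to test with a toy calculation (the income lists below are invented): compress the distribution enough and nobody falls below 60% of the median.

```python
import statistics

def poverty_rate(incomes):
    """Share of people below 60% of the median income."""
    line = 0.6 * statistics.median(incomes)
    return sum(i < line for i in incomes) / len(incomes)

# Invented income lists: one very unequal, one tightly compressed.
unequal = [5, 8, 10, 12, 20, 40, 80, 150, 300, 1000]
equal = [18, 19, 20, 20, 21, 21, 22, 22, 23, 24]

print(poverty_rate(unequal))  # 0.4 - four people fall below the line
print(poverty_rate(equal))    # 0.0 - a relative measure, yet nobody is poor
```

Inequality only implies poverty on this measure when the gap passes the critical level; a relative line does not invent the poor.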

The Benefit Expenditure Tables

A new version of the Benefit Expenditure Tables has been released, including information about Child Benefit and Tax Credits, which for the last few years have been treated as a matter for HMRC.  The nominal cost of all benefit expenditure for 2012/13 was £201.9 billion, of which £107.7 billion (53%) went on pensioners.

There was a couple of weeks’ delay while the presentation of the figures was rejigged after the Budget, but it doesn’t look as if the job has been done too carefully.  For example, the series for Child Benefit in table 1 stops in 2002/03 and hasn’t been resumed, despite the inclusion of all the data needed for it in a different table.  And there is no obvious reason why the nominal outturns on costs for older people in the “Summary Table: GB Benefits and Tax Credits” should be lower than the cost for DWP benefits alone in the “Benefit Summary Table”.

Leaving the Work Programme

It has been announced that one in ten people referred to the Work Programme, 73,260 up to last April, have been subject to sanctions for failing to avail themselves of the opportunities. Or it has not been announced, depending on your point of view, despite the very specific figures and the ministerial comment: the Telegraph explains that this is what “the Department for Work and Pensions (DWP) is expected to confirm next week when it publishes the first official statistics on the overall success of the programme.” If this is an official announcement, it would be another clear breach of the UK Statistics Authority’s Code of Practice.

We know what the Minister, Mark Hoban, thinks of the figures; he thinks they show that people are scrounging.  “Sadly some people are clearly very determined to avoid having to get job at all.”  There are other possible explanations.  It might be, for example, that people think they are better able to find work if they’re not on the programme.  It might be that the tens of thousands of people who have been forced to claim JSA instead of incapacity benefits are too sick to work, and are now being cut off benefits altogether.  It might be that people are being sanctioned for not replying to letters.  It might be that some have found work – because, despite the propaganda, that’s what most unemployed people do.  It might be that people who are being cut off from benefit are being forced into crime or prostitution instead – it’s happened before.  We just don’t know, which is why we need the detailed evidence and statistics.

Official statistics and the 'neighbours from hell'

I have written today to the UK Statistics Authority to raise some questions about the government’s figures on “troubled families”. In December the Prime Minister explained:

Today, I want to talk about troubled families. Let me be clear what I mean by this phrase. Officialdom might call them ‘families with multiple disadvantages’. Some in the press might call them ‘neighbours from hell’. … We’ve always known that these families cost an extraordinary amount of money, but now we’ve come up the actual figures. Last year the state spent an estimated £9 billion on just 120,000 families – that is around £75,000 per family.

The same figures have been repeated in a series of government statements, including material from the Department of Communities and Local Government, the Home Office and the DWP.

The UK Statistics Authority exists to guarantee the integrity of official statistics in the UK. They have established a range of criteria for integrity, transparency and quality, but among other requirements they state that departments should

  • “Ensure that official statistics are produced according to scientific principles”
  • “Publish details of the methods adopted, including explanations of why particular choices were made.”
  • “Issue statistical reports separately from any other statement or comment about the figures and ensure that no statement or comment – based on prior knowledge – is issued to the press or published ahead of the publication of the statistics.”

That is not what’s happened here.  “We’ve come up with the actual figures”, the PM’s statement says, and policy has been rolled out from that starting point.  Some explanation of where the figure of 120,000 families comes from appeared in a note from the Department for Education, though it was not publicised; there have been trenchant criticisms from Jonathan Portes and Ruth Levitas, on the basis that there is no connection between the indicators used to identify troubled families and the problems of crime and anti-social behaviour.  The basis of the costings is still not publicly available.  I’ve asked the Statistics Authority to consider whether there has been a breach of their Code of Practice.

The impact of Work Experience

In February, I wrote to the UK Statistics Authority to express concern about some uncheckable claims being made about the benefits of work experience. The Minister for Employment, Chris Grayling MP, had published an open letter to Polly Toynbee on Politics Home, claiming that “a significant number of placements turn into jobs, with the employer getting to like the young person and keeping them on. … so far around half those doing placements have come off benefits very quickly afterwards.” In the Times on 24th February, he also claimed that “half those young people stop claiming benefits after taking part.” (p.32) This was referred to in BBC’s Question Time on 23rd February as evidence that the scheme was working well. The only evidence, however, was based on a first cohort of 1,300 people on placement from January 2011 to March 2011, when by the time of the statement the scheme had been extended to more than 34,000 people.

The DWP has now published more data, this time covering 3,490 people in the scheme from January to May 2011. It shows an increase in employment, by comparison with a group of non-participants, from 27% to 35%. There are two main reservations to make about the figure: it still relates only to an early cohort, who may (or may not) have been easier to place than later cohorts, and there is no explanation of what being “in employment” might mean in terms of hours or duration (the only test seems to be that the employer has sent a return to HMRC). It is also a lot less than the 50% originally claimed.