Inside Story

Lies, damned lies, and data

Wrong, misleading or beside the point: bad data is bad for policymaking — and examples abound

Danielle Wood | Books | 30 January 2023 | 1095 words

Great slogan, pity about the facts: the Vote Leave bus in Portsmouth in May 2016. Matt Cardy/Getty Images

There are many unsung heroes in the public service: people with deep expertise beavering away quietly in the public interest. Australia’s Parliamentary Library researchers are one such group. They conduct research for politicians, write useful summaries of bills, and publish guides to policies and evaluations of their likely impact. But although they make our policy debates more fact-based than they would otherwise be, their talents are not widely recognised.

That’s why it is especially noteworthy when an unsung backroom researcher steps into the limelight — not from the Parliamentary Library in this case, but from Britain’s equivalent. The person in question is Georgina Sturge, the House of Commons Library’s senior statistician, and she has just released a book about data — or, more specifically, about bad data and how it misleads.

Sturge’s book, Bad Data, has particular resonance for anyone (like me) who has spent many years peering into the sausage-meat vats of public data collection and use. Almost every example of “bad data” advanced by Sturge has an Australian parallel.

I certainly let out a knowing chuckle or two reading Sturge’s discussion of “zombie statistics” — those dodgy numbers that haunt public debates. Sturge highlights how the bogus figure for Britain’s weekly contribution to the European Union, £350 million, continued to be referenced by Brexit campaigners even after it was comprehensively debunked by the UK Statistics Authority.

In Australia, similarly disingenuous numbers haunt a host of debates. Some of the more egregious come from anonymously commissioned modelling in 2015 that suggested Labor’s $1.5 billion policy to wind back negative gearing would wipe $20 billion off GDP (!) and increase rents by 10 per cent (!!).

Those numbers continued to emerge from beyond the grave even several years after they were shown to be garbage, and even after they had inspired a Media Watch episode exposing the willingness of some media outlets to publish almost any number without a sense check.

Ditto Sturge’s discussion of dodgy policy costings. Despite government forecasts that outsourcing probation services could save British taxpayers £10.4 billion over seven years, the policy was considered a failure and the government paid an additional half a billion to end the private contracts early. Similar examples of cost blowouts abound in Australia — from disability services to major infrastructure and defence projects. Optimism bias and the rubbery forecasts that result are a global phenomenon.

Then there is the “algorithm unleashed” approach to policy implementation. Anyone who has been following the fallout from Australia’s scandalous robodebt scheme will shake their heads when Sturge describes similar crackdowns on tax and benefit fraud in Britain and the Netherlands.

As well as reflecting blind faith in badly designed algorithms, both schemes caused widespread stress among recipients of incorrect debt notices and, in the Dutch case, cost the government more than €1 billion in compensation payments.

Bad Data lays bare the good (data is very helpful for informing policy decisions) and the bad (for many policy decisions the data is non-existent or poor) in the easy-to-understand style you would expect of a data expert who spends all day communicating with the less numerate.

Sturge describes eye-openingly common problems with data — inconsistent definitions, sample-size problems, lack of useful time series — as well as issues with modelling. She takes a deep dive into several key areas of public life — crime, poverty, migration — and points out the inherent difficulty in delivering high-quality and time-consistent data on these crucial topics.

One surprising gap in Bad Data is its failure to highlight the exciting developments in government data collections — “good data” — that are starting to overcome at least some of the problems Sturge highlights. She mentions administrative datasets, but her readers don’t get a sense of just how revolutionary it is for policymakers to be able to link datasets that cover the whole of a relevant population.

For example, linking tax data showing someone’s income with location data and health data allows us to understand how disease prevalence, access to healthcare and health outcomes vary across locations and socioeconomic and cultural groups.

Linking data can also help understand people’s pathways through government services, creating a powerful tool for identifying gaps. How many of those turning up to emergency departments, for example, have made visits to a GP that might have kept them out of hospital?

In Australia, many key public service organisations have been slow to understand the potential of these linked whole-of-population datasets and invest in the capability needed to work with them. The light coverage in this book suggests the same may be true in Britain.

The other key omission is more understandable, given Sturge is a serving civil servant. Her book contains no strong critique of the British government’s commitment — or lack of commitment — to investing in better data.

In her opening chapter, Sturge makes the powerful observation that while we can easily find how many times Harry Kane made an on-target shot at goal with his left foot in the last season of the English Premier League, Britain doesn’t have accurate data on how many people are eligible to vote, how many died from Covid-19, or whether crime is going up or down.

The difference, of course, is investment: the football analytics industry invests in paying people to catalogue, in meticulous detail, every pass, tackle and touch.

What is apparent is Sturge’s frustration that the UK census is conducted only every ten years. But she stops short of more obvious questions about funding of statistical agencies, and how much and what data should be collected to enable government to make better decisions. In an environment where the Office for National Statistics, like the Australian Bureau of Statistics, has sometimes been starved of funds while demands on its services kept growing, this is an important corollary to the story of bad data.

But we should celebrate the fact that one of Britain’s “anonymous” civil servants has been able to share her knowledge more widely. I seriously doubt that the risk-averse Australian public service would support an employee publishing such a book.

Sturge has produced a useful and engaging guide to understanding the common pitfalls of data and modelling in public life. But perhaps, for those wanting more, the next item on her to-do list should be a follow-up book about how and when governments should invest in better data, and the opportunities they have to get the most out of enhanced analytical and computing capability. •

Bad Data: How Governments, Politicians and the Rest of Us Get Misled by Numbers
By Georgina Sturge | Bridge Street Press | $32.99 | 299 pages