Data democratisation

I’ve had an interest in systems thinking for a few years now, so I was keen to learn from a Local Authority that had been through a “lean” initiative earlier this year. One of the major outcomes was that they were clearer about their purpose. In fact, they’d described a culture change in their thinking.

It was interesting that the numbers they were keen to share to demonstrate their improvement were not the obvious NI157 (née BVPI109) statistics, but the refusal rate. By reducing the number of applications that had to be determined twice, they were removing waste.

Refusal rates are part of the PS2 collection and published in the DC statistics journal, so a quick scattergraph showed me how the distribution worked out.
[scattergraph of appeals against determinations]

It seems from the data that Westminster, Sheffield and Wandsworth are doing something differently to Bradford, Barnet and Croydon. I shared this plot with a few colleagues, as a way of trying to understand whether it actually “meant” anything.
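One crude way of asking whether an authority is “doing something differently” is to fit a line of appeals against determinations and rank authorities by their residual – how far each blob sits above or below the trend. A stdlib-only sketch, using invented numbers purely for illustration (the real figures come from the PS2 and PINS tables):

```python
# Fit appeals ~ determinations by ordinary least squares, then report each
# authority's residual (distance above or below the fitted line).
def residuals(rows):
    """rows: list of (authority, determinations, appeals) tuples."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    n = len(rows)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return {name: y - (intercept + slope * x) for name, x, y in rows}

# Invented numbers, just to show the shape of the output.
sample = [("Westminster", 4000, 80), ("Bradford", 3500, 310),
          ("Croydon", 3000, 290)]
```

A large positive residual flags an authority with more appeals than its workload would predict – a prompt for a question, not an answer.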

As often happens, this view prompted misunderstandings and suggestions of better ways of aligning data to demonstrate trends. This often marks the end of this type of thought experiment, as these ideas provoke rabbit holes of “what if” questions that lead away from thinking about our work programme.

Data democratisation

By coincidence, at about this time I was investigating ways of capturing and streaming some of our workshop content. During one of my lunch-hour e-wanderings I watched an excellent video (“Many Eyes: Democratizing Visualization” by Fernanda Viégas & Martin Wattenberg) – originally just to think about how difficult it might be to video our own speakers.

This video is about Many Eyes – a website that promotes sharing and exploring data. A site like Many Eyes lets everyone use the simple interface to have a fiddle with someone else’s data. The people I’d previously shown my scattergraph to could each have their own play – during which they would probably find a much better way of putting it together. I decided to give it a try.

To make the example more interesting I took the 2006/07 DC statistics as published by Communities and Local Government – table 1.5 is the one for me. I mashed it into a sensible format and combined it with the Planning Inspectorate statistics for the same year. Matching takes a little time (one person’s “York” is another’s “City of York”), but half an hour later I was ready to upload a combined dataset of decisions and the nearest thing to a quality feedback loop that planning has.
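The matching step is the fiddly bit. A minimal sketch of joining the two extracts on a normalised authority name – the file and column names here are hypothetical, so adjust them to whatever your downloads actually contain:

```python
import csv

def normalise(name):
    """Crude name matching: one person's "York" is another's "City of York"."""
    name = name.lower().strip()
    for prefix in ("city of ", "london borough of "):
        if name.startswith(prefix):
            name = name[len(prefix):]
    return name

def load(path, key_col):
    """Read a CSV into a dict keyed by normalised authority name."""
    with open(path, newline="") as f:
        return {normalise(row[key_col]): row for row in csv.DictReader(f)}

def merge(clg_path, pins_path):
    """Combine the two extracts, keeping authorities present in both."""
    clg = load(clg_path, "authority")
    pins = load(pins_path, "authority")
    return {k: {**clg[k], **pins[k]} for k in clg.keys() & pins.keys()}
```

Anything that drops out of the intersection is worth eyeballing by hand – it is usually a spelling variant rather than a genuinely missing authority.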

You can find the initial data set together with a set of derivations that I made without thinking about them too much here. If you have Java installed you should go and see the visualisations live – the following screen grabs are more difficult to read and don’t tell you who each blob actually is.

[first visualisation]
It seemed to me that the number of appeals should vary with the number of applications. Or rather, the reasons that it might not vary would be interesting ones:
– differences in refusal rates
– social differences in applicants’ readiness to appeal
– differences in the mix of type of applications across urban, rural and park authorities

Of course, the first person I showed it to pointed out that plotting lots of variables reduces clarity. Why not graph overturned decisions against refusals? Those numbers were not in my original upload, so I submitted a second data set.

[appeals and refusals]

I’m still wondering what the statistics mean. Is it true that Bromley, Croydon and Barnet have a more ‘wasteful’ process? Or do their position and applicant profile mean that this order of difference is inevitable?

So what?

I have put a small amount of work into bringing together the datasets from PINS and CLG. The result is available in a format that positively promotes interaction – until you try the Many Eyes interface you won’t understand how useful it is for those “I wonder if” questions. Join in.

Moreover, if I (or someone else) adds more metadata to the authorities – for example CIPFA family data – it becomes more useful for everyone else.

Lastly, anyone can take this national picture and remove anything from it to help understand the real patterns. Is it helpful to have national parks and urban unitaries on the same graph? Probably not.

However, it is not perfect. Many Eyes’ ease of use means you can start to twist and prod without understanding whether the number you are plotting is a real count, a percentage, or a percentage derived from other percentages. It makes sense to spend ten minutes making sure you understand what the base data really is.

To return to the systems thinking opening, it is also not clear to me that this aggregation of performance targets is useful. NI157 seems a completely unhelpful way of managing a planning service from the perspective of both planners and applicants. However, until someone gives me access to their back office and I have a rainy weekend to learn ggplot, this is as good as I can expect.

[scatterplot produced using OpenOffice 2.4 Calc; screenshots prepared using Irfanview 3.98]

