Running a population model in the browser

Over a year ago, I started work on a project looking at our ageing population and specifically one measure called the Old Age Dependency Ratio (OADR). This measure compares the number of people who are above the retirement age to people of working age (16 to retirement age).
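
As a rough sketch, the ratio could be calculated like this in JavaScript. The function name, the array layout (one entry per single year of age) and the per 1,000 scaling are assumptions for illustration, not the code from the published tool.

```javascript
// Old Age Dependency Ratio: people at or above pension age
// per 1,000 people of working age (the per 1,000 scaling is an assumption).
// populationByAge[i] is the number of people aged i (hypothetical layout).
function oadr(populationByAge, pensionAge, workingAgeStart = 16) {
  let working = 0;
  let pensionable = 0;
  populationByAge.forEach((count, age) => {
    if (age >= pensionAge) {
      pensionable += count;
    } else if (age >= workingAgeStart) {
      working += count;
    }
    // Anyone younger than workingAgeStart is not counted in either group.
  });
  return (pensionable / working) * 1000;
}
```

Raising `pensionAge` moves people from the top group into the working-age group, which is why the pension age has such a direct effect on the ratio.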

This measure has its limitations, because people are joining the workforce later due to education and are working past their retirement age. It is still useful for international comparisons, and it still gives an indication of the financial implications of our population for pensions and health care, for example.

We wanted to explain which factors affect how the OADR changes in the future, and that some variables have more of an effect than others.

We wanted an easier way for people to understand demography and modelling population in the future without having to understand all the technical details and calculations.

We also wanted to dispel misconceptions people may have about migration and the size of the effect it would have on the population.

Finally, we wanted to use an interface that brought elements of gamification to engage people.

At the end of June, this project was finally published: How would you support our ageing population?

The Excel beginning

My starting point was an Excel model that colleagues in the office had built, which uses the National Population Projection variants. This spreadsheet calculated the population given some numbers about fertility, migration and mortality.

By using different variations you could see what your choices did to the population. The spreadsheet gave you a lot of combinations to play with, and there was a custom option where you could enter your own numbers, but you had to enter something for every age and every year. The result was a table of the population at each age in each year. This isn’t really something that’s intuitive, easy to interpret or able to run straight in the browser.

Building it for the browser

Once I got my head round the calculations in the Excel sheet (using a lot of Trace Dependents), I had to think about what people would change and try to get this down to one input for each variable. I decided to fix a target amount 25 years down the line for each of the three inputs (mortality, migration and fertility), with linear interpolation along the way. I scaled the single-year-of-age numbers to match the interpolated target for each year. I did this in Excel and showed it to the business area so they could approve my thinking.
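
To make the interpolate-then-scale idea concrete, here is a minimal sketch. The function names and the example numbers are made up for illustration, and it assumes the quantity being targeted is a simple total over ages (such as net migration), which is not necessarily how the real model handles every input.

```javascript
// Linearly interpolate from a starting value to a target
// that is reached after `horizon` years (25 in the post).
function interpolateTarget(start, target, horizon = 25) {
  const path = [];
  for (let year = 0; year <= horizon; year++) {
    path.push(start + ((target - start) * year) / horizon);
  }
  return path; // path[0] is the start, path[horizon] is the target
}

// Scale a single-year-of-age profile so that it sums to `total`,
// keeping the shape of the age distribution unchanged.
function scaleToTotal(profileByAge, total) {
  const current = profileByAge.reduce((sum, v) => sum + v, 0);
  return profileByAge.map((v) => (v * total) / current);
}

// Hypothetical usage: move a total from 100 to 200 over 25 years
// and rescale a tiny three-age profile for each year along the way.
const totals = interpolateTarget(100, 200);
const baseProfile = [1, 2, 1];
const profilesByYear = totals.map((t) => scaleToTotal(baseProfile, t));
```

The appeal of this approach is that the user only picks one number per variable, while the age pattern comes from the existing projection data.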

Mortality was the difficult one. There are cohort effects which travel through the age groups. Also, something like life expectancy at birth, while familiar to people, is not sensitive to changes at the higher ages, and it’s these higher ages that we are interested in. In the end we settled on displaying the life expectancies associated with 5 mortality variants, and the slider snaps to these values.
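
Snapping a slider to a fixed set of values can be sketched as below. This is a generic helper, not the code from the tool, and the five life expectancies in the usage example are invented placeholders rather than the real variant values.

```javascript
// Return the allowed value closest to `value`.
// On a tie, the value that appears first in `allowed` wins.
function snapToNearest(value, allowed) {
  return allowed.reduce((best, candidate) =>
    Math.abs(candidate - value) < Math.abs(best - value) ? candidate : best
  );
}

// Hypothetical life expectancies, one per mortality variant.
const variantLifeExpectancies = [80, 82, 84, 86, 88];
const snapped = snapToNearest(84.7, variantLifeExpectancies);
```

Each snapped value can then be used as a key to look up the full set of mortality rates for that variant.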

Once they were happy with the process, I recreated the calculations in the browser. In Excel, it’s hard to see the order of things as it just appears to run all at once, but in JavaScript it’s much more linear. Turns out demography is actually quite simple, just doing a lot of adding and subtracting for people migrating in or being born and dying. And looping over the years into the future.
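
Here is a heavily simplified sketch of that adding, subtracting and looping. It is not the model from the tool: the names and data layout are assumptions, the rates are held constant across years, fertility is applied per person rather than per woman, and the oldest age is treated as an open-ended group.

```javascript
// One projection step: age everyone by a year, remove deaths,
// then add net migrants and births.
// population[i] is the number of people aged i (hypothetical layout).
function projectOneYear(population, rates) {
  const maxAge = population.length - 1;
  const next = new Array(population.length).fill(0);

  // Survivors move up one age; the oldest group accumulates.
  population.forEach((count, age) => {
    const survivors = count * (1 - rates.mortality[age]);
    next[Math.min(age + 1, maxAge)] += survivors;
  });

  // Net migration by age (can be negative for net emigration).
  rates.netMigration.forEach((migrants, age) => {
    next[age] += migrants;
  });

  // Births: age-specific fertility applied to last year's population.
  const births = population.reduce(
    (sum, count, age) => sum + count * rates.fertility[age],
    0
  );
  next[0] += births;

  return next;
}

// Loop over the years into the future, keeping every year's result.
function project(population, rates, years) {
  const results = [population];
  for (let y = 0; y < years; y++) {
    results.push(projectOneYear(results[results.length - 1], rates));
  }
  return results;
}
```

Because each year only depends on the year before, the whole projection is cheap enough to rerun whenever an input changes.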

The interface

We wanted a simple way to interact with the controls and I found d3-simple-slider, a library for making sliders. Initially we had 4 sliders going horizontally but this took up quite a lot of space so we went for 4 vertical ones which looked like a mixing desk.

Sketch of design

The sliders took a bit of extra styling and messing around with the library so it’s ended up quite customised but I’m pleased with the result. For mobile, we show one horizontal slider at a time and store the results in hidden inputs.

We used tooltips to add in extra information where something might have been misinterpreted.

Displaying the results

The population model reruns every time you change any of the sliders or inputs and updates the results you see on the page. Keeping the results in view was important so you could see your changes happening in front of your eyes.

Other examples we’d seen include one from the FT where you had to decide how you’d spend the BBC licence fee between the stations and channels.

We also wanted to give feedback about how people’s choices compared with today’s numbers and what they would mean for them. This is done in a couple of ways: there’s the text box that spells out what the numbers are, and there are lines on the sliders to indicate 2017 levels.

Technical achievements

It’s the first d3.v5 project we’ve done in the team. I decided to do this because v5 uses promises, so you have more control over when a block of code executes. This avoided running calculations before the numbers from the previous step were available.

I’ve learnt a lot more about how to use bootstrap-grid properly. I also used Bluebird.js to make promises work in IE and Tippy.js for tooltips.

What people did

We used Google Tag Manager to find out how people were using the tool and what they were inputting.

We looked at the OADR in 2042 that people were getting from their chosen inputs. The biggest peak is from people loading the interactive and making small changes. The periodic jumps are from people changing the pension year. What’s interesting to note is the secondary peak around 300. This is because people are trying to match the 2016 level of OADR. We put feedback into the results box and it’s clear this element of gamification is helping.

We can also see that people like round numbers as well as the ends of the scales. Here’s what users selected for migration with peaks on the hundred thousands.

And here’s fertility. The peak around 1.75 is the value when the model loads. There are big peaks at the round numbers (1, 2, 3) and at the ends of the scale.

Feedback

We are still collecting feedback about the tool, but here are two choice picks that show we are meeting the aims of the project.

During the project, we found out there was another project looking at an alternative measure to the OADR. We decided to align the projects and this took a bit of extra time and effort and changed the way the article turned out as we referenced each other. We also felt it diminished the impact we could have had with the media.

Overall, I’m happy with how the project turned out. People are using the tool and taking the messages away that we wanted them to leave with. Now with a working population model that projects into the future we can use it for more projects. If you’ve got any good ideas, let me know.

What we learnt from making a StatsBot

At the end of March, the ONS published an article looking at the types of jobs at risk from automation. We thought it would be ironic to be able to query the results through a chatbot interface.

Other journalism organisations are using chatbots to explain complex topics. They’ve said that although not everyone engages with the chatbots, those who do engage deeply.

I have to say that Jure did almost all the work on this project. Our first challenge was finding a chatbot library that didn’t involve any backend. In the end, we found chat-bubble, which runs entirely in JavaScript.

Design was also a challenge. The BBC chatbots are basically multiple choice buttons, and you get different answers depending on which button you choose. The data we had was about the risk of automation for each occupation and also about who was doing that job, for example by gender, region and age.

We designed a few questions to try to figure out all these characteristics through elimination, for example “What decade were you born in?”, with the answers being 60s, 70s, 80s, 90s, 00s.

What was more difficult was trying to work out occupation. You could ask people to navigate down the Standard Occupational Classification (SOC) tree to find the job they wanted to see info about, but we knew this was clumsy.

In the end, we decided to let users type in an occupation and let the ONS occupation coding tool match the input to the SOC. This introduced a text box for people to use in the chat interface. This wasn’t ideal, as on mobile the keyboard takes up a lot of space, but we felt this was better than navigating the SOC hierarchy.

Mobile screenshot

The article was widely picked up, and the BBC pushed a lot of traffic to our site.

Our standard timeframe for considering metrics is a week. After a week we had almost 14 thousand unique visits, 11 thousand of which happened during the first day.

A third of people who visited the site wanted to know what the risk of automation was for an occupation. 91% of those entered an occupation.

The top 5 occupations entered were

  1. Accountant
  2. Teacher
  3. Software Engineer/Developer
  4. Solicitor
  5. Doctor

11% of people who visited chose to read about general facts. Information about location, age and gender was less popular still, with 4%, 3% and 2% respectively.

One thing we debated a lot was the speed of the chat bubbles. Should we make it quicker like the BBC’s or slow so that people can actually read it and not skim?

Processing shapefiles from the ONS geoportal

Our map templates can work with different types of geography (local authority, regions, CCG, LSOA, MSOA etc). The problem is that these geographies are often updated. Luckily we have a canonical source of geographies in the ONS Geoportal.

To get the various geographies to work with our templates, we have to rename the columns and get rid of any extra information that we don’t need. We could do this in QGIS and it would take about 5 minutes.

But why wait 5 minutes when it could be done in 5 seconds! Hello bash script.

You’ll need to download the script, make it executable with chmod u+x process_shapefile.sh and copy it to /usr/local/bin.

Next, point the script at a zip file of a shapefile downloaded from the geoportal.

It will unzip the file, and inside will be a TopoJSON file called geog.json which you can use in the templates.

Looking for a data visualisation job?

We have a vacancy in our team.

Frequently asked questions

Q) Can I work remotely / from London?

A) There is flexibility around working arrangements and location. We want you to get the most out of the job and we believe there are a lot of advantages to being in the office. This could be learning a new shortcut, discussing a problem or getting to know the different parts of the organisation. I wouldn’t rule out applying because of the location, as I’m sure we’re open to discussion about it.

Q) What is your tech setup?

A) We use a whole range of tools. This ranges from just digging around the data in Excel to running Docker images. We have off-network MacBooks for our development work as well as secure Windows laptops for pre-released data.

Q) What skills are you looking for?

A) We don’t need you to be great at everything. We work in a collaborative way with other members of our team to bring out the best in each other and for each project. We are looking for people who are enthusiastic and have any combination of the following skills:

  • understands what makes a good graph, chart, map or interactive
  • can explore stories in data or generate ideas for stories with data
  • familiar with some web technologies and how to make interactives
  • can think about design considerations such as accessibility or working on a mobile
  • can organise themselves to work on projects
  • understands statistical concepts like means, medians, percentiles
  • can write about stories in data

Got more questions? Ping me an email or find me on social media @henryjameslau.

d3-annotations tutorial

I wrote a tutorial for d3-annotations to help people get to grips with how it works. Check it out if you are thinking about using annotations in your visualisations.