How to deal with missing data in line charts
Sometimes, missing values in a line chart will create a gap in the line. In this tutorial, you’ll learn how to draw an interpolated line between the gap so you’ll have a continuous line chart instead:
When do I get patchy lines?
When creating a line chart with multiple lines in Datawrapper, Datawrapper expects that each line should have a value for all of the x values in the dataset. If there are lines that are missing data for a particular x value, then this is considered as ‘missing' data - no line is drawn between these points.
The uploaded dataset looks like this:
You'll encounter patchy data often when the data samples aren't collected consistently, collected in different intervals, or have periods with missing data. E.g. the example chart above shows daily average temperatures from 4 cities (these are dummy data).
- City A: data is collected every day
- City B: data exists for most days but there are a few days where data is missing
- City C: data is collected consistently but only every 2 days
- City D: data is collected consistently but only every 3 days
When there is missing data (shown as – in the above image), Datawrapper will not draw a line in between these points. This becomes a patchy line (City B) or simply data points without any lines drawn in between (City C & D).
Why do I get patchy lines?
The reason behind this is simple: Drawing a line between points with missing data could be misleading, as doing so would suggest that there are specific data available where there is no data. Sometimes, it's better to leave it as is - to show that these data points are missing to indicate incomplete data.
However, that's not always the case.
If your dataset has multiple lines distributed unevenly across the x-axis and your goal is to show a trend across a period of time with a continuous line, it's good to be able to draw a continuous line.
How to draw continuous lines
1. Remove rows without data
The simplest way to remove the gaps in the lines would be to r emove rows that do not have data for all columns (lines) in the dataset.
E.g. 1 – If you want to remove the missing patches for City B, you can remove data points on days where City B is missing data (day 7, 16, 17, 25, 26) which can be done easily in a spreadsheet by filtering out empty rows using the filter function.
E.g. 2 – If you want to show both City C and City D as lines, you can find the common denominator of 2 and 3 which would be 6. Every 6 days, both City C and City D would have data. Remove data points from all other days then City C and City D will appear as lines instead of just dots.
2. Calculate monthly/weekly averages
If you want to have data points across a time period with equal intervals and therefore cannot simply remove rows without data, one solution would be to calculate weekly or monthly averages. Sometimes, it may even make more sense to plot the rolling average to smooth out your data to show a trend across a longer period of time.
Read this blog post to see an example of a chart with rolling (moving) average and loess prediction.
3. Calculate missing values and fill in the gaps
If you want to keep all the data as it is AND show continuous lines, you can calculate missing values and fill them in with interpolated data. We wrote a script that will allow you to do this easily inside Google sheets.
1. Go to this Google sheet we prepared for you.
2. In Google Sheets, click on File → Make a copy.
This will also copy a script called Interpolate.gs that you can find in Tools → Script Editor. The spreadsheet should become available in your Google Drive.
If you're already working on a Google sheet, you can also copy the Interpolate.gs script into your own Google Apps Script editor:
Now, once this is set up, you can use the INTERPOLATEALL() function to interpolate your data.
How to show your interpolated lines
We can now plot the 4 lines as continuous lines. But how can we communicate the difference between the data that were collected and those that were calculated (interpolated)?
Use multiple line symbols
The best way to do this would be to use line symbols to show the data points that were actually collected, like so:
To create a chart like above, upload both your original data + interpolated data in Datawrapper. To see what the uploaded dataset looks like in the chart above, click on Get the data link at the bottom of the chart or create a copy by clicking on Edit this chart.
Then, go to Step 3: Visualize > Refine tab > Customize lines and assign a specific color to each city:
Then go to Customize symbols and click on the checkbox Line symbols. This will display more options. Select multiple for Shape.
Here, you can click on individual lines to display line symbols. Click multiple times to change the symbol shape (circle, square, triangle etc.) Select only the lines that represent the collected data (and not the interpolated ones).
Make sure that on ... every point is selected.
Use dashed lines
Another way to differentiate parts of the data that were collected and those that were calculated is to use a combination of solid lines and dashed lines to show actual data and calculated data:
The style of the lines can be customized from Customize lines:
And it's as simple as that!
If you have any questions or better suggestions, reach out to us at firstname.lastname@example.org.