How to choose the best interpolation for your colors
As soon as we have imported the data for our choropleth map or symbol map, or have chosen to create a heatmap in the Datawrapper table tool in step 3: "Visualize", Datawrapper lets us choose the colors with these settings:
This is a powerful tool. The colors we choose have a huge impact on our map: how it is perceived, how well our statement is communicated, and how honestly we present the data.
The following article explains how to use the interpolation menu and what the different options mean for our map design.
- Which interpolation?
- The summary: so, which interpolation should you use?
This article won't explain the different color palettes and the "Colors" menu. If you want to understand that part of the color settings, and how to use the number and color markers, please visit the article "How to use the color palette tool".
When creating a color palette, deciding how many stops it should have is the most confusing part. So what does Datawrapper mean by "Interpolation", and how does it work?
The kind of interpolation decides how many equally sized parts our color palette is split into, with each part covering the same number of our values. Let's untangle that.
Maybe you're saying "What's the problem? I have a high value and a low value. Just give the high value a dark color and the low value a bright color, and fill all counties in between in a linear way." That's what we get when we click on Interpolation and then on linear.
It's a good option when the distribution of data between our high and low value is very even. Often, however, we have a distribution like the following. Here we plot the number of counties in the US with a certain unemployment rate. We see that most of the counties have a pretty low unemployment rate – but there are also some outliers with a very high unemployment rate of 15% – 26%.
The linear interpolation takes every value between the minimum and the maximum value and assigns it a color between the brightest and the darkest color in a linear way. Because of our uneven distribution and the outliers, our map then looks like this:
It's a great map when we want to draw attention to the outlier counties in the US. They stand in high contrast to the rest of the counties, which all look a similar yellow-green. But besides that, we can't really see the geographical patterns here.
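To see why the outliers dominate a linear map, here is a minimal Python sketch of linear interpolation. The rates and the two RGB colors are made-up example values, not the data behind the map, and blending in plain RGB is a simplification; Datawrapper's own color math may differ.

```python
def linear_position(value, low, high):
    """Map value to a position between 0 (brightest) and 1 (darkest)."""
    if high == low:
        return 0.0
    return (value - low) / (high - low)


def mix(color_a, color_b, t):
    """Blend two RGB tuples; t=0 gives color_a, t=1 gives color_b."""
    return tuple(round(a + (b - a) * t) for a, b in zip(color_a, color_b))


# Hypothetical unemployment rates in percent: many low values, one outlier.
rates = [2.0, 3.0, 4.0, 5.0, 6.0, 26.0]
bright, dark = (255, 255, 204), (8, 48, 107)  # assumed example colors

positions = [linear_position(r, min(rates), max(rates)) for r in rates]
colors = [mix(bright, dark, t) for t in positions]

for rate, t, color in zip(rates, positions, colors):
    print(f"{rate:5.1f}%  position {t:.2f}  rgb{color}")
```

Every rate except the outlier lands in the first fifth of the gradient, which is why most counties end up with nearly the same bright color.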
Quartiles, Quintiles, Deciles
Our map would be better if more counties were filled with the turquoise and medium-blue colors that are barely used so far. We can achieve that by changing the interpolation. When we choose the quartiles option, we can imagine our color gradient being split up into four parts.
Our map looks darker now. By splitting the data and color gradient further, we "diversified" the colors that are used for filling the counties:
The idea is the same when we change the interpolation further: Quintiles divides our color gradient and data into five parts, and Deciles into ten parts. Here's what the Deciles map looks like:
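The splitting idea can be sketched in a few lines of Python. This is one plausible way to implement it, assuming color stops sit at the quantile boundaries and values are interpolated linearly within each part; the function names and the rates are hypothetical, and Datawrapper's actual implementation may differ.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_breaks(data, n):
    """Return n + 1 break values: minimum, the n - 1 quantile cuts, maximum."""
    ordered = sorted(data)
    return [ordered[0], *quantiles(ordered, n=n, method="inclusive"), ordered[-1]]


def quantile_position(value, breaks):
    """Map value to 0..1, giving each quantile part an equal share of the gradient."""
    n = len(breaks) - 1
    # Find the part the value falls into, clamped to the valid range.
    i = min(max(bisect_right(breaks, value) - 1, 0), n - 1)
    low, high = breaks[i], breaks[i + 1]
    fraction = 0.0 if high == low else (value - low) / (high - low)
    return (i + fraction) / n


# Hypothetical unemployment rates in percent, with one outlier.
rates = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 26.0]
breaks = quantile_breaks(rates, 4)  # 4 = quartiles, 5 = quintiles, 10 = deciles

for rate in rates:
    print(f"{rate:5.1f}%  position {quantile_position(rate, breaks):.2f}")
```

With these example values, a rate of 6% sits exactly in the middle of the gradient, while on a linear scale from 2% to 26% it would sit only one sixth of the way along.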
Natural breaks, also known as the Jenks optimization method, is a way to classify data by grouping similar values together within the same range. Jenks says that this method "seeks to reduce the variance within classes and maximize the variance between classes". Here's the same map in natural breaks:
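To make the "reduce the variance within classes" idea concrete, here is a small Python sketch that tries every possible way of cutting a sorted list into classes and keeps the one with the lowest total squared deviation inside the classes. This brute-force search is only an illustration for tiny datasets, not the algorithm Datawrapper runs; practical natural-breaks implementations use a much faster dynamic-programming approach. The rates are made up.

```python
from itertools import combinations


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of one class."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks_bruteforce(data, n_classes):
    """Split the sorted data into n_classes groups with minimal within-class variation."""
    values = sorted(data)
    best_cost, best_classes = None, None
    # Try every choice of n_classes - 1 cut positions between neighboring values.
    for cuts in combinations(range(1, len(values)), n_classes - 1):
        bounds = (0, *cuts, len(values))
        classes = [values[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sum_sq_dev(c) for c in classes)
        if best_cost is None or cost < best_cost:
            best_cost, best_classes = cost, classes
    return best_classes


# Hypothetical unemployment rates in percent, forming three visible clusters.
rates = [2.0, 2.5, 3.0, 7.0, 7.5, 8.0, 25.0, 26.0]
classes = natural_breaks_bruteforce(rates, 3)
print(classes)
```

Unlike quantiles, the classes here do not have to contain the same number of values: the two outliers get a class of their own.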
So, which interpolation should you use?
It's a good idea to find a compromise between honesty and usefulness. The linear map is honest because it shows the values on a linear scale and draws immediate attention to the outliers. But maybe that's not what our article is about. Maybe we actually want to talk about the geographical pattern: the low unemployment rate in states like Texas, Kansas, and Nebraska; the Black Belt in the south of the US. To show these patterns, we'll need the Quartiles, Quintiles or Deciles map.
The more stops we add, the more our map will use very bright colors and very dark colors, increasing the contrast of the overall map. That makes it tempting to always use the map with the most stops: It just looks more dramatic.
But it also makes our reader think that the differences are stark in areas where they're actually not stark at all, and less stark in areas where they actually are very stark. To illustrate that, let's zoom into the Deciles map. Nye County and Yuma County both have a similar dark blue color, but their unemployment rates are vastly different: They are 13.8 percentage points apart. La Plata County, on the other hand, is filled with light green, suggesting an unemployment rate that's on a completely different level than Nye County's. But Nye County and La Plata County are separated by just 5.3 percentage points.
So if our goal is to create a map on which we can point out the geographical pattern, AND we try not to imply stark differences that are not there, we would probably go with a Quartiles map as a compromise.
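The same effect can be reproduced with a few numbers. The Python sketch below uses made-up rates for three hypothetical counties A, B and C (not the actual values of Nye, Yuma or La Plata County) and assumes color stops at the deciles with linear interpolation in between; Datawrapper's exact method may differ.

```python
from bisect import bisect_right
from statistics import quantiles


def decile_position(value, data):
    """Position of value on a 0..1 gradient with stops at the deciles of data."""
    ordered = sorted(data)
    breaks = [ordered[0], *quantiles(ordered, n=10, method="inclusive"), ordered[-1]]
    i = min(max(bisect_right(breaks, value) - 1, 0), 9)
    low, high = breaks[i], breaks[i + 1]
    fraction = 0.0 if high == low else (value - low) / (high - low)
    return (i + fraction) / 10


# Hypothetical rates in percent, skewed like the unemployment data.
rates = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 12.0, 26.0]

county_a, county_b, county_c = 26.0, 12.0, 5.0  # hypothetical counties

gap_ab = decile_position(county_a, rates) - decile_position(county_b, rates)
gap_bc = decile_position(county_b, rates) - decile_position(county_c, rates)

print(f"A vs B: {county_a - county_b:.1f} points apart, gradient gap {gap_ab:.2f}")
print(f"B vs C: {county_b - county_c:.1f} points apart, gradient gap {gap_bc:.2f}")
```

In this example, A and B are twice as far apart in percentage points as B and C, yet their colors are much closer together on the gradient.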
In this tutorial, we looked at when to use how many stops in your color palette for choropleth maps. And we've learned that it's important to find a good compromise between drawing attention to the facts that you want to draw attention to and using the data in an honest way.