How to create a scatter plot
The scatter plot is perfect when you want to show the relationship between two quantitative measures. Each dot of the chart is placed on a coordinate grid according to its values for the two variables. The scatter plot is a mathematical diagram that is also widely used in the social sciences. At first sight, it might appear confusing and somewhat complicated. But once you learn how to read it, it's easy to spot the correlation between two variables without having to do the math.
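Placing a dot on the grid means mapping each of its two values onto a position along an axis. The sketch below illustrates this under a few assumptions of our own: both axes are linear, the plotting area is a hypothetical 600 × 400 pixels, and screen coordinates grow downward (as in SVG). The helper `linear_scale` is a name we made up for illustration, not a Datawrapper function, and only the five example rows from the data section are used.

```python
def linear_scale(value, domain, pixel_range):
    """Hypothetical helper: map a value from the data domain onto a pixel range."""
    d0, d1 = domain
    r0, r1 = pixel_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


# First five rows of the example data: country, GDP per capita, life expectancy.
rows = [
    ("Somalia", 624, 54.2),
    ("Kenya", 2898, 65.1),
    ("China", 13334, 76.2),
    ("United Kingdom", 38225, 81.0),
    ("Japan", 36162, 83.2),
]

WIDTH, HEIGHT = 600, 400  # assumed size of the plotting area in pixels

gdp = [gdp_value for _, gdp_value, _ in rows]
life = [life_value for _, _, life_value in rows]
x_domain = (min(gdp), max(gdp))
y_domain = (min(life), max(life))

positions = {}
for country, gdp_value, life_value in rows:
    # GDP per capita runs from left (lowest) to right (highest).
    x = linear_scale(gdp_value, x_domain, (0, WIDTH))
    # Screen y grows downward, so the highest life expectancy maps to pixel 0 (the top).
    y = linear_scale(life_value, y_domain, (HEIGHT, 0))
    positions[country] = (round(x, 1), round(y, 1))

for country, (x, y) in positions.items():
    print(f"{country:<15} x={x:6.1f}  y={y:6.1f}")
```

Real charting tools add padding, tick marks, and sometimes logarithmic axes on top of this basic mapping.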
The example scatter plot below shows how GDP per capita relates to life expectancy in selected countries:
Notice the diagonal black line. This line, called the trend line, helps you identify the correlation between the two variables (GDP per capita and life expectancy). Its position is determined by the arrangement of the dots. The correlation it reveals has two properties: direction and strength.
- A line starting in the lower left and rising to the upper right (/) indicates a positive correlation, meaning: when A increases, B increases as well.
- A line starting in the upper left and dropping to the lower right (\) indicates a negative correlation, meaning: when A increases, B decreases, and vice versa.
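- The strength of a correlation is indicated by how closely the dots cluster around the line, not by the angle of the line: how steep the line looks depends on the units and scales of the two axes.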
- If the dots are scattered all over the place without showing a pattern, and most of the coordinate grid is empty, don't use a trend line. There might simply be no correlation!
Don't forget to tell your readers how to read a scatter plot! And remember not to confuse correlation with causation, no matter how strong your trend line appears to be. There are numerous examples of wrong conclusions drawn from a correlation alone.
Preparing and importing the data
You can copy & paste data from Excel or the web, or upload your own CSV files. For example, here is the dataset that powers the chart above. Your dataset should be formatted as follows. You'll need:
- One header row containing labels
- The first column defining labels of the dots
- The following two columns containing numeric values that will be used as x- and y-coordinates. These values will define the positions of the dots in the chart. You can have more (numeric) columns, but you'll need at least two.
The values do not have to be in the same unit (in our example, GDP per capita is in US dollars and life expectancy is in years).
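Before pasting or uploading, you may want to check a file against the layout described above. The following is a hypothetical checker (`check_scatter_data` is our own name, not part of Datawrapper). It assumes comma-separated values with plain decimal numbers and checks only the header, the label column, and that the second and third columns are numeric; any further columns, such as colors or sizes, are left unchecked.

```python
import csv
import io


def check_scatter_data(text, delimiter=","):
    """Hypothetical checker for the scatter plot layout.

    Returns a list of problems; an empty list means the layout looks usable.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if len(rows) < 2:
        return ["need a header row and at least one data row"]
    header, body = rows[0], rows[1:]
    if len(header) < 3:
        return ["need a label column plus at least two numeric columns"]

    problems = []
    # Row numbers count the header as row 1, matching a spreadsheet view.
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"row {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        if not row[0].strip():
            problems.append(f"row {line_no}: missing label")
        # Only the x and y columns (second and third) must be numeric.
        for col, cell in zip(header[1:3], row[1:3]):
            try:
                float(cell)
            except ValueError:
                problems.append(f"row {line_no}: {col!r} value {cell!r} is not numeric")
    return problems


good = "Country,GDP per capita,Life expectancy\nSomalia,624,54.2\nKenya,2898,65.1\n"
bad = "Country,GDP per capita,Life expectancy\nKenya,n/a,65.1\n"

print(check_scatter_data(good))  # → []
print(check_scatter_data(bad))   # → ["row 2: 'GDP per capita' value 'n/a' is not numeric"]
```

Datawrapper itself may accept formats this strict sketch rejects (for example, other delimiters or locale-specific number formats).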
This is what the first five rows of our data look like for the chart you can see above:
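| Country        | GDP per capita | Life expectancy |
|----------------|---------------:|----------------:|
| Somalia        |            624 |            54.2 |
| Kenya          |           2898 |            65.1 |
| China          |          13334 |            76.2 |
| United Kingdom |          38225 |              81 |
| Japan          |          36162 |            83.2 |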
If your data set looks like the table above, you can copy it into Datawrapper and click "Upload and continue".
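If you prepare your data in Python, the standard `csv` module can produce text in this layout, which you can paste into Datawrapper or save as a CSV file under a name of your choice. A minimal sketch using the five rows shown above:

```python
import csv
import io

header = ["Country", "GDP per capita", "Life expectancy"]
rows = [
    ["Somalia", 624, 54.2],
    ["Kenya", 2898, 65.1],
    ["China", 13334, 76.2],
    ["United Kingdom", 38225, 81],
    ["Japan", 36162, 83.2],
]

# Write into an in-memory buffer; use "\n" line endings so the text pastes cleanly.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)   # one header row containing labels
writer.writerows(rows)    # label column first, then the x and y values
csv_text = buffer.getvalue()

print(csv_text)
```

Fields containing commas (such as "Korea, Republic of") are quoted automatically by `csv.writer`, which is one reason to prefer it over joining strings by hand.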
Check & Describe
This is what the dataset looks like once it is uploaded into Datawrapper. (As you can see, we uploaded two more columns: one to assign a color to each dot and one to define its size.) Make sure that the box "First row as label" is ticked so that Datawrapper correctly assigns the values to the labels.
Click on "Proceed" and Datawrapper will take you to the next step.
Once you're in the "Visualize" tab, choose "Scatter Plot" from the grid of available chart types, and Datawrapper will create a first version of your chart. Continue with the steps Refine, Annotate, and Design to finish customizing it. We'll cover these steps in a separate short tutorial, which you can read here.