This brief tutorial walks beginners through the typical steps needed to create a data visualization. Don't worry, it does not take hours. Using one example, you’ll see what needs to be done, from finding and cleaning data to creating a chart and publishing it.
Good preparation is important. Creating the visualization with Datawrapper is essentially the last step.
1. Step: Find data
Look for data providing context related to an event or a question. Whether you are working with a large or small dataset is not the decisive factor. It’s much more important to ask the right questions.
Check the quality of the data. Do some simple math, such as checking that the sums add up. Aim to look at the development over a longer time period. Look for inconsistencies, strange outliers or surprising connections and dependencies.
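If you prefer a script to a spreadsheet, the sum check can be sketched in a few lines of Python. The figures and names below are invented purely for illustration; replace them with the values from your own source.

```python
# Hypothetical budget figures, invented for illustration only.
line_items = {"schools": 120.5, "roads": 80.0, "health": 99.5}
reported_total = 310.0  # the total stated in the (hypothetical) source


def totals_match(computed, reported, tolerance=0.01):
    """Return True if the two totals agree within a small tolerance."""
    return abs(computed - reported) <= tolerance


computed_total = sum(line_items.values())
difference = reported_total - computed_total

if not totals_match(computed_total, reported_total):
    print(f"Totals differ by {difference}: ask the source why.")
```

A mismatch is not necessarily an error in the data. It can also point to a category that was left out of the table, which is often a story in itself.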
To find relevant and interesting data-driven information, use one of the perspectives below - all of them can lead to effective charts:
Compare: Unemployment figures, education offerings in your region or costs of city services. Are they higher or lower compared to other cities or countries?
Long time frames: Official sources tend to focus on figures from the last quarter. Often the context and perspective change when you have data showing the development over the last five, ten or even twenty years.
Per capita: An official statement might promise millions in investment for schools. Sounds great. But have you done the math? How many students will benefit, over what period of time? How much support does each recipient actually get? Recalculating larger programs on a per capita basis often clears up the picture and provides a better understanding.
Check budgets or models: Perform simple checks of sums and totals. Ask how the data was gathered and what assumptions or models guided that collection. One example: How is the unemployment figure calculated in your country? Check it. Try to come up with alternative models and compare them.
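The per capita perspective above comes down to one division. Here is a minimal Python sketch; the program size, number of students and duration are hypothetical numbers chosen only to show the arithmetic.

```python
def per_capita(total_amount, recipients, years=1):
    """Return the amount each recipient gets per year."""
    if recipients <= 0 or years <= 0:
        raise ValueError("recipients and years must be positive")
    return total_amount / recipients / years


# Hypothetical example: 50 million announced for 200,000 students over 5 years.
per_student_per_year = per_capita(50_000_000, 200_000, years=5)
print(per_student_per_year)  # 50.0
```

In this made-up case, an impressive-sounding 50 million shrinks to 50 per student per year, which is the kind of context a chart can make visible.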
2. Step: How to clean your data
Before starting to visualize, you need to clean and prepare the data, whether you are working with Datawrapper or other tools. Often a simple clean-up is all it takes to bring out the relevant aspects. The clean-up is not done in Datawrapper, but in either Excel or Google Sheets. Actually, any spreadsheet program works here.
To illustrate how this is done, here is just one example: On "World Salaries" you can look up and compare data on how much certain professions earn. There are huge differences from country to country. The salaries have been adjusted for “purchasing power parity”. This is important because it lets us compare the salaries meaningfully.
This is how the data looks on the web:
Source: World Salaries
There are many interesting aspects in here that could lead to a story. In this particular case, simply copy the data using your mouse: select the whole table on the page and copy it.
Copy and paste the data into Excel, Google Sheets (or similar software)
Now open a spreadsheet and paste the data into it. The screenshot below shows how your data should look in the spreadsheet.
Tip: If the copy & paste gives you odd-looking results, such as all the data ending up in one field, try copying a bit more than just the table from the website. In our example we copied down to the references. The trick is to make sure you capture the whole HTML table, so that the spreadsheet can read it correctly.
Cleaning up in the spreadsheet
To let everyone see how this works, we did the clean-up in a public Google Sheet. If you feel more comfortable using Excel, go ahead. The steps described here work in any spreadsheet software.
If you have never done this before, you might wonder what "clean up" actually means. In several steps, we try to get rid of all formatting, extra lines, currency symbols and so on. This is a step you need to do in other visualization software as well, even in Illustrator.
Back to the data. The data we copied contains a number of interesting aspects. But to create a telling visualization, always try to focus on one aspect per visualization, at least at the start. So we delete everything except “country” and “net monthly income”.
Question: What are typical steps to clean up data for visualizations?
Below are a few very typical steps. To clarify: The goal is to have the raw numbers, without additional formatting, in the spreadsheet before you import the data into Datawrapper.
- Get rid of empty lines or rows
- Use "search & replace" (a lot) to get rid of redundant, repeating information
- Use "search & replace" to get rid of currency symbols ($ or €)
- Use "search & replace" to ensure that names in the legend are consistent (U.S., USA, United States)
So, step by step we are working through this process in Google Sheets.
Tip: If you have never worked with a spreadsheet, remember that if the data is still formatted as text, all text and numbers will be on the left side of the cells. If the numbers are actually formatted as numbers (as they should be), they will be on the right.
Tip: Sometimes there are leading spaces in the data. These can be hard to track down. To get rid of them, use TRIM in Google Sheets. Excel has a TRIM function as well.
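For readers who would rather script the clean-up, the same steps can be sketched in Python. This is only an illustration under stated assumptions: the rows are made up (they are not the actual World Salaries figures), the helper names are hypothetical, and the code assumes that a comma in a number is a thousands separator.

```python
import re

# Made-up rows as they might look right after pasting.
raw_rows = [
    ["  U.S. ", "$3,263"],
    ["", ""],                        # an empty row to be dropped
    ["Germany", "€ 2,720 "],
    ["United   Kingdom", "$ 3,065"],
]

# Hypothetical mapping used to make country names consistent.
COUNTRY_ALIASES = {"U.S.": "United States", "USA": "United States"}


def clean_text(value):
    """Strip spaces at both ends and collapse repeated spaces, roughly like TRIM."""
    return " ".join(value.split())


def clean_number(value):
    """Drop currency symbols, spaces and thousands separators, then convert."""
    digits = re.sub(r"[^0-9.\-]", "", value)
    return float(digits)


def clean_rows(rows):
    cleaned = []
    for country, salary in rows:
        country = clean_text(country)
        if not country and not salary.strip():
            continue  # skip empty rows
        country = COUNTRY_ALIASES.get(country, country)
        cleaned.append([country, clean_number(salary)])
    return cleaned


cleaned = clean_rows(raw_rows)
for row in cleaned:
    print(row)
```

Whether you use search & replace or a script, the result should be the same: one consistent name per country and one plain number per salary.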
When is the data "clean"?
Obvious question: When is the data "clean" enough to be imported into Datawrapper?
Below you’ll see the cleaned-up data. Use the “search & replace” command in either Google Sheets or Excel to get to this point; do not do it manually.
- We cleaned column A and got rid of the long texts, etc.
- We cleaned column B. The salaries (in US dollars) are just numbers, indicated by the numbers all being on the right side. No empty spaces, no currency symbols. Just the numbers. So this is "clean" now.
The numbers are already sorted, all extra elements and mark-ups (bold, dots, currency symbols) are gone.
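One way to test whether a column is "clean" in this sense is to check that every value is a plain number and nothing else. The Python sketch below does that with a regular expression; the sample values and the function name are hypothetical.

```python
import re

# A plain number: optional minus sign, digits, optional decimal part.
PLAIN_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def find_unclean_values(values):
    """Return the values that are not plain numbers."""
    return [v for v in values if not PLAIN_NUMBER.fullmatch(v)]


# Made-up sample column with three problems left in it.
sample = ["3263", "2720.5", "$3,100", " 1800", ""]
print(find_unclean_values(sample))  # ['$3,100', ' 1800', '']
```

An empty result means the column contains only raw numbers and is ready to be imported.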
Step 3: Upload your data into Datawrapper
There are two options to upload data into Datawrapper. The first is to simply copy the data in the spreadsheet and paste it into the field below. The alternative is to upload a .csv file (an abbreviation for “comma-separated values”; think of it as a lingua franca for data). If you upload a .csv file, correct formatting is important, otherwise it won’t work.
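If you are unsure what a well-formatted .csv file looks like, the following Python sketch produces one with the standard csv module. The rows are invented for illustration. Note how the one value that itself contains a comma is wrapped in quotes, so that it is not mistaken for two columns.

```python
import csv
import io

# Made-up rows: a header line followed by one line per country.
rows = [
    ["Country", "Net monthly income"],
    ["United States", 3263],
    ["Korea, Republic of", 1900],
]


def to_csv_text(table):
    """Return the table as CSV text, quoting values only where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(table)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)
```

Exporting from Excel or Google Sheets as .csv gives you the same kind of file without any code.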
Step 2: Check and describe your data
This is the second step in Datawrapper. It lets you check whether the data was imported correctly. There are a number of additional features here, but that is all covered in another tutorial specifically covering step 2 of Datawrapper. For now, we just move on.
Datawrapper can display data in a number of basic, but versatile variations:
Lines: Best used to show changes over time and trends. To reduce the number of labels on the axis there is a trick: In Excel or other spreadsheet software, simply shorten the years from something like “2001” to “01”. Another option, especially when you work with many data points, is to display only the most important years (“1900”, “1910”, “1920”, …) and leave the others blank.
Columns: When the values of each year or period have no direct relation to each other, or when it is important to display the growth of the values, use a column chart instead of a line chart. Datawrapper can display several columns with different values side by side.
Bars: This variation is useful if you have a lot of labels you need to display. Here the labels are on the left side of your visualization, leaving much more room and making them easier to read.
Pie: Use this if you want to display parts of a total, like all the smartphones sold in a country. But try to limit the number of slices to a maximum of five. If you have more, use a bar or column chart instead.
Donut: A variation of the pie, with a hole in the middle. You can display the total in the middle by selecting this option in Datawrapper.
Tip: A handy and often used feature of Datawrapper is “transpose”. With one click you change the orientation of the data: the x- and y-values are switched around. Just experiment with this and use it when your chart does not display as intended. Again, you can jump back and forth between the steps; all your changes will be kept.
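To see what transposing means for a table, here is a small Python sketch with made-up values. It only illustrates the idea of swapping rows and columns and says nothing about how Datawrapper implements the feature.

```python
# Made-up table: one row of years, one row of values.
table = [
    ["Year", "2001", "2002", "2003"],
    ["Sales", 10, 12, 15],
]


def transpose(rows):
    """Turn rows into columns and columns into rows."""
    return [list(column) for column in zip(*rows)]


transposed = transpose(table)
for row in transposed:
    print(row)
```

After transposing, each year sits on its own row next to its value. Transposing a second time brings back the original layout.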
Step 6: Publish and embed
Done. In the last step you see your chart and get an embed code. Copy that and paste it into any web editor for publishing.
Hint: For a proper display, the width of your visualization is important. If your webpage has a very narrow layout, try to reduce the number of values on your x-axis or choose a bar chart.
In some newsrooms, editors don’t have permission to use HTML editors. In such cases there should be one user whose rights allow embedding and publishing.
That’s it. Hope you have fun working with Datawrapper.