ISSS608 Visual Analytics and Applications - DataViz Makeover 1
The visualisation has a chart title and subtitle that tell the audience the graph shows the resident labour force by age, with values in per cent. However, the graph itself has no y-axis, axis title or axis labels, which leaves the audience wondering what the y-axis measures if they miss the small print of the subtitle.
The x-axis should have a title to indicate that the data is grouped into age bands. The positioning of the x-axis just above the table values also creates confusion about the units of the x-axis and y-axis. Therefore, in this visualisation, it is important to have proper axis titles and unit measures.
Looking at the visual alone (both graph and table), the focus of the message seems to be the spread across the different age bands for 2009 and 2019, with two reference lines indicating the median age for each year. The lead-in statements provide more information and numbers, and possible explanations for the shift in share of the resident labour force, which resulted in the increase in median age between 2009 and 2019. The lead-in statements also regroup the age bands (“aged 55 & over”, “aged 25 to 54”) in their description, which is inconsistent with the age bands displayed in the graph. There is also mention of the Labour Force Participation Rate (LFPR) and its possible relationship with the labour force age, which is not reflected in the visualisation.
With reference to the original report on page 22, the entire visualisation and lead-in statements served to describe the trend that there are “more older residents in the labour force”. This message is not clearly conveyed by the visualisation and is only explicitly mentioned at the end of the lead-in statements.
When put together, the audience is overwhelmed by the information provided in the lead-in statements and underwhelmed by the graph with no annotations to highlight the trends. Effective visualisations should work the other way round, where the lead-in statements should set the context and background succinctly, and the visualisation should convey the data and the intended message in a concise manner.
Line charts are frequently used to display time-series data, with time plotted on the x-axis. In this visualisation, the data covers two time periods (2009 and 2019), but these are represented as two series in the chart area, with the age band categories on the x-axis. The use of the wrong chart type led to some initial confusion about what the graph is trying to convey.
The data for the years between 2009 and 2019 were not plotted on the graph, which diminishes the clarity of the visualisation. Questions on the integrity of the visualisation may also arise, such as whether the visualisation is trying to show what it wants the audience to see, or to hide certain trends that happened during those missing years.
Indicating the data source gives credibility to the visualisation and allows the audience to look into the data source in more detail. The note explaining that the percentages may not sum to 100 because of rounding is helpful, as it pre-empts questions from the more meticulous members of the audience. The note also hints that the data values shown are in percentages. The aesthetics could be further improved by aligning the two lines of text.
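The rounding effect that such a note guards against can be shown with a minimal sketch. The head counts below are hypothetical and chosen only to make the effect visible; they are not taken from the report.

```python
# Hypothetical head counts for three age bands (illustrative only, not from the report).
counts = {"15-24": 1, "25-54": 1, "55 & over": 1}

total = sum(counts.values())

# Round each band's share to one decimal place, as a published table might.
rounded_shares = {band: round(100 * n / total, 1) for band, n in counts.items()}

# Sum the rounded shares; round once more to suppress floating-point noise.
rounded_total = round(sum(rounded_shares.values()), 1)

print(rounded_shares)
print(rounded_total)
```

Each band rounds to 33.3, so the displayed shares add up to 99.9 rather than 100, which is exactly the kind of discrepancy a rounding note explains.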
The audience is able to quickly differentiate the data from the reference lines, as the data is made more obvious by using thicker lines and the reference lines are thinner and dotted.
The visualisation has well-placed labels to differentiate between 2009 and 2019 for each element, namely the line graphs, reference lines and table values. This eliminates the need for a legend, as the audience can immediately tell which year each line represents. In addition, the label for each median reference line includes the median value, giving the audience the information upfront without the need to trace the reference line down to the x-axis to find the median age.
The labels can be further improved by removing the word “June”, as the intended message is concerned with the age over a longer period of time (in years), and does not require the exact month that the data is based on.
The visualisation defined a unique mapping between the data for each year and colour (grey for June 2009; blue for June 2019). This colour scheme is used consistently across the line graphs, reference lines and table, which makes it easier for the audience to locate and make comparisons in the visualisation. The colour palette used is kept simple, which makes the design of the visualisation clean and pleasant to look at.
The change in background colour allows the audience to identify different parts of the visualisation. The audience is able to quickly see that the line graph elements are on the light grey background, whereas the table elements are on a dark-coloured background.
There is good contrast between the line graph and table. However, the attention of the audience is drawn to the table due to its darker colours. The background colour of the table can be improved by using lighter shades.
The proposed design attempts to convey the message that there are more older residents in the labour force more visually, while upholding the visualisation principles and best practices. The advantages of the proposed design are as follows:
The proposed visualisation is designed using Tableau and uploaded on Tableau Public. The link can be found here.
Raw data for number of residents by age bands and median age from 2009 to 2019 were found in two separate Excel files, renamed to T7 and T2 respectively. It was noted that there are multiple worksheets in T7, showing the overall number of residents by age bands and by sex.
The two Excel files were imported as separate data sources, shown in the figure below.
The table containing the overall number of residents is found in the T7_T worksheet. Both tables, T7_T and T2, were connected as data tables in Tableau.
Many null values were detected by Tableau after import, as the data tables in the Excel files were formatted for easy reading and reporting. The figure below shows T7_T after importing in Tableau.
The Tableau function Data Interpreter was used for the initial data cleaning. The figures below show the resulting tables after using Data Interpreter.
The table below shows the changes made to the respective tables:
| For T7_T | For T2 |
|---|---|
| Hid column F1 | Hid all other columns except for Labour Force Participation Rate (%) and Median Age of Labour Force (Years) |
| Renamed Age (Years) 2 to Age_band | Renamed June to Year and changed the data type to “Date” |
| Pivoted columns containing years | Renamed Labour Force Participation Rate (%) to LFPR (%) |
| Renamed Pivot Field Names to Year and changed the data type to “Date” | Renamed Median Age of Labour Force (Years) to Median Age (Years) |
| Renamed Pivot Field Values to No_residents | Added filter to remove empty rows |
| Added filter to remove the Total across age bands (null values under Age_band) | |
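For readers who want to reproduce the T7_T changes outside Tableau, the pivot, rename and filter steps in the table above can be sketched in plain Python. This is an illustration under stated assumptions, not the workflow actually used: the rows and values are made up, `pivot_long` is a hypothetical helper, and the year is kept as an integer rather than converted to a date type.

```python
# Hypothetical wide-format rows mimicking T7_T after Data Interpreter:
# one row per age band, one column per year. All values are made up.
wide_rows = [
    {"Age (Years) 2": "15 - 19", "2009": 10.0, "2019": 8.0},
    {"Age (Years) 2": "20 - 24", "2009": 20.0, "2019": 18.0},
    {"Age (Years) 2": None, "2009": 30.0, "2019": 26.0},  # the Total row
]


def pivot_long(rows, id_column="Age (Years) 2"):
    """Turn the year columns into (Age_band, Year, No_residents) records."""
    long_rows = []
    for row in rows:
        age_band = row[id_column]
        if age_band is None:
            continue  # filter out the Total row (null Age_band)
        for column, value in row.items():
            if column == id_column:
                continue  # skip the identifier column itself
            long_rows.append(
                {"Age_band": age_band, "Year": int(column), "No_residents": value}
            )
    return long_rows


long_rows = pivot_long(wide_rows)
for record in long_rows:
    print(record)
```

The long format gives one record per age band and year, which is the shape Tableau needs in order to place Year on one axis and split the bars by Age_band.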
Labour Force Participation Rate was retained in the data table as it was mentioned in the original visualisation. The use of the data would be explored in later steps.
The figures below show the cleaned versions of the two data tables.
The following steps were taken to create the stacked bar graph:
The following steps were taken to add the median age line to the stacked bar graph:
With the base graph in place, the next step was to add meaning to the graph through aesthetics. The following changes were made in the Sheet view:
As the Sheet view of Tableau only caters for a chart title, the title of the visualisation and the lead-in statements were added using the Tableau Dashboard feature. The steps taken are as follows:
A screen capture of the final product in Tableau Dashboard is shown below.
The message of the visualisation is clearly stated in the title, and a brief lead-in statement describes the possible factors that caused the increase in older residents in the labour force. The annotations highlight key points revealed by the graph, which support the message.
Besides conveying the original intended message, other major observations can also be made from the final proposed visualisation:
While the increase in median age seems gradual (median age stayed at 43 from 2014 to 2018), the percentage of older residents in the labour force seems to have increased at a faster pace over the same period. The graph also seems to hint that the median age could rise into the 55 & above age group in the next 10 to 20 years. This has implications for government policy planning and for corporate organisations planning for an older employee population.
From the final product, it is visually more prominent that the percentage of labour force in the 15 to 24 age group is rather constant and hovers around 10%, and that there is an obvious shift of the proportion from the 25 to 54 age group to the 55 & above age group. This causes concern and may trigger authorities to think of ways to increase the labour force aged 24 and below.
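The shares behind this observation come from regrouping the age bands into three groups and expressing each group as a percentage of the year's total. A minimal sketch of that calculation follows. The records are made up, `age_group` and `group_shares` are hypothetical helpers, and the code assumes every band label starts with its lower age bound (for example “25 - 29” or “70 & Over”).

```python
from collections import defaultdict

# Hypothetical (Age_band, Year, No_residents) records; all values are made up.
records = [
    ("15 - 19", 2009, 10.0), ("20 - 24", 2009, 10.0),
    ("25 - 29", 2009, 60.0), ("55 - 59", 2009, 20.0),
    ("15 - 19", 2019, 10.0), ("20 - 24", 2019, 10.0),
    ("25 - 29", 2019, 50.0), ("70 & Over", 2019, 30.0),
]


def age_group(age_band):
    """Map a band label to one of three groups, using its lower age bound."""
    lower = int(age_band.split()[0])  # assumes the label starts with the lower bound
    if lower < 25:
        return "15 to 24"
    if lower < 55:
        return "25 to 54"
    return "55 & above"


def group_shares(rows):
    """Return {year: {group: percentage of that year's total}}."""
    totals = defaultdict(float)
    grouped = defaultdict(lambda: defaultdict(float))
    for band, year, residents in rows:
        totals[year] += residents
        grouped[year][age_group(band)] += residents
    return {
        year: {group: 100 * value / totals[year] for group, value in groups.items()}
        for year, groups in grouped.items()
    }


shares = group_shares(records)
for year in sorted(shares):
    print(year, shares[year])
```

With these made-up numbers the youngest group holds steady at 20% while ten percentage points move from the middle group to the oldest, which mirrors the kind of shift the stacked bars make visible.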
Based on the above two observations, more questions on the linkage between labour force, working population, unemployment rates and birth rates surfaced. Some of these are identified in the lead-in statement as factors that may have an impact on the resident labour force and the median age. The audience may be curious as to what exactly is the relationship and pattern of these other factors. As this dataviz is focused on re-visualising the current message, the additional data is not included in the proposed visualisation. It would be more informative if the data of the aforementioned factors can be placed in the same view (as a dashboard) to give a more accurate and complete picture of the trends in the Singapore labour force.