Statistics

Published

April 9, 2022

Overview

This is one of my favourite visualizations. It looks very simple and straightforward: it uses the ggdist::stat_dots function to draw dotted distributions of wages by the highest education level reached.

The Tidyverse libraries needed for the data manipulation:

library(tidyverse)

The data set is the Wage dataset from the {ISLR2} package. This package contains a variety of datasets used for statistical analysis in the book An Introduction to Statistical Learning.

library(ISLR2)
data(Wage)
# average wage for each education level
wage_h <- Wage %>%
  group_by(education) %>%
  summarize(avg_wage = mean(wage))
kableExtra::kable(wage_h, row.names = FALSE)

Data Wrangling

A bit of data wrangling: strip the numeric prefix from the education labels, then group by education and calculate the mean and the standard deviation of the wage.

Wage1 <- Wage %>%
  # drop the leading "1. ", "2. ", ... from the education labels
  mutate(education = sub("^\\d\\. ", "", education)) %>%
  group_by(education) %>%
  mutate(mean = mean(wage),
         sd = sd(wage)) %>%
  ungroup() %>%
  # keep one row per education level
  select(education, mean, sd) %>%
  distinct()
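To see what this wrangling step produces without installing {ISLR2}, here is a minimal base R sketch on a made-up data frame. The `toy` data and its wage values are invented for illustration; only the "N. Label" format of the education labels mirrors the Wage dataset.

```r
# Toy data: invented wages, education labels in the "N. Label" style of Wage
toy <- data.frame(
  education = c("1. < HS Grad", "1. < HS Grad", "2. HS Grad", "2. HS Grad"),
  wage      = c(80, 90, 95, 105)
)

# Strip the leading "N. " prefix from each label
toy$education <- sub("^\\d\\. ", "", toy$education)

# Mean and standard deviation of wage per education level
toy_summary <- data.frame(
  education = sort(unique(toy$education)),
  mean      = as.numeric(tapply(toy$wage, toy$education, mean)),
  sd        = as.numeric(tapply(toy$wage, toy$education, sd))
)

print(toy_summary)
```

The result has one row per education level, which is the same shape as Wage1 above.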

Load {extrafont} to use custom fonts:

library(extrafont)
# loadfonts()

For this visualization I used: family = “Chelsea Market”

And finally, to make the plot, use:

  • ggdist::stat_dots to draw the dots distribution
  • distributional::dist_normal to represent each education level as a normal distribution with that group's mean and standard deviation

library(ggdist)
library(distributional)

Wage1 %>%
  ggplot(aes(y = fct_reorder(education, mean),
             xdist = dist_normal(mean, sd),
             # after_stat() replaces the deprecated stat()
             fill = after_stat(x < 111.70))) +
  # layout is a stat_dots() parameter, not an aesthetic
  stat_dots(layout = "weave", position = "dodge", color = "grey70") +
  # reference line at 111.70, the "avg" in the fill legend
  geom_vline(xintercept = 111.70, alpha = 0.25) +
  scale_x_continuous(breaks = c(20, 60, 90, 112, 140, 180, 220)) +
  tvthemes::scale_fill_hilda() +
  # add a title / subtitle and a caption ------
  labs(x = "Wage values from 2003 to 2009",
       y = "", fill = "wage < avg",
       title = "Wage distribution vs education 2003-2009",
       subtitle = "Normalized values",
       caption = "#30DayChartChallenge 2022 #day9 - Distribution/Statistics - v2\nDataSource: {ISLR2} Wage dataset | DataViz: Federica Gazzelloni") +
  # set a customized theme -------
  tvthemes::theme_avatar() +
  theme(text = element_text(family = "Chelsea Market"),
        legend.background = element_blank(),
        legend.box.background = element_blank(),
        legend.key = element_blank(),
        legend.key.width = unit(0.5, units = "cm"),
        legend.direction = "horizontal",
        legend.position = c(0.8, 0.1))
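
As a rough picture of what the dots represent: with an analytical distribution such as dist_normal(mean, sd), each dot sits at a quantile of that distribution rather than at an observed wage. The base R sketch below mimics the idea with qnorm(). The mean, the standard deviation, and the count of 100 dots are illustrative assumptions, not values taken from the plot, and this is not how {ggdist} is implemented internally.

```r
# Illustrative parameters (assumed, not computed from Wage)
grp_mean <- 95
grp_sd   <- 25
n_dots   <- 100
cutoff   <- 111.70   # reference line used in the plot

# Evenly spaced probabilities, then the matching normal quantiles
probs <- ppoints(n_dots)
dots  <- qnorm(probs, mean = grp_mean, sd = grp_sd)

# Share of dots falling below the reference line (the fill split)
share_below <- mean(dots < cutoff)
print(share_below)
```

The share of dots below the cutoff approximates the normal probability pnorm(cutoff, grp_mean, grp_sd), which is what the two fill colours in each row convey.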

If you’d like to save it as a .png, you can do so with ggsave():

ggsave("day9_statistics_v2.png",
       dpi=320,
       width = 9,
       height = 6)
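
As a quick check on the output size: ggsave() reads width and height in inches by default, so the pixel dimensions are those values multiplied by dpi. A small sketch of the arithmetic, using the settings above:

```r
# Output size in pixels = size in inches * dots per inch
width_in  <- 9
height_in <- 6
dpi       <- 320

px <- c(width_px = width_in * dpi, height_px = height_in * dpi)
print(px)
```

With these settings the saved image is 2880 by 1920 pixels.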

