Plotting proportional data as nested circles in R

4 minute read

For better or worse, you see proportional data represented with nested circles a fair bit in the media.

It’s rare that plotting proportional data is not best done with some kind of bar chart, but occasionally I do come across a case where I think nested circles convey a message well (while being visually attractive).

Here I’ll demonstrate how to make a static nested circle plot using the R package packcircles and ggplot2, although if you want interactivity you can also check out circlepackeR which creates snazzy html widgets.

Packing circles

In the first place, you need a dataframe with the total and subset values you want to plot.

Here is some data I scraped off MycoCosm showing the number of genome assemblies available for different fungal lifestyles, which will be my larger circles. I then want the nested circle area to represent the subset of the total which have already been published - this is actually similar to a figure I created for my PhD thesis introduction!

head(mycocosm.lifestyles.df)
##       lifestyle num num.pub  colour
## 1     endophyte 142      85 #009E73
## 2    lichenised  16       8 #E69F00
## 3  mycoparasite 105      32 #F0E442
## 4   mycorrhizal 199     123 #56B4E9
## 5 phytopathogen 263     155 dimgrey
## 6    saprotroph 258     178 #0072B2

packcircles handles the creation of circles with area proportional to the numbers we give it.

This involves first generating a dataframe with the central point and radius of each circle.

library(packcircles)

#Get radius and x and y coordinates for centre of larger circles
circle.layout <- circleProgressiveLayout(mycocosm.lifestyles.df$num,
                                         sizetype="area")

#Optionally add a small gap between circles so they're not touching
circle.layout$radius <- circle.layout$radius * 0.95

head(circle.layout)
##           x          y   radius
## 1 -6.723095   0.000000 6.386940
## 2  2.256758   0.000000 2.143920
## 3  2.875424  -8.014137 5.492162
## 4  3.958529  10.072890 7.560930
## 5 16.394459  -1.676497 8.692137
## 6 -9.972836 -15.447186 8.609115

We can then generate a dataframe with enough vertices to plot a polygon that looks like a circle.

#Create a dataframe of vertices to draw each 'circle'
circle.vertices <- circleLayoutVertices(circle.layout, npoints=50)

head(circle.vertices)
##            x         y id
## 1 -0.3361547 0.0000000  1
## 2 -0.3865177 0.8004959  1
## 3 -0.5368122 1.5883674  1
## 4 -0.7846681 2.3511895  1
## 5 -1.1261766 3.0769318  1
## 6 -1.5559518 3.7541492  1

And finally plot the larger circles - we’ll use alpha to make them translucent so that they are distinguished from the nested circles we add later.

library(ggplot2)
library(tgutil)

#Plot circles
gg.circles <- ggplot() + 
  geom_polygon(data=circle.vertices,
               aes(x, y, group=id, fill=as.factor(id)),
               colour=NA,
               alpha=0.3) +
  scale_fill_manual(values=mycocosm.lifestyles.df$colour) +
  coord_equal() +
  theme_void() + 
  theme(legend.position="none") +
  ggpreview(width=4, height=3, unit="in")

Add nested circles

Now we want to create the polygons for the nested circles, which essentially means repeating the above steps with the nested data.

#Get radius and x and y coordinates for centre of nested circles
circle.layout.pub <- 
  circleProgressiveLayout(mycocosm.lifestyles.df$num.pub,
                          sizetype="area")

#If you previously added a small gap between circles, make sure to
#do so again
circle.layout.pub$radius <- circle.layout.pub$radius * 0.95

However before creating the polygon vertices for these nested circles, we first need to replace the central points with those of the larger circles so that our nested ones overlay correctly.

#Replace x and y with that of the larger circles, but keep the
#same radius
circle.layout.pub <- data.frame(x=circle.layout$x,
                                y=circle.layout$y,
                                radius=circle.layout.pub$radius)

Now we can generate the vertices and add the nested circles to the plot.

#Create a dataframe of vertices to draw each nested 'circle'
circle.vertices.pub <- circleLayoutVertices(circle.layout.pub,
											npoints=50)

#Add to plot
gg.circles.nested <- gg.circles +
  geom_polygon(data=circle.vertices.pub,
               aes(x, y, group=id, fill=as.factor(id)),
               colour=NA) +
  ggpreview(width=4, height=3, unit="in")

Add labels

Finally we can make another dataframe with information to label the circles.

#Combine original dataframe with the layout dataframe
circle.labels <- cbind(mycocosm.lifestyles.df, circle.layout)

#Add lifestyle labels to centre of circles
gg.circles.nested +
  geom_text(data=circle.labels,
            aes(x, y, size=num, label=lifestyle),
            fontface="bold") +
  scale_size_continuous(range=c(1.5, 3.5))

Alternatively we could label with the original values or percentage published, and add a colour legend for the lifestyles.

#Add new column with percentage of published genomes
circle.labels$percent <- round(
  circle.labels$num.pub/circle.labels$num * 100
  )

#Add percentage labels
gg.circles.nested +
  geom_text(data=circle.labels,
            aes(x, y, size=num, label=paste0(percent, "%")),
            fontface="bold",
            show.legend=FALSE) +
  scale_size_continuous(range=c(2, 6)) +
  scale_fill_manual(values=circle.labels$colour,
                    labels=circle.labels$lifestyle) +
  guides(fill=guide_legend(
    nrow=3,
    direction="horizontal",
    title=NULL,
    label.theme=element_text(size=7, margin=margin(l=-3)),
    keywidth=unit(7, "pt"),
    keyheight=unit(7, "pt"))
  ) +
  theme(legend.position=c(0.7, 0.15))

Pretty simple!

Session details

sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8 
## [2] LC_CTYPE=English_United Kingdom.utf8   
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tgutil_0.1.14     ggplot2_3.4.2     packcircles_0.3.5
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.9        highr_0.10        pillar_1.9.0      compiler_4.2.2   
##  [5] tools_4.2.2       digest_0.6.31     evaluate_0.21     lifecycle_1.0.3  
##  [9] tibble_3.2.1      gtable_0.3.3      png_0.1-8         pkgconfig_2.0.3  
## [13] rlang_1.1.1       cli_3.6.0         rstudioapi_0.14   yaml_2.3.6       
## [17] xfun_0.36         fastmap_1.1.0     withr_2.5.0       dplyr_1.1.2      
## [21] knitr_1.42        generics_0.1.3    vctrs_0.6.2       systemfonts_1.0.4
## [25] grid_4.2.2        tidyselect_1.2.0  glue_1.6.2        R6_2.5.1         
## [29] textshaping_0.3.6 fansi_1.0.3       rmarkdown_2.21    farver_2.1.1     
## [33] magrittr_2.0.3    scales_1.2.1      htmltools_0.5.4   colorspace_2.0-3 
## [37] labeling_0.4.2    ragg_1.2.5        utf8_1.2.2        munsell_0.5.0