Introducing trelliscopejs

I’m really excited to announce the beta release of a visualization project I’ve put a lot of work into for the past several months, trelliscopejs.

trelliscopejs is an R package that brings faceted visualizations to life while plugging in to common analytical workflows like ggplot2 or the “tidyverse”. To quickly get a feel for it, take a look at this screen capture:

I highly recommend reading the full documentation to get a complete picture of what the package does, but to keep this post concise, here are a couple of examples.

First let’s install some packages that we’ll use in the examples:

devtools::install_github("hafen/trelliscopejs")
install.packages(c("gapminder", "housingData", "rbokeh"))

Gapminder with ggplot2

The first example is for ggplot2 users. You can swap facet_wrap() for facet_trelliscope() and write code like this:

library(trelliscopejs)
library(ggplot2)   # provides qplot(), xlim(), ylim(), and theme_bw()
library(gapminder)

qplot(year, lifeExp, data = gapminder) +
  xlim(1948, 2011) + ylim(10, 95) + theme_bw() +
  facet_trelliscope(~ country + continent, nrow = 2, ncol = 7, width = 300)

To create a visualization like this:

[Interactive display “gapminder_lifeexp”: life expectancy vs. year by country using Gapminder data (142 panels, sorted by country and continent)]

If this display doesn’t appear correctly for you (because of blog aggregators, etc.), you can follow this link to the display in a dedicated window.

This is a display of life expectancy across time for countries in the gapminder dataset. Instead of a static faceted plot, we get an interactive display. We can interact with the panels by sorting and filtering on metrics that were computed about each subset of our data being plotted, and we can paginate through panels when they don’t all fit on one page.

Go ahead and experiment with the interactive controls in the plot above. You can click the fullscreen button in the bottom right if you want more space. The “question mark” icon in the upper right corner will give you more information about how to use the viewer.

While interacting with the display, do you see anything interesting? The data being plotted is fairly simple – we see a usually steady increase in life expectancy over time, with varying mean by country. However, the eye is able to quickly catch deviations from the normal pattern such as the dips in Rwanda and Cambodia or the peaks in life expectancy in several African countries in the late 80s / early 90s with life expectancy decreasing after. The plots of the raw data have a story to tell that you might miss if you just computed summaries.

Housing data with dplyr and rbokeh

The housingData package has a dataset, “housing”, that gives the monthly median list and sold prices for residential homes by US county, provided by Zillow.

Let’s take a look at the median list price over time by county, but this time illustrating the use of the trelliscope() function with dplyr. We will create the plot using rbokeh although you can use any plotting library you’d like to.

The trelliscope() function is meant to be used in “tidyverse” pipelines, with the idea that a faceted display can be described by a data frame of summaries computed on groups, with one of the summaries being a plot object.
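
To make that idea concrete, here is a minimal base-R sketch of such a data frame, built from made-up toy data rather than anything in the package. The names toy, pieces, and summ are hypothetical. Base graphics has no plot object, so each “panel” here is just a function that draws its group’s plot when called, standing in for the ggplot2 or rbokeh objects a real display would hold.

```r
# Toy data: two groups with four observations each (made up for illustration).
toy <- data.frame(
  group = rep(c("a", "b"), each = 4),
  x = rep(1:4, times = 2),
  y = c(1, 2, 3, 4, 8, 6, 4, 2)
)

# Split into one piece per group, then build one row of summaries per piece.
pieces <- split(toy, toy$group)
summ <- data.frame(
  group = names(pieces),
  slope = vapply(pieces, function(p) unname(coef(lm(y ~ x, data = p))[2]), numeric(1)),
  mean_y = vapply(pieces, function(p) mean(p$y), numeric(1)),
  stringsAsFactors = FALSE
)

# List column of "panels": each element is a function that draws one group's plot.
# This is only a stand-in for a real plot object.
summ$panel <- lapply(pieces, function(p) {
  force(p)
  function() plot(p$x, p$y)
})
```

Calling summ$panel[[1]]() draws the panel for the first group. The scalar columns (slope, mean_y) play the role of the metrics you sort and filter on, and the list column plays the role of the panels.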

In the example below, we group the housing data by county and state and then compute some summaries (the slope of the list price vs. time, the mean list price, the mean sold price, the number of non-NA observations, and a link to the county’s Zillow page). We also compute a “summary” plot of the median list price vs. time.

library(rbokeh)
library(dplyr)
library(housingData)

lm_coefs <- function(x, y)
  coef(lm(y ~ x))

d <- housing %>%
  group_by(county, state) %>%
  summarise(
    slope = lm_coefs(time, medListPriceSqft)[2],
    mean_list = mean(medListPriceSqft, na.rm = TRUE),
    mean_sold = mean(medSoldPriceSqft, na.rm = TRUE),
    n_obs = length(which(!is.na(medListPriceSqft))),
    zillow_link = cog_href(
      sprintf("http://www.zillow.com/homes/%s_rb/",
        gsub(" ", "-", paste(county, state)))[1]),
    panel = panel(
      figure(xlab = "time", ylab = "median list / sq ft", toolbar = NULL) %>%
        ly_points(time, medListPriceSqft,
          hover = data_frame(time = time, mean_list = medListPriceSqft)))
  ) %>%
  filter(n_obs > 1)

To emphasize that the result of this is simply a data frame:

Source: local data frame [2,975 x 8]
Groups: county [1,772]

             county  state mean_list mean_sold n_obs        panel
             <fctr> <fctr>     <dbl>     <dbl> <int>       <list>
1  Abbeville County     SC  72.76035  61.69598    77 <S3: rbokeh>
2     Acadia Parish     LA  67.18250  73.64299    77 <S3: rbokeh>
3   Accomack County     VA 123.22507  54.03628    81 <S3: rbokeh>
4        Ada County     ID 104.24764       NaN    81 <S3: rbokeh>
5      Adair County     IA  64.20355       NaN    76 <S3: rbokeh>
6      Adair County     KY  70.14871  50.51296    77 <S3: rbokeh>
7      Adair County     MO  68.75936 281.20160    81 <S3: rbokeh>
8      Adair County     OK  67.41909       NaN    81 <S3: rbokeh>
9      Adams County     CO 121.83557 130.31120    81 <S3: rbokeh>
10     Adams County     IA 124.56285       NaN    76 <S3: rbokeh>
# ... with 2,965 more rows, and 2 more variables: slope <dbl>,
#   zillow_link <chr>
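
Some of these summaries are easier to follow on a single group. The sketch below reuses the lm_coefs() helper and the link-building expression from the pipeline above, applied to made-up values for one county. The vectors x and y are illustrative and not taken from the housing data.

```r
# Made-up values for a single county, with one missing list price.
x <- 1:5
y <- c(100, 102, NA, 106, 108)

# Same helper as in the pipeline: intercept and slope of a simple linear fit.
lm_coefs <- function(x, y)
  coef(lm(y ~ x))

# lm() drops rows with missing values by default, so the NA does not break the fit.
slope <- lm_coefs(x, y)[2]

# Number of non-NA observations, as computed for n_obs in the pipeline.
n_obs <- length(which(!is.na(y)))

# Spaces in "county state" become dashes before being placed in the URL template.
link <- sprintf("http://www.zillow.com/homes/%s_rb/",
  gsub(" ", "-", paste("Ada County", "ID")))
```

Here slope is 2, n_obs is 4, and link is "http://www.zillow.com/homes/Ada-County-ID_rb/". A slope needs at least two non-missing points, which is presumably why the pipeline ends with filter(n_obs > 1).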

Note that one of our columns is an rbokeh object. We can now pipe this data frame into trelliscope(), which will create a display for us. The summaries we computed will be made available as metrics we can use to interact with the panels in the display, and the panels will be created based on our panel column.

d %>%
  trelliscope(name = "list_vs_time")

Here is the resulting display:

[Interactive display “list_vs_time”: monthly median list price vs. time for 2984 US counties from 2008–2016 (2975 panels, sorted by county and state)]

If this display doesn’t appear correctly, please visit this link.

There are a lot of fun things you can explore with this display. Which counties are the cheapest to live in? Which were not affected by the housing crisis? What is happening in your county or state?

Easy to embed and share

I haven’t mentioned yet that trelliscopejs is an htmlwidget, producing pure HTML / JavaScript applications. This means you can easily embed your displays in R Markdown notebooks or documents, share the generated HTML file with others, or post it on the web through a simple web server or GitHub Pages. For example, the displays you saw in this post are hosted on GitHub Pages.

Why is this useful?

Trelliscope is based on the idea of “small multiples”, a simple but serious visualization technique. For more on the virtues of “small multiples”, please read here.

What’s next

This post provides two examples of interfaces for creating Trelliscope displays, but it would not be difficult to support other workflows as well. We are working on making sure we provide the most convenient interfaces to the most common workflows, so there will be some iteration on getting that right.

If you want to see some other things we plan on making happen for trelliscopejs, see here.

Give it a try!

I hope you will find some interesting use cases and give trelliscopejs a try on your data. Again, please read the full documentation for more on getting going with the package. Although I stated that it is a beta release, I’ve waited to announce it until things have become quite stable. Don’t be surprised if there are minor tweaks in the future, but also don’t be afraid to give it a try!

Ryan Hafen
Data Scientist, Statistical Consultant
