2016/17 Year in Review

It’s been over a year since I last wrote a blog post. At the end of 2016 I meant to write a post looking back on the year, but then an insanely busy 2017 full of consulting work kicked in. This post is a quick catch-up on the past couple of years and a brief outline of some of the things I’d like to accomplish in 2018. Some of these things may be of interest because I haven’t discussed them before, and I plan to go into more detail on many of them in future posts this year.

Open source development

Because applied consulting work kicked into high gear in 2017, I was a bit less active in the open source world last year. But I am still very much involved. I have been working on quite a few things, but the hardest part for me is writing about them and spreading the word. With limited time, communication is what has suffered most (not to mention support for issues on some of my more fringe projects; sorry!). I hope to rectify this as much as possible this year.

The most significant open source developments in the past year have been with TrelliscopeJS and rbokeh.

TrelliscopeJS

In 2016 I completely rewrote Trelliscope (from our DeltaRho project), migrating it from a very complex Shiny application to a pure JavaScript application and renaming the result TrelliscopeJS. It consists of a JavaScript library, trelliscopejs-lib, written using React, and an R htmlwidget interface package, trelliscopejs. The interface for creating Trelliscope displays is no longer based on datadr; instead, it plays nicely with common workflows, such as plugging directly into ggplot2 with facet_trelliscope(), and integrates well with the Tidyverse. A good overview of TrelliscopeJS can be found in these slides.

In 2017, most of the improvements to TrelliscopeJS came from Barret Schloerke as part of his Ph.D. thesis work. The most exciting of these is the addition of “automatic cognostics”: ggplot2 objects are analyzed, and context-specific metrics are automatically computed to provide additional means of interaction in the resulting display. He also added several other features, and I’ve been working on a few myself; I’ll be posting more about these in the future.

rbokeh

I completely rewrote the rbokeh package this past year. In a nutshell, the rewrite mimics all of the Bokeh classes in R, providing building blocks that make it much easier to achieve near or full feature parity with Bokeh. The internals are much cleaner as well, and the package should be much easier to maintain and contribute to. The outward interface stays largely the same, with some additional features. I hope to post about this in more detail and to provide updated documentation.

geofacet

A quick side project that got a lot of unexpected attention this past year is the geofacet package. It provides a flexible way to visualize data for different geographical regions through a ggplot2 faceting function, facet_geo(), which works just like ggplot2’s built-in faceting except that the resulting panels are arranged in a grid that mimics the original geographic topology as closely as possible. I never got around to posting about or sharing this package, but others did it for me (thanks!). I have a few blog posts on this topic in the pipeline.

DeltaRho

For those who are familiar with my past work, I was heavily involved in the development of R-based tools for big data (namely datadr and trelliscope, powered by RHIPE) as part of the DARPA XDATA program. The project was originally named “Tessera”; in 2016 it was renamed “DeltaRho”, and I created a new website for it at deltarho.org.

The XDATA program that funded our work on DeltaRho ended in 2016, and I have not had the time or funding to maintain this codebase, although it’s still in use, and further development of methods and interesting applications is well underway at Purdue. I haven’t blogged much on these topics, but I still hold very passionate views about tools for analyzing big data (I expressed some of these here), and I plan to write more about this and to contribute to software in the future.

The major component born out of XDATA that is still very much active in my development cycle is TrelliscopeJS. Expect continued development of this package and, hopefully, its integration into other popular big data tools being developed for R.

Other R packages

I’ve written, updated, or been involved with several other packages over the past 2 years. A few of them are: rmote, osfr, packagedocs (and related lazyrmd), seriesclust, stlplus.

I’ve also continued to maintain the htmlwidgets gallery.

Applied / Consulting

I have been very fortunate to work on some very meaningful problems with interesting data science challenges for the Bill & Melinda Gates Foundation’s Healthy Birth, Growth, and Development knowledge integration (HBGDki) program. I’m excited that some of what this program is doing has now been made public: hbgdki.org.

I’ve been involved in many aspects of this program, including applied data science, analysis software development, visualization tool building, communication, technology design and development, and training. I hope to share more of the work from this program in the future, as there are some novel team data science and data integration efforts going on.

Travel

Travel for both consulting and conference presentations has been a pretty big time sink these past few years. According to my handy travel app, since I started consulting in 2015, I’ve spent 473 hours in the air, 204 of those in 2017 alone. Preparing for, traveling to, attending, and then decompressing from these excursions takes a huge amount of time, and I’m hoping to cut down on this. Thankfully, not all travel has been for work. In 2017, our whole family (including my 4-year-old!) finished visiting all 50 US states.

Plans for 2018

My main commitment for 2018 continues to be consulting, but I am committing to spending more time writing this year, and I hope to find plenty of time to work on open source projects.

  • For TrelliscopeJS, I’m excited to report that a DataCamp course and a new website are underway, as well as plans to release the package on CRAN very soon.
  • For rbokeh, I’m planning on putting the recent rewrite through its paces and writing new documentation, followed by a release.
  • I expect to see an open release of several of the global health projects I’ve been working on as part of my consulting work.
  • Writing: I have a long backlog of blog post topics. Hopefully I can catch up on those.

Finding a Balance

I really enjoy building tools, but I also really enjoy hands-on data science work, and I think the latter helps inform the former, so it’s good to do both. I’ve been very fortunate to have funding sources that have allowed me to do both. However, it’s difficult to do both well at the same time, especially if you are pro work-life balance (which I am). Finding a good balance is something I’m still trying to figure out. A lot of this is out of my control and depends on where my current funding is coming from, which is why some of my open source work took a backseat in 2017. We will see how 2018 turns out.

Thanks

While looking back over the past few years, I’ve been overwhelmed with feelings of gratitude. I would like to thank all of the wonderful people I’ve been able to work with and everyone who has made it possible for me to work on what I love. I have been very fortunate to work with many talented, passionate, friendly people. Thanks to everyone who has helped support or promote my work. Thanks to everyone who uses my code, reports bugs, and waits patiently. Thanks to everyone who has invited me to speak. And of course, thanks to everyone in the R community for making it an exciting place to be.

Ryan Hafen
Data Scientist, Statistical Consultant