Introducing rminiconda

Easily Install and Manage Python in R

Lately I’ve been fascinated with the reticulate R package, which provides pretty much seamless access to anything implemented in Python without needing to leave R. R provides many great interfaces to technologies implemented in other languages like C/C++, SQL, Fortran, etc., but for some reason I had never thought of it as a first-class interface to technology implemented in Python. This is probably due to all the R vs. Python talk that makes you think it has to be one or the other, and possibly also due to prior R/Python solutions that didn’t offer as good a user experience as reticulate does. When you think of R as an interface to Python, the universe of things you can do in R gets quite a lot bigger.

There is one major difference, however, between R as an interface to Python and R as an interface to, say, C/C++. R packages that depend on C/C++ are much easier to use out of the box, either because they are pre-built on CRAN for the major operating systems or because they are easy to build with libraries that already exist on the user’s machine. With Python, you have no guarantee that users will have the right Python version or package management system installed, and you can’t pre-build Python into an R package. Inevitably, some users will run into problems, and it is a lot to ask them to configure their Python environment correctly, to the point that they may choose not to use your package because it isn’t clean and easy to install.

For these reasons, I built the rminiconda package. At its core, rminiconda provides a simple R function that installs miniconda in an isolated, “namespaced” location that you can fully customize for your particular use case. It also provides utilities for making this installation and configuration an automated part of an R package setup. The miniconda Python installations provided by rminiconda do not interfere with any other Python installation on your system. rminiconda works on Linux, macOS, and Windows.

Install

You can install rminiconda from GitHub with:

# install.packages("remotes") # if not installed
remotes::install_github("hafen/rminiconda")

Standalone Usage

If you want to install an isolated miniconda for your own use, you can simply call install_miniconda().

rminiconda::install_miniconda(name = "my_python")

This will place an isolated miniconda installation in a directory called "my_python" inside a base directory that houses all miniconda installations created through rminiconda. The base directory is determined by the following rules:

  • If the system environment variable R_MINICONDA_PATH exists, its value will be used as the base installation directory.
  • Otherwise, if the rminiconda package directory is user-writable, this will be used as the base installation directory.
  • Otherwise, the directory ~/rminiconda will be used as the base installation directory.
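
The three rules above can be sketched in a few lines of R. This is only an illustration of the lookup order, not rminiconda's own code: the helper name resolve_base_dir() and its arguments are made up for this example, and the real package may differ in the details.

```r
# Hypothetical helper mirroring the three rules above.
# Not part of rminiconda; shown only to illustrate the lookup order.
resolve_base_dir <- function(pkg_dir, env_var = "R_MINICONDA_PATH") {
  # Rule 1: an explicitly set environment variable wins.
  from_env <- Sys.getenv(env_var, unset = "")
  if (nzchar(from_env)) {
    return(from_env)
  }

  # Rule 2: use the package directory when the current user can write to it.
  # file.access() returns 0 on success; mode = 2 tests write permission.
  if (dir.exists(pkg_dir) && file.access(pkg_dir, mode = 2) == 0) {
    return(pkg_dir)
  }

  # Rule 3: fall back to a directory under the user's home.
  path.expand("~/rminiconda")
}
```

Calling resolve_base_dir(system.file(package = "rminiconda")) would then show which of the three locations applies on your machine under these assumptions.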

You can tell reticulate to use this installation with the following:

py <- rminiconda::find_miniconda_python("my_python")
reticulate::use_python(py, required = TRUE)

You can install either Python version 2 or 3 with the version argument. Also, you can maintain as many miniconda installations as you would like by using different names for each one.

rminiconda::install_miniconda(version = 2, name = "my_python2")

Note that currently rminiconda only installs the latest miniconda for Python 2 and Python 3. Installing specific Python versions may be supported in the future.

Usage in an R Package

If you are writing an R package that depends on a Python library but you don’t want your users to worry about any aspect of Python installation and configuration, you can use rminiconda to configure your users’ environment for them.

Suppose, for example, that you want to wrap functionality in the Python shap package in your own R package. (This has already been done with shapper; it serves only as an example here.) Suppose you have named this package “shapr”. A recipe for using rminiconda as part of your package might look something like this:

#' @import rminiconda
.onLoad <- function(libname, pkgname) {
  # Undesirable side-effects but if not unset, can lead to config issues
  Sys.setenv(PYTHONHOME = "")
  Sys.setenv(PYTHONPATH = "")
  is_configured()
}

# Check to see if the shapr Python environment has been configured
is_configured <- function() {
  # Should also check that the required packages are installed
  if (!rminiconda::is_miniconda_installed("shapr")) {
    message("It appears that shapr has not been configured...")
    message("Run 'shapr_configure()' for a one-time setup.")
    return(FALSE)
  } else {
    py <- rminiconda::find_miniconda_python("shapr")
    reticulate::use_python(py, required = TRUE)
    return(TRUE)
  }
}

#' One-time configuration of environment for shapr
#'
#' @details This installs an isolated Python distribution along with required dependencies so that the shapr R package can seamlessly wrap the shap Python package.
#' @export
shapr_configure <- function() {
  # Install isolated miniconda
  if (!rminiconda::is_miniconda_installed("shapr"))
    rminiconda::install_miniconda(version = 3, name = "shapr")
  # Install python packages
  py <- rminiconda::find_miniconda_python("shapr")
  rminiconda::rminiconda_pip_install("shap", "shapr")
  reticulate::use_python(py, required = TRUE)
}

Note that in the configure function you can add as much code as necessary to install dependencies and further configure the Python environment as your application needs.
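
The recipe leaves one check as a comment: whether the required Python packages are actually installed. A minimal sketch of that check follows, assuming reticulate::py_module_available() is used to test each module. The helper name missing_py_modules() is hypothetical, and the is_available argument exists only so the function can be tried out without a Python installation.

```r
# Hypothetical helper: return the names of required Python modules that
# cannot be imported from the Python that reticulate is bound to.
# `is_available` defaults to reticulate's own check and can be swapped
# for a stand-in function when testing.
missing_py_modules <- function(modules,
                               is_available = reticulate::py_module_available) {
  available <- vapply(modules, is_available, logical(1), USE.NAMES = FALSE)
  modules[!available]
}
```

Inside is_configured(), after the use_python() call, something like length(missing_py_modules("shap")) == 0 could decide whether to return TRUE or to point the user back to shapr_configure(). Checking a module most likely starts the Python session, so it is worth weighing whether you want that to happen at package load time.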

You might have a collection of different R packages that wrap Python libraries but are meant to work together. For example, maybe you have different packages relating to different parts of an ML pipeline, and “shapr” is just one of them. In that case you could use a common Python installation namespace across all packages, such as “ml-pipeline”, and use that across all of your package configurations.
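
One way to share a namespace is to generate each package's configure function from a common helper. The sketch below is an assumption about how that could be organized, not something rminiconda provides: make_configure() and shared_env_name are hypothetical names, and the three function arguments default to the rminiconda functions used in the recipe above so that they can be replaced with stand-ins.

```r
# Namespace shared by every package in the (hypothetical) pipeline family.
shared_env_name <- "ml-pipeline"

# Hypothetical factory: build a one-time configure function for a package.
# `pip_packages` lists the Python libraries that package needs; they are
# installed into the shared miniconda installation called `name`.
make_configure <- function(pip_packages,
                           name = shared_env_name,
                           is_installed = rminiconda::is_miniconda_installed,
                           install = rminiconda::install_miniconda,
                           pip_install = rminiconda::rminiconda_pip_install) {
  # Evaluate the arguments now so the returned function captures fixed values.
  force(pip_packages)
  force(name)
  function() {
    # Create the shared installation only if no other package has done so.
    if (!is_installed(name)) {
      install(version = 3, name = name)
    }
    # Add this package's Python dependencies to the shared installation.
    for (pkg in pip_packages) {
      pip_install(pkg, name)
    }
    invisible(name)
  }
}
```

With this in place, “shapr” could define shapr_configure <- make_configure("shap"), and a sibling package would call make_configure() with its own dependencies while reusing the same installation.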

Development Status

I’m interested to see how much general interest there is in a package like this. I welcome feedback and discussion from those who know more than I do in this area to help it earn a “production/CRAN-ready” stamp of approval. Please use GitHub issues to engage with me on ideas or issues that you have.

Ryan Hafen
Data Scientist, Statistical Consultant