Random generalized linear model: a highly accurate and interpretable ensemble predictor
Lin Song, Peter Langfelder, Steve Horvath
Human Genetics and Biostatistics, University of California, Los Angeles
SHorvath (at) mednet (dot) ucla (dot) edu
Peter (dot) Langfelder (at) gmail (dot) com
- Article abstract
- Talk, ppt slides
- Automatic installation from CRAN
- Manual download and installation
- Problems installing or using the package
- Introduction to randomGLM
- Old versions of the R package
- Citing the randomGLM package
The random generalized linear model (RGLM) is a state-of-the-art predictor that combines the advantages of a random forest (excellent predictive accuracy, feature importance measures, out-of-bag estimates of accuracy) with those of a forward-selected generalized linear model (interpretability).
The RGLM is a bootstrap aggregated (bagged) GLM predictor that incorporates several elements of randomness and instability (random subspace method, optional interaction terms, forward variable selection). It often outperforms alternative prediction methods, as shown on hundreds of genomic data sets, the UCI machine learning benchmark data, and simulations. The RGLM predictor provides variable importance measures that can be used to define a thinned ensemble predictor (involving few features) that retains excellent predictive accuracy.
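The ingredients listed above (bagging, a random subspace of features per bag, forward-selected GLMs, and averaging) can be illustrated with a minimal sketch in base R for a binary outcome. This is not the randomGLM implementation: the function name `rglmSketch()`, its arguments, and the use of AIC-based `step()` for forward selection are assumptions made for illustration. Interaction terms and out-of-bag estimates are omitted, and a simple count of how often each feature is selected stands in for the package's variable importance measures.

```r
# Minimal sketch of the RGLM idea using base R only (stats package).
# NOT the randomGLM package's code; rglmSketch() and its arguments are hypothetical.
rglmSketch <- function(x, y, xtest, nBags = 30, nFeaturesInBag = 3) {
  n <- nrow(x)
  testProb <- matrix(NA_real_, nrow(xtest), nBags)
  # Stand-in importance: how often each feature survives forward selection
  timesSelected <- setNames(numeric(ncol(x)), colnames(x))
  for (b in seq_len(nBags)) {
    inBag <- sample(n, n, replace = TRUE)                       # bootstrap sample (bagging)
    feats <- sample(colnames(x), min(nFeaturesInBag, ncol(x)))  # random subspace of features
    bagData <- data.frame(y = y[inBag], x[inBag, feats, drop = FALSE])
    nullFit <- glm(y ~ 1, family = binomial, data = bagData)
    # Forward selection by AIC among the randomly chosen features
    fit <- step(nullFit,
                scope = list(lower = ~1, upper = reformulate(feats, response = "y")),
                direction = "forward", trace = 0)
    chosen <- attr(terms(fit), "term.labels")
    timesSelected[chosen] <- timesSelected[chosen] + 1
    testProb[, b] <- predict(fit, newdata = as.data.frame(xtest), type = "response")
  }
  # Ensemble prediction: average the per-bag predicted probabilities
  list(prob = rowMeans(testProb), timesSelected = timesSelected)
}

# Usage on simulated data in which only x1 is informative
set.seed(1)
x <- matrix(rnorm(200 * 5), 200, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- rbinom(200, 1, plogis(2 * x[, "x1"]))
xtest <- matrix(rnorm(50 * 5), 50, 5, dimnames = list(NULL, colnames(x)))
res <- rglmSketch(x, y, xtest)
res$timesSelected
```

Features that are rarely selected could be dropped and the ensemble refit on the remaining ones, which mirrors the idea of a thinned predictor. For real analyses, the randomGLM package itself should be used.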
A set of tutorials that illustrate various aspects of randomGLM is available.
Click here to access the tutorial page.
Automatic installation from CRAN
The randomGLM package is available from the Comprehensive R Archive Network (CRAN), the standard repository
for R add-on packages. To install randomGLM together with the packages it requires, simply type install.packages("randomGLM") at the R prompt.
This will install the randomGLM package and all necessary dependencies. The catch is that it installs
the newest version of randomGLM only if your R version is also the newest (minor) version (currently R 2.15.x).
Users of older versions of R will need to follow the manual download and installation instructions below,
but we recommend using the latest version of R.
The version posted here may be newer than the one on CRAN: CRAN rules prohibit us from making frequent (weekly) updates of the
package posted to CRAN. Therefore, the package posted here may occasionally contain an extra
bugfix that has not yet made it to CRAN.
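Because the copy on CRAN and the copy on this page can differ, it may help to check which version, if any, is already installed before choosing where to get the package. A minimal sketch in base R; `installedVersion()` and `needsUpdate()` are hypothetical helper names, and `"1.02-1"` is only an example version string, not a recommended version.

```r
# Hypothetical helper: installed version of a package, or NULL if not installed
# (utils::packageVersion() signals an error for packages that are not installed).
installedVersion <- function(pkg) {
  tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
}

# Hypothetical helper: TRUE if the package is missing or older than minVersion
needsUpdate <- function(pkg, minVersion) {
  v <- installedVersion(pkg)
  is.null(v) || v < package_version(minVersion)
}

getRversion()                       # the R version that determines what CRAN offers
needsUpdate("randomGLM", "1.02-1")  # "1.02-1" is an illustrative version string
```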
Note for Mac users:
CRAN may occasionally fail to compile the randomGLM package for
Mac OS X. This leads to the error message “Package randomGLM is not available…” when calling
install.packages(). If this occurs,
please download the binary version from here and follow the installation
instructions (or, if you are able to compile packages locally, download the source and install that).
Note of caution: The newest version of randomGLM is available from CRAN only for the current R
version. Please update R to the newest version
or use the manual download below.
Problems installing or using the package? Please see our list of frequently asked
questions. Your problem and the solution may already be posted there.
Manual download and installation
Please follow these steps only if the automatic package installation above does not work.
The current version of the randomGLM package requires R version 2.14 or higher. If you
have an older version of R, please upgrade it.
The randomGLM package requires the following packages to be installed: MASS, gtools, foreach, doParallel.
If your system does not have them installed, the easiest
way to install them is to issue the following command at the R prompt: install.packages(c("MASS", "gtools", "foreach", "doParallel"))
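A minimal sketch for checking the requirements above before installing: it stops if R is older than 2.14 and lists which of the four dependencies are not yet installed. The variable names are illustrative; `system.file(package = ...)` returns an empty string for a package that is not installed, so the check does not load anything.

```r
# Check the R version requirement stated above (2.14 or higher)
if (getRversion() < "2.14.0") stop("randomGLM needs R 2.14 or higher; please upgrade R")

# Dependencies listed above; find the ones that are not installed yet
deps <- c("MASS", "gtools", "foreach", "doParallel")
isInstalled <- vapply(deps, function(p) nzchar(system.file(package = p)), logical(1))
missingDeps <- deps[!isInstalled]
missingDeps
```

If missingDeps is non-empty, install.packages(missingDeps) installs only those packages.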
R package download and installation:
Package randomGLM (last updated 2013/05/09) is available here as source code and pre-compiled
versions for Windows and Mac OSX. In general it is preferable to download the source and compile the package
locally; however, if this is not practical, please select an appropriate compiled version.
- Source for Linux and for all users who can compile the package themselves:
- Compiled binaries for R-3.0.0 and higher:
- Compiled binaries for R-2.15:
- A terse changelog
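A downloaded file can be installed from within R with install.packages() by setting repos = NULL. The sketch below assumes the usual file extensions (.tar.gz for source, .zip for Windows binaries, .tgz for Mac binaries); installTypeFor() and installLocalPackage() are hypothetical helpers, and the actual file name depends on the version you download.

```r
# Hypothetical helper: pick install.packages() 'type' from the file extension
# (.tar.gz = source package; .zip or .tgz = compiled binary package).
installTypeFor <- function(file) {
  if (grepl("\\.tar\\.gz$", file)) "source"
  else if (grepl("\\.(zip|tgz)$", file)) "binary"
  else stop("unrecognized package file: ", file)
}

# Hypothetical helper: install a package file that is already on disk
installLocalPackage <- function(file) {
  # repos = NULL tells install.packages() that 'file' is a local path
  install.packages(file, repos = NULL, type = installTypeFor(file))
}
```

type = "binary" selects the platform's native binary type and is not available on Linux, where the source package should be used.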
The package version numbers follow the format
packageName_major.minor-revision. Minor versions typically add or change some functionality; revisions typically contain bugfixes or minor enhancements.
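To illustrate the naming scheme, the hypothetical helper parsePackageFileName() below splits a file name into package name, major, minor and revision numbers; the file name in the example is illustrative only. R's own package_version() compares such versions component by component, so minor version 10 sorts after minor version 2.

```r
# Hypothetical helper: split "packageName_major.minor-revision[.ext]" into parts
parsePackageFileName <- function(fileName) {
  base <- sub("\\.(tar\\.gz|zip|tgz)$", "", basename(fileName))
  m <- regmatches(base, regexec("^([A-Za-z][A-Za-z0-9.]*)_([0-9]+)\\.([0-9]+)-([0-9]+)$", base))[[1]]
  if (length(m) == 0) stop("not of the form packageName_major.minor-revision: ", fileName)
  list(package = m[2], major = as.integer(m[3]),
       minor = as.integer(m[4]), revision = as.integer(m[5]))
}

# Example file name (illustrative only)
parsed <- parsePackageFileName("randomGLM_1.02-1.tar.gz")
str(parsed)

# Versions compare numerically, component by component
package_version("1.02-1") < package_version("1.10-0")   # TRUE: minor 2 < 10
```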
Short installation instructions, including other required and recommended packages, are available here.
Should you discover bugs (of which there are most likely plenty), please report them to Peter Langfelder (peter.langfelder at gmail.com) and Steve Horvath.
Problems installing or using the package
Please see our list of Frequently Asked Questions (and frequently given answers);
the solution to your problem may already be posted there. In particular, you can find answers about spurious Mac
errors, compatibility problems when upgrading randomGLM, and others.
If you find a bug in the newest version on CRAN, please see whether this web site has posted a newer
version where the bug may be fixed. If you still cannot solve the problem, email Peter
Langfelder and Steve Horvath.
Getting started with R and the randomGLM package
The package described here is an add-on for the statistical language and environment R, which is free software.
Our tutorial, described below, contains step by
Old versions of R package randomGLM
Older versions of the package presented on this page are available here.
Citing the randomGLM package
If you use randomGLM in published work, please cite it as follows:
The method, software and evaluations are described in
- Song L, Langfelder P, Horvath S (2013) Random generalized linear model: a highly accurate and
interpretable ensemble predictor. BMC Bioinformatics 14:5 PMID: 23323760 DOI: 10.1186/1471-2105-14-5.
(Click here to access the article at the BMC web site.)
The original code was written by Lin Song and Steve Horvath. Peter Langfelder is mainly in charge of maintaining and improving the package. The package also builds on functions adapted or adopted from external packages, e.g., the glm function from the stats package and other functions from the MASS package.