Fast functions for correlation and hierarchical clustering

R code examples

Peter (dot) Langfelder (at) gmail (dot) com, SHorvath (at) mednet (dot) ucla (dot) edu

We provide several R scripts that compare the performance of our correlation and hierarchical clustering functions with that of the standard R functions. To run these examples, packages flashClust (version 1.20 or higher) and WGCNA (version 1.13 or higher) must be installed. The R code was last updated July 1, 2015, with small updates to both code and text.

We provide an example study of module stability that resamples microarray samples in expression data from livers of female mice of an F2 cross (Ghazalpour et al, 2006). We provide two versions of the example. The "large" version uses the full data set of over 23000 probe sets and requires a computer with at least 16 GB (32 GB preferred) of RAM. For users who do not have access to computers with that much memory, we also provide a smaller version of the same analysis that uses only 5000 probes and will run on a standard modern desktop or laptop with at least 2 GB of memory.
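The general idea of resampling-based stability can be sketched in a few lines of base R. This is only an illustration on simulated data, not the analysis from the scripts linked on this page: the data, the helper `clusterGenes`, the choice of `k = 2` clusters, and the pairwise agreement measure are all assumptions made for the example.

```r
# Illustrative sketch only: simulated data and a hypothetical helper function,
# not the custom functions distributed with this page.
set.seed(1)
nSamples <- 40
nGenes <- 200

# Simulated expression matrix: samples in rows, genes in columns.
expr <- matrix(rnorm(nSamples * nGenes), nSamples, nGenes)

# Plant a shared signal in the first 50 genes to mimic a co-expression module.
signal <- rnorm(nSamples)
expr[, 1:50] <- expr[, 1:50] + 2 * signal

# Hypothetical helper: cluster genes on a correlation-based dissimilarity.
clusterGenes <- function(x, k = 2) {
  d <- as.dist(1 - abs(cor(x)))
  cutree(hclust(d, method = "average"), k = k)
}

# Reference clustering on all samples.
reference <- clusterGenes(expr)
sameRef <- outer(reference, reference, "==")

# Resample the samples with replacement and re-cluster the genes each time.
nRuns <- 20
agreement <- numeric(nRuns)
for (i in seq_len(nRuns)) {
  idx <- sample(nSamples, replace = TRUE)
  labels <- clusterGenes(expr[idx, ])
  sameNew <- outer(labels, labels, "==")
  # Fraction of gene pairs whose co-clustering status matches the reference.
  agreement[i] <- mean(sameRef[upper.tri(sameRef)] == sameNew[upper.tri(sameNew)])
}

summary(agreement)
```

Each resampling run recomputes a full gene-by-gene correlation matrix and a hierarchical clustering tree, which is why the speed of those two steps dominates the run time of this kind of analysis.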

**Download data and custom functions for the analysis.** The following two files are necessary for either version of the analysis.

- Expression data necessary to run the analysis
- R function file containing functions necessary for this analysis

**R code that performs the large analysis:**
Please choose your preferred format of the actual R code:

**R code that performs the small analysis:**
Please choose your preferred format of the actual R code:

We provide several R scripts that compare the correlation calculations implemented in the WGCNA package with the standard R function `cor`.

- Comparison of speed suitable for a standard desktop computer. While this comparison will run on any system, for the main paper we ran it under Windows.
- Comparison of speed and quantification of errors when using a non-zero setting of the argument quick. This script is suitable for a standard desktop computer. While this comparison will run on any system, for the main paper we ran it under Windows.
- Comparison of speed suitable for a large workstation. To run this script, the computer should have at least 16 GB of memory and run a version of R that can use the full system memory (in particular, it must be a 64-bit version).
- Comparison of speed and quantification of errors when using a non-zero setting of the argument quick, a version for a large workstation. Same minimum requirements as above apply.
- Synthesis of timing results - this script puts together the timing results of correlation speed and draws Figure 2 for the main article.
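As a rough illustration of what such a comparison looks like, the sketch below times `stats::cor` against `WGCNA::cor` on a small simulated matrix with missing values and reports the largest absolute difference for a non-zero `quick`. The matrix size, the number of missing entries, and `quick = 1` are arbitrary choices for the example and are not the settings used in the scripts above; the WGCNA part only runs if the package is installed.

```r
# Illustrative sketch: simulated data, arbitrary sizes.
set.seed(42)
nSamples <- 100
nGenes <- 500
x <- matrix(rnorm(nSamples * nGenes), nSamples, nGenes)
x[sample(length(x), 200)] <- NA  # introduce some missing values

# Standard R correlation with pairwise deletion of missing values.
tStats <- system.time(
  cStats <- stats::cor(x, use = "pairwise.complete.obs")
)[["elapsed"]]
cat("stats::cor:           ", tStats, "s\n")

if (requireNamespace("WGCNA", quietly = TRUE)) {
  # quick = 0 requests the exact calculation.
  tExact <- system.time(
    cExact <- WGCNA::cor(x, use = "pairwise.complete.obs", quick = 0)
  )[["elapsed"]]
  # A non-zero quick trades accuracy for speed when data are missing.
  tQuick <- system.time(
    cQuick <- WGCNA::cor(x, use = "pairwise.complete.obs", quick = 1)
  )[["elapsed"]]

  cat("WGCNA::cor, quick = 0:", tExact, "s\n")
  cat("WGCNA::cor, quick = 1:", tQuick, "s\n")
  cat("max |exact - stats|:  ", max(abs(cExact - cStats)), "\n")
  cat("max |quick - exact|:  ", max(abs(cQuick - cExact)), "\n")
} else {
  message("Package WGCNA is not installed; skipping the WGCNA timings.")
}
```

On a matrix this small the timings are dominated by overhead, so meaningful comparisons need much larger inputs, which is what the desktop and workstation scripts provide.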

**Update (October 2014):** The R core team recently modified the code of the standard function `hclust` implemented in package stats. The new "standard" `hclust` is now as fast as or faster than the `flashClust` presented here. The R timing code below will work, but `flashClust` will no longer be much (if at all) faster than the "standard" `hclust`.
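A quick way to check this on your own system is to time both functions on the same distance object. The sketch below is a minimal example on simulated data; the number of objects, the dimension, and the `"average"` linkage are arbitrary assumptions, and the `flashClust` part only runs if the package is installed.

```r
# Illustrative sketch: simulated points, arbitrary size and linkage method.
set.seed(1)
n <- 2000
d <- dist(matrix(rnorm(n * 10), n, 10))

# Standard hierarchical clustering from package stats.
tStd <- system.time(
  hStd <- stats::hclust(d, method = "average")
)[["elapsed"]]
cat("stats::hclust:         ", tStd, "s\n")

if (requireNamespace("flashClust", quietly = TRUE)) {
  tFlash <- system.time(
    hFlash <- flashClust::flashClust(d, method = "average")
  )[["elapsed"]]
  cat("flashClust::flashClust:", tFlash, "s\n")

  # Compare the merge heights of the two trees.
  print(all.equal(sort(hStd$height), sort(hFlash$height)))
} else {
  message("Package flashClust is not installed; skipping its timing.")
}
```

With a recent version of R, the two timings should be of similar magnitude, in line with the update note above.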