Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut library for R


Peter Langfelder (1), Bin Zhang (2) and Steve Horvath (1,3)

(1) Dept. of Human Genetics, UC Los Angeles; (2) Rosetta Inpharmatics-Merck Research Laboratories, Seattle, WA; (3) Dept. of Biostatistics, UC Los Angeles

Peter (dot) Langfelder (at) gmail (dot) com, BinZhang (dot) ucla (at) gmail (dot) com, SHorvath (at) mednet (dot) ucla (dot) edu

Abstract

Hierarchical clustering is a widely used method for detecting clusters in genomic data. Clusters are defined by cutting branches off the dendrogram. A common but inflexible method uses a constant height cutoff value; this method exhibits suboptimal performance on complicated dendrograms. We present the Dynamic Tree Cut R library that implements novel dynamic branch cutting methods for detecting clusters in a dendrogram depending on their shape. Compared to the constant height cutoff method, our techniques offer the following advantages: (1) they are capable of identifying nested clusters; (2) they are flexible: cluster shape parameters can be tuned to suit the application at hand; (3) they are suitable for automation; and (4) they can optionally combine the advantages of hierarchical clustering and partitioning around medoids, giving better detection of outliers. We illustrate the use of these methods by applying them to protein-protein interaction network data and to a simulated gene expression data set.

Link to paper

Langfelder P, Zhang B, Horvath S (2008) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24(5):719-720

Bioinformatics (PDF)

Supplementary material to published paper

A detailed description of the algorithms is provided in this document (pdf format).

This document has been updated since the main article was published. The version that was published together with the article is available here.

New to R or to Weighted Gene Co-expression Network Analysis?

If you have no previous experience with R or would like to learn about the Weighted Gene Co-expression Network Analysis (WGCNA) framework, you are invited to visit other pages of our lab first. Supplementary material provided at this page by Oldham et al (2006) is a good introduction to R and WGCNA, especially since we adapt that analysis to an example of the methods described here. You may also want to take a look at the main page of WGCNA.

NEW: packages dynamicTreeCut and moduleColor now available from CRAN

The packages listed below are now available directly from CRAN (the Comprehensive R Archive Network). This means that, if you have the newest R version, you can install the packages just like any other standard package, using either the command install.packages in R or the corresponding user interface function.

Unfortunately, if your R version is older, the one-step download/installation will not go through (or you may get an older version of the package), and you still have to install the packages using the procedure outlined in the instructions.
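The one-step installation could look roughly like the following minimal sketch. It assumes a working internet connection and that both packages are still hosted on CRAN as stated on this page; the mirror URL is an illustrative choice, not one prescribed here.

```r
# Packages named on this page; their availability on CRAN is taken from the text above.
pkgs <- c("dynamicTreeCut", "moduleColor")

# Install only the packages that are not already present.
# The mirror URL is an assumption; any CRAN mirror should work.
notInstalled <- pkgs[!(pkgs %in% rownames(installed.packages()))]
if (length(notInstalled) > 0) {
  install.packages(notInstalled, repos = "https://cloud.r-project.org")
}
```

If the call reports that a package is not available for your R version, fall back to the manual procedure described in the instructions.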

R package dynamicTreeCut

The methods described on this web page have been implemented as an R package named dynamicTreeCut. Before using the package, the user is encouraged to read the detailed description linked above to get a sense of how the techniques work, along with their advantages and limitations.

Download the package dynamicTreeCut_1.62, last updated 2014/06/13:

Short installation instructions are available here. Should you discover bugs, please report them to Peter Langfelder.

The package version numbers follow the format packageName_major.minor-revision. Minor versions typically add or change some functionality; revisions typically contain bugfixes and small additions that do not require any changes in the code using the functions.
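As a small illustration of this numbering scheme, base R can parse and compare such version strings. The version strings below are hypothetical examples of the format, not a list of actual releases.

```r
# Hypothetical version strings in the major.minor-revision format described above.
older <- package_version("1.62")
newer <- package_version("1.62-1")

# package_version() treats "-" like ".", so the revision becomes a third component.
newer > older          # a revision of 1.62 sorts after the bare 1.62
unclass(newer)[[1]]    # integer components: major, minor, revision

# If the package is installed, compare its version against a required minimum.
if (requireNamespace("dynamicTreeCut", quietly = TRUE)) {
  print(packageVersion("dynamicTreeCut") >= older)
}
```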

R package moduleColor

To be able to run the examples below, you will also need to download and install a companion package moduleColor that contains supporting functions we find useful when working with microarray data.

Download the package moduleColor_1.08 (last updated 2008/08/03, see a terse changelog):

Installation proceeds in the same way as above. Should you discover bugs, please report them to Peter dot Langfelder at gmail.com.

Example 1: Clustering of real data

We provide here an example of cluster detection in real microarray data (click here to download a zip bundle). The bundle contains:

As mentioned above, make sure you install the dynamicTreeCut and moduleColor packages before attempting to run this example. The analysis uses several other libraries that need to be installed as well; you will find the list at the start of the file ExampleAnalysis.R.

Example 2: A toy example

The second example (click here to download a zip bundle) is a toy example illustrating our algorithms as well as clustering in general. The "data" consist of 16 numbers, 15 of which are chosen to fall into 3 clusters and one is chosen to lie outside of the clusters. The bundle contains the example R script Example-Toy-Posted.R and the file NetworkFunctions-TreeCut-Simulation.R containing supporting functions for the example. The script was used to produce the toy example figure in the detailed description linked above. Again, make sure you install the dynamicTreeCut and moduleColor packages before attempting to run this example. The analysis uses several other libraries that need to be installed as well; you will find the list at the start of the file ExampleAnalysis.R provided above.
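To give a feel for this kind of toy data without downloading the bundle, here is a minimal sketch. The 16 numbers are made up for illustration and are not the values used in Example-Toy-Posted.R; the linkage method, cut height and cutreeDynamic settings are likewise illustrative assumptions.

```r
# Illustrative data (NOT the values used in Example-Toy-Posted.R):
# three tight groups of five numbers each, plus one value far from all of them.
x <- c(1.0, 1.1, 1.2, 1.3, 1.4,
       5.0, 5.1, 5.2, 5.3, 5.4,
       9.0, 9.1, 9.2, 9.3, 9.4,
       20.0)

# Average-linkage hierarchical clustering on pairwise absolute differences.
d <- dist(x)
tree <- hclust(d, method = "average")

# Constant-height cut: every branch that merges below height 2 becomes a cluster.
fixedLabels <- cutree(tree, h = 2)
print(table(fixedLabels))

# Dynamic cut, run only if the package is installed.
# In its output, label 0 marks objects left unassigned.
if (requireNamespace("dynamicTreeCut", quietly = TRUE)) {
  dynamicLabels <- dynamicTreeCut::cutreeDynamic(
    dendro = tree, distM = as.matrix(d),
    method = "hybrid", minClusterSize = 3, deepSplit = 2)
  print(table(fixed = fixedLabels, dynamic = dynamicLabels))
}
```

Calling plot(tree) shows the dendrogram, which makes it easier to see where each cut falls.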

Example 3: Simulated data

The third example (click here to download a zip bundle) clusters simulated gene expression-like data to compare our methods to the fixed height tree cut as well as Partitioning Around Medoids. The bundle contains two example scripts named Example-3Clusters.R and Example-10Clusters.R that were used to produce the 3- and 10-cluster examples in the detailed description linked above. The third file, NetworkFunctions-TreeCut-Simulation.R, contains supporting functions for the analysis. Again, make sure you install the dynamicTreeCut and moduleColor packages before attempting to run this example. The analysis uses several other libraries that need to be installed as well; you will find the list at the start of the file ExampleAnalysis.R provided above.
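The comparison between a fixed-height cut and Partitioning Around Medoids can be sketched as follows. This is not the authors' simulation: the data generator, the noise level, the cut height and the seed are all assumptions made for illustration, and pam() comes from the cluster package, which is assumed to be installed.

```r
library(cluster)  # provides pam(); assumed to be installed

set.seed(1)  # arbitrary seed, for reproducibility of this sketch only
nSamples <- 50
genesPerCluster <- 20

# Each simulated cluster: one random seed profile plus independent noise per gene.
simulateCluster <- function() {
  seedProfile <- rnorm(nSamples)
  sapply(seq_len(genesPerCluster),
         function(i) seedProfile + rnorm(nSamples, sd = 0.3))
}

# Columns are genes, rows are samples; three clusters side by side.
expr <- cbind(simulateCluster(), simulateCluster(), simulateCluster())

# Correlation-based dissimilarity between genes, then average-linkage clustering.
dissim <- as.dist(1 - cor(expr))
tree <- hclust(dissim, method = "average")

# Fixed-height cut versus PAM with k = 3, cross-tabulated.
fixedLabels <- cutree(tree, h = 0.5)
pamLabels <- pam(dissim, k = 3, diss = TRUE)$clustering
print(table(fixed = fixedLabels, pam = pamLabels))
```

With the dynamicTreeCut package installed, labels from cutreeDynamic on the same tree can be added to the cross-tabulation in the same way.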

Old versions of R packages

Older versions of the packages presented on this page are available here.



