Hierarchical clustering is a widely used method for detecting
clusters in genomic data. Clusters are defined by cutting branches
off the dendrogram. A common but inflexible method uses a constant
height cutoff value; this method exhibits suboptimal performance on
complicated dendrograms. We present the Dynamic Tree Cut R library
that implements novel dynamic branch cutting methods for detecting
clusters in a dendrogram depending on their shape. Compared to the
constant height cutoff method, our techniques offer the following
advantages: (1) they are capable of identifying nested clusters; (2)
they are flexible: cluster shape parameters can be tuned to suit
the application at hand; (3) they are suitable for automation; and (4)
they can optionally combine the advantages of hierarchical
clustering and partitioning around medoids, giving better detection
of outliers. We illustrate the use of these methods by applying them
to protein-protein interaction network data and to a simulated gene
expression data set.
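To make the limitation of a constant height cutoff concrete, here is a minimal sketch in base R. The simulated one-dimensional data (two tight groups near 0 and 1 plus one loose group near 10), the average linkage, and both cut heights are assumptions chosen for illustration; they are not taken from the article.

```r
# Illustrative data only: two tight groups and one loose group (assumed values).
set.seed(1)
x <- c(rnorm(20, mean = 0, sd = 0.1),
       rnorm(20, mean = 1, sd = 0.1),
       rnorm(20, mean = 10, sd = 1))

# Average-linkage hierarchical clustering on Euclidean distances.
tree <- hclust(dist(x), method = "average")

# Constant height cuts: every branch is cut at the same height.
low  <- cutree(tree, h = 0.5)  # low cut: can separate the tight groups
high <- cutree(tree, h = 5)    # high cut: merges the two tight groups

# Cluster sizes at each height; compare how the tight and loose groups fare.
table(low)
table(high)
```

A low cut tends to break the loose group into fragments, while a high cut merges the two tight groups, so no single height suits every branch. That mismatch is what shape-dependent cutting is meant to address.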
Link to paper
Langfelder P, Zhang B, Horvath S (2008) Defining clusters from a hierarchical
cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24(5):719-720
A detailed description of the algorithms is provided in this
document (pdf format).
This document has been updated since the main article was published. The
version that was published together with the article is available
here.
New to R or to Weighted Gene Co-expression Network Analysis?
If you have no previous experience with R
or would like to learn about the Weighted Gene Co-expression Network Analysis (WGCNA) framework,
you are invited to visit
other pages of our lab first. Supplementary material provided at this page
by Oldham et al (2006) is a good introduction to R and WGCNA, especially since we adapt that
analysis to an example of the methods described here. You may also want to take a look at the
main page of WGCNA.
NEW: packages dynamicTreeCut and
moduleColor now available from CRAN
The packages listed below are now available directly from CRAN (the Comprehensive R Archive Network). This
means that (if you have the newest R version)
you can install the packages just like any other standard package, using either the
command install.packages in R or the corresponding user interface function.
Unfortunately, if your R version is older, the one-step download/installation
will not go through (or you may get an older version of the package), and
you still have to install the packages using the procedure outlined in the
instructions.
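As a rough sketch of the one-step route, the following R snippet checks which of the two packages are missing and installs only those. The helper name is_installed is hypothetical, and the interactive() guard is an assumption added so that the snippet does not trigger downloads when run non-interactively; whether both packages are currently served by your CRAN mirror is not something this sketch can guarantee.

```r
# Packages named on this page.
pkgs <- c("dynamicTreeCut", "moduleColor")

# Hypothetical helper: TRUE if the package can be loaded from the library.
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

# Keep only the packages that are not yet installed.
missing_pkgs <- pkgs[!vapply(pkgs, is_installed, logical(1))]

# Install from CRAN only in an interactive session (assumed safeguard).
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

If the installation fails or delivers an older version, the manual procedure described in the instructions remains the fallback.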
R package dynamicTreeCut
The methods described on this web page have been implemented as an
R package named dynamicTreeCut.
Before using the package, the user is encouraged to
read the detailed description linked above to get a sense of how the techniques work and of their
advantages and limitations.
Download the package dynamicTreeCut_1.62, last updated 2014/06/13:
A terse Changelog that summarizes main changes between versions.
Short installation instructions are available here.
Should you discover bugs, please report them to Peter Langfelder.
The package version numbers follow the format
packageName_major.minor-revision. Minor versions typically add or change some functionality;
revisions typically contain bug fixes and small additions that do not require any changes in the code
using the functions.
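The version format can be handled with base R's package_version class, as in the sketch below. The function parse_pkg_id is a hypothetical helper, and the identifier with revision "-1" is made up for illustration; only the 1.62 version is mentioned on this page.

```r
# Split a hypothetical identifier of the form packageName_major.minor-revision.
parse_pkg_id <- function(id) {
  parts <- strsplit(id, "_", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)
  list(name = parts[1], version = package_version(parts[2]))
}

with_revision    <- parse_pkg_id("dynamicTreeCut_1.62-1")  # made-up revision
without_revision <- parse_pkg_id("dynamicTreeCut_1.62")

# Integer components: major, minor and, if present, revision.
unlist(with_revision$version)

# A revision sorts after the plain major.minor release.
with_revision$version > without_revision$version
```

For an installed package, packageVersion("dynamicTreeCut") returns an object of the same class, so the same comparison operators apply.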
R package moduleColor
To be able to run the examples below, you will also need to download and install a companion package
moduleColor that contains supporting functions we find useful when working with microarray
data.
Download the package moduleColor_1.08 (last updated 2008/08/03, see
a terse changelog):
A normalized and filtered microarray dataset of human and chimpanzee gene expression data from brain
samples. The data come from an earlier analysis
by Oldham et al (2006).
The example R script in the file ExampleAnalysis.R. Copy and paste it into an R
session to get a sense of how to use the method as well as the supporting functions.
As mentioned above, make sure you install the dynamicTreeCut and moduleColor packages
before attempting to run this example. The analysis uses several other libraries that need to be
installed as well; you will find the list at the start of the ExampleAnalysis.R.
Example 2: A toy example
The second example (click here to download a zip bundle)
is a toy example illustrating our algorithms as well as clustering in general. The "data" consist of
16 numbers, 15 of which are chosen to fall into 3 clusters and one is chosen to lie outside of the
clusters. The bundle contains the example R script Example-Toy-Posted.R and the file
NetworkFunctions-TreeCut-Simulation.R containing supporting functions for the example. The script was
used to produce the toy example figure in the detailed description linked above.
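A data set of the same shape can be sketched in a few lines of R. The 16 values below are invented for illustration and are not the numbers used in Example-Toy-Posted.R; the linkage method, the cut height and the cutreeDynamic settings (minClusterSize, deepSplit) are likewise assumptions. The dynamic cut runs only if dynamicTreeCut is installed.

```r
# Invented toy data: three groups of five values plus one outlier (16 numbers).
x <- c(1.0, 1.2, 1.4, 1.6, 1.8,
       5.0, 5.2, 5.4, 5.6, 5.8,
       9.0, 9.2, 9.4, 9.6, 9.8,
       14.0)
names(x) <- paste0("p", seq_along(x))

d <- dist(x)
tree <- hclust(d, method = "average")

# Constant height cut (base R): the outlier ends up as a singleton cluster.
fixed <- cutree(tree, h = 2)
table(fixed)

# Hybrid dynamic cut, if the package is available. Label 0 marks objects
# left unassigned; the parameter values are assumptions for this toy case.
if (requireNamespace("dynamicTreeCut", quietly = TRUE)) {
  dynamic <- dynamicTreeCut::cutreeDynamic(
    dendro = tree, distM = as.matrix(d), method = "hybrid",
    minClusterSize = 3, deepSplit = 1, pamStage = TRUE, verbose = 0
  )
  print(table(fixed, dynamic))
}
```

Setting pamStage = FALSE skips the partitioning-around-medoids-like assignment step, which is one way to see how the outlier is treated with and without it.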
Again, make sure you install the dynamicTreeCut and moduleColor packages
before attempting to run this example. The analysis uses several other libraries that need to be
installed as well; you will find the list at the start of the file ExampleAnalysis.R provided above.
Example 3: Simulated data
The third example (click here to download a zip bundle)
clusters simulated gene expression-like data to compare our methods to the fixed
height tree cut as well as Partitioning Around Medoids. The bundle contains two example scripts
named Example-3Clusters.R and Example-10Clusters.R that were used to produce the 3- and 10-cluster
examples in the detailed description linked above. The third file,
NetworkFunctions-TreeCut-Simulation.R, contains supporting functions for the analysis.
Again, make sure you install the dynamicTreeCut and moduleColor packages
before attempting to run this example. The analysis uses several other libraries that need to be
installed as well; you will find the list at the start of the file ExampleAnalysis.R provided above.
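The kind of comparison made in this example can be sketched as follows. This is not the code from Example-3Clusters.R: the simulation design (three seed profiles plus noise), the correlation-based dissimilarity, the cut height and the helper same_partition are all assumptions made for illustration. The PAM step uses pam from the cluster package and runs only if that package is installed.

```r
# Simulate expression-like data: 30 samples, 3 clusters of 20 "genes" each.
set.seed(42)
n_samples <- 30
genes_per_cluster <- 20
seeds <- matrix(rnorm(n_samples * 3), nrow = n_samples)
expr <- do.call(cbind, lapply(1:3, function(k) {
  noise <- matrix(rnorm(n_samples * genes_per_cluster, sd = 0.5),
                  nrow = n_samples)
  seeds[, k] + noise  # seed profile added to every column
}))
truth <- rep(1:3, each = genes_per_cluster)

# Correlation-based dissimilarity between genes (columns).
diss <- as.dist(1 - cor(expr))
tree <- hclust(diss, method = "average")

# Fixed height cut; the height 0.5 is an assumed value.
fixed <- cutree(tree, h = 0.5)

# Hypothetical helper: TRUE if two labelings define the same partition.
same_partition <- function(a, b) {
  tab <- table(a, b) > 0
  all(rowSums(tab) == 1) && all(colSums(tab) == 1)
}
same_partition(fixed, truth)

# Partitioning Around Medoids on the same dissimilarity, if available.
if (requireNamespace("cluster", quietly = TRUE)) {
  pam_labels <- cluster::pam(diss, k = 3)$clustering
  print(table(pam_labels, truth))
}
```

Note that PAM needs the number of clusters k in advance, whereas the tree cut needs a height; the dynamic methods on this page replace the height with cluster shape criteria.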
Old versions of R packages
Older versions of the packages presented on this page are available here.