Count data is discrete and skewed and is hence not well
approximated by a normal distribution. Thus, a test based on the negative binomial
distribution, which can reflect these properties, has much higher power to detect
differential expression.
Tests for differential expression between two experimental conditions should take into account both technical and biological variability.
Recently, several authors have claimed that the Poisson distribution
can be used for this purpose. However, tests based on the Poisson assumption (this includes the binomial test and the chi-squared test)
ignore the biological sampling variance, leading to overly optimistic p values. The negative binomial distribution is a
generalisation of the Poisson model that allows biological
variance to be modelled correctly.
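The difference between the two models can be made concrete with a short sketch in standard notation. The symbols are not taken from the text above and are introduced here only for illustration: K is the read count of a gene, μ its expected value, and α ≥ 0 a dispersion parameter.

```latex
\begin{align*}
\text{Poisson:} \quad & \operatorname{Var}(K) = \mu \\
\text{negative binomial:} \quad & \operatorname{Var}(K) = \mu + \alpha\,\mu^{2}, \qquad \alpha \ge 0
\end{align*}
```

Setting α = 0 recovers the Poisson case. The extra term αμ² is where variability between biological replicates can enter, which is why a test that assumes α = 0 tends to understate the variance and report p values that are too small.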
In these two respects, DESeq is similar to earlier tools, especially edgeR.
One of the new features of DESeq is the ability to estimate the variance in a local
fashion, using different coefficients of variation for different expression
strengths. This removes potential selection biases in the hit list of differentially
expressed genes, and gives a more balanced and accurate result.
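One way to read "different coefficients of variation for different expression strengths" is sketched below. It assumes a negative binomial variance of the form μ + αμ², where μ is the expected count and α the dispersion; this parametrisation is an illustrative assumption and not necessarily the exact formulation used in the paper.

```latex
\begin{align*}
\mathrm{CV}^{2} &= \frac{\operatorname{Var}(K)}{\mu^{2}}
                 = \frac{\mu + \alpha\,\mu^{2}}{\mu^{2}}
                 = \frac{1}{\mu} + \alpha
                 && \text{(one global } \alpha\text{)} \\
\mathrm{CV}^{2}(\mu) &= \frac{1}{\mu} + \alpha(\mu)
                 && \text{(local estimation: } \alpha \text{ varies with } \mu\text{)}
\end{align*}
```

With a single global α, weakly and strongly expressed genes are forced to share the same biological coefficient of variation. Letting α depend on μ lets the data decide how the variability changes with expression strength.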
DESeq's applicability is not limited to RNA-Seq. Rather, it may be used for
many kinds of count data derived from high-throughput experiments.
Download and installation
Automatic installation via Bioconductor
DESeq is available via Bioconductor and can be conveniently installed as follows:
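The commands themselves are not reproduced in this text. As a sketch, an installation with the biocLite script mentioned below would typically look like the following. The script URL is an assumption based on how Bioconductor historically published it; newer Bioconductor releases have replaced biocLite with the BiocManager package, so this applies only to R versions that still support the script.

```r
# Load the Bioconductor install script.
# Assumption: the script URL as historically published by Bioconductor.
source("http://bioconductor.org/biocLite.R")

# Install DESeq together with its dependencies
biocLite("DESeq")
```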
The Bioconductor install script (biocLite) ties the package version to the R version used. Hence, if you use the current release version of R,
you will get the version of DESeq from the current Bioconductor release, and if you use the development version of R, you will get the development version of DESeq (i.e., the version containing the newest changes, which may still be unstable).
However, DESeq also works fine with an older R installation. You just need to install it manually, as
follows.
Manual installation
Install Bioconductor by following the instructions given on the Bioconductor website.
Install the locfit package by typing, in R:
install.packages("locfit")
Download the newest DESeq package from its Bioconductor package page (bottom of page).
Linux users: Download the “package source”.
Windows users: Download the “Windows binary”.
Mac users: Do not use the “MacOS X binaries” unless you have the matching R version (in which case
you should use the automatic installation instead). Otherwise, download the “package source” and, before proceeding, make sure that you have Xcode installed. (Xcode can be found on your second MacOS X installation CD or downloaded from Apple.)
Install the downloaded package into R using the menu function for package installation from
a file, the R CMD INSTALL mechanism, or the install.packages function (see also the chapter on installing
packages in the R manual).
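As a minimal sketch of the install.packages route for a downloaded package source: the file name below is a hypothetical placeholder, so substitute the name of the file you actually downloaded.

```r
# Hypothetical file name: replace with the file you actually downloaded.
pkg_file <- "DESeq_x.y.z.tar.gz"

if (file.exists(pkg_file)) {
  # repos = NULL makes install.packages read a local file instead of
  # contacting a repository; type = "source" marks it as a package source.
  install.packages(pkg_file, repos = NULL, type = "source")
} else {
  message("Package file not found: ", pkg_file)
}
```

For a Windows binary (a .zip file), the same call without the type argument should suffice. The file.exists check is only there so that a mistyped path produces a readable message.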
Paper and further information
The statistical method behind DESeq is explained in our paper:
Simon Anders and Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology (2010) 11:R106 (open access)
If you use DESeq in scientific work, please cite this paper in your publication.