DGEclust treats differential expression as a particular clustering configuration of the data.
In particular, if the LFCs between two different treatments for a particular gene belong to two
different clusters, then this gene is differentially expressed between these two treatments.
DGEclust provides a compare_groups
function for calculating the posterior probability
that the LFCs between any two treatments for a particular gene belong to the same cluster, i.e.
the posterior probability that the gene is not differentially expressed. Having processed the
data as explained in the previous sections, this can be achieved as follows:
1 from dgeclust import compare_groups
2 res, nsamples = compare_groups(data, mdl, group1='treated', group2='untreated')
The above statement checks each gene for differential expression between the treated
and untreated
groups of samples by post-processing the output of the Gibbs sampler (encapsulated by the mdl
object),
using a default burn-in period of 5K iterations. The burn-in period can be modified using the t0
argument,
while the last sample to be processed is identified by the tend
argument (default value: 10000). Every
n-th sample can be post-processed by modifying the dt
argument (default value: 1), while the number of cores
used is controlled by the nthreads
argument (default: all available cores are used). The method returns a
list raw and adjusted (FDR) posterior probabilities of no differential expression for each gene (res
) and
the number of samples that were processed (nsamples
):
1 res.head()
2
3 Posteriors FDR
4 FBgn0000008 0.997001 0.834967
5 FBgn0000017 0.934213 0.580633
6 FBgn0000018 0.996401 0.830695
7 FBgn0000024 0.936813 0.586494
8 FBgn0000032 1.000000 0.880004
9 FBgn0000037 0.840232 0.421552
10 FBgn0000042 0.967606 0.669613
11 FBgn0000043 0.130774 0.043212
12 FBgn0000045 0.870626 0.455952
13 FBgn0000046 0.885023 0.482976
14 ...
We can visualise our results using an RA diagram:
1 from dgelust.utils import plot_ra
2
3 idxs = res.FDR < 0.01 # identify DE genes at 1% FDR
4 plot_ra(data.counts_norm['treated1fb'], data.counts_norm['untreated1fb'], idxs=idxs)
Notice that the data
object contains a copy of the normalised data as one of its attributes.
The output of the Gibbs sampler can be further used as input to hierarchical clustering algorithms. Learn how!