Testing for differential expression

DGEclust treats differential expression as a particular clustering configuration of the data. In particular, if the LFCs between two different treatments for a particular gene belong to two different clusters, then this gene is differentially expressed between these two treatments. DGEclust provides a compare_groups function for calculating the posterior probability that the LFCs between any two treatments for a particular gene belong to the same cluster, i.e. the posterior probability that the gene is not differentially expressed. Having processed the data as explained in the previous sections, this can be achieved as follows:

1 from dgeclust import compare_groups
2 res, nsamples = compare_groups(data, mdl, group1='treated', group2='untreated')

The above statement checks each gene for differential expression between the treated and untreated groups of samples by post-processing the output of the Gibbs sampler (encapsulated by the mdl object), using a default burn-in period of 5K iterations. The burn-in period can be modified using the t0 argument, while the last sample to be processed is identified by the tend argument (default value: 10000). Every n-th sample can be post-processed by modifying the dt argument (default value: 1), while the number of cores used is controlled by the nthreads argument (default: all available cores are used). The method returns a list raw and adjusted (FDR) posterior probabilities of no differential expression for each gene (res) and the number of samples that were processed (nsamples):

 1 res.head()
 2 
 3                Posteriors   FDR
 4 FBgn0000008    0.997001     0.834967
 5 FBgn0000017    0.934213     0.580633
 6 FBgn0000018    0.996401     0.830695
 7 FBgn0000024    0.936813     0.586494
 8 FBgn0000032    1.000000     0.880004
 9 FBgn0000037    0.840232     0.421552
10 FBgn0000042    0.967606     0.669613
11 FBgn0000043    0.130774     0.043212
12 FBgn0000045    0.870626     0.455952
13 FBgn0000046    0.885023     0.482976
14 ...

We can visualise our results using an RA diagram:

1 from dgelust.utils import plot_ra
2 
3 idxs = res.FDR < 0.01    # identify DE genes at 1% FDR
4 plot_ra(data.counts_norm['treated1fb'], data.counts_norm['untreated1fb'], idxs=idxs)

RA plot

Notice that the data object contains a copy of the normalised data as one of its attributes.

The output of the Gibbs sampler can be further used as input to hierarchical clustering algorithms. Learn how!