Heterogeneity Dissection
Description
This function dissects bulk sample gene expression using matched normal and tumorgraft RNA-seq data. It outputs the final proportion estimates of the three components for all patients.
The patient-specific dissection proportion estimates are saved in a 3-by-k matrix named "rho", where k is the number of patients. The 3 rows of the "rho" matrix correspond to the tumor, normal, and stroma components, in that order. That is, for patient i, the tumor proportion estimate is stored in rho[1,i], the normal proportion estimate in rho[2,i], and the stroma proportion estimate in rho[3,i].
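As a minimal sketch of this layout, the snippet below builds a made-up 3-by-2 matrix and reads the components by row index. The numeric values, the patient names, and the row labels are illustrative assumptions, not DisHet output.

```r
# Illustrative values only; a real "rho" is returned by DisHet().
rho <- matrix(c(0.70, 0.20, 0.10,
                0.55, 0.30, 0.15),
              nrow = 3,
              dimnames = list(NULL, c("patient_1", "patient_2")))

# Label the rows in the documented order: tumor, normal, stroma.
rownames(rho) <- c("tumor", "normal", "stroma")

i <- 2
tumor_i  <- rho[1, i]  # tumor proportion for patient i
normal_i <- rho[2, i]  # normal proportion for patient i
stroma_i <- rho[3, i]  # stroma proportion for patient i

# One row per patient can be easier to merge with other patient-level tables.
rho_by_patient <- as.data.frame(t(rho))
```

Naming the rows is optional; it only guards against mixing up the component order later on.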
Usage
DisHet(exp_T, exp_N, exp_G, save=TRUE, MCMC_folder, n_cycle=10000,
save_last=500, mean_last=200, dirichlet_c=1, S_c=1, rho_small=1e-2,
initial_rho_S=0.02, initial_rho_G=0.96, initial_rho_N=0.02)
Arguments
# | Argument | Description |
---|---|---|
1 | exp_T | Gene expression in bulk RNA-seq samples. The rows correspond to different genes. The columns correspond to different patients. |
2 | exp_N | Gene expression in the corresponding normal samples. The rows list the same set of genes as in exp_T. The columns correspond to patients matched with exp_T. |
3 | exp_G | Gene expression in the corresponding tumorgraft samples. The rows list the same set of genes as in exp_T. The columns correspond to patients matched with exp_T. |
4 | save | When save==TRUE (the default), the component proportion estimates from the MCMC iterations are saved into the user-specified directory given by the "MCMC_folder" argument. |
5 | MCMC_folder | Directory for saving the estimated mixture proportion matrix updates during MCMC iterations. The default setting is to create a "DisHet" folder under the current working directory. |
6 | n_cycle | Number of MCMC iterations (chain length). The default value is 10,000. |
7 | save_last | Save the rho matrix updates for the last "save_last" MCMC iterations. The default value is 500. |
8 | mean_last | Calculate the final proportion estimate matrix using the last "mean_last" MCMC iterations. The default value is 200. |
9 | dirichlet_c | Stride scale in sampling rho. A larger value leads to smaller steps in sampling rho. The default value is 1. |
10 | S_c | Stride scale in sampling Sij. A larger value leads to larger steps in sampling Sij. The default value is 1. |
11 | rho_small | The smallest rho updates allowed during MCMC. The default is 1e-2. This threshold is set to help improve numerical stability of the algorithm. |
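12 | initial_rho_S | Initial value of the proportion estimate for the stroma component. The default value is 0.02. |
13 | initial_rho_G | Initial value of the proportion estimate for the tumor component. The default value is 0.96. |
14 | initial_rho_N | Initial value of the proportion estimate for the normal component. The default value is 0.02. |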
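To illustrate what averaging over the last "mean_last" iterations means, here is a rough sketch on a simulated chain. The 3 x k x n_cycle array used to hold the draws is a hypothetical stand-in chosen for this example; it is not DisHet's internal or on-disk format, and the draws themselves are random numbers rather than real MCMC output.

```r
set.seed(1)
k         <- 4    # number of patients
n_cycle   <- 500  # chain length
mean_last <- 50   # number of final iterations to average

# Hypothetical chain: random positive values, rescaled so that the three
# component proportions sum to 1 for every patient and iteration.
draws <- array(rgamma(3 * k * n_cycle, shape = 2), dim = c(3, k, n_cycle))
draws <- prop.table(draws, margin = c(2, 3))

# Average each component/patient entry over the last `mean_last` iterations.
keep    <- (n_cycle - mean_last + 1):n_cycle
rho_hat <- apply(draws[, , keep, drop = FALSE], c(1, 2), mean)
```

In this sketch "mean_last" cannot exceed "n_cycle", since the averaging window is taken from the end of the chain.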
Details
Un-logged (linear-scale) expression values should be used in the exp_T, exp_N, and exp_G matrices, and their rows and columns must match one another so that they correspond to the same set of genes and patients.
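The following sketch shows one way to prepare inputs that meet these requirements. It assumes the data were stored as log2(x + 1), which may not match your pipeline; the toy matrices, gene and patient names, and the helper "unlog" are all made up for illustration.

```r
# Toy log2(x + 1) matrices standing in for user data (hypothetical values).
genes    <- c("geneA", "geneB", "geneC")
patients <- c("p1", "p2")
log_T <- matrix(c(3, 5, 0, 4, 6, 1), nrow = 3,
                dimnames = list(genes, patients))
log_N <- log_T[c(2, 1, 3), ]  # same genes, different row order
log_G <- log_T[, c(2, 1)]     # same patients, different column order

# Undo the assumed log2(x + 1) transform to get linear-scale expression.
unlog <- function(x) 2^x - 1
exp_T <- unlog(log_T)

# Reorder the other matrices by name so rows and columns line up with exp_T.
exp_N <- unlog(log_N)[rownames(exp_T), colnames(exp_T)]
exp_G <- unlog(log_G)[rownames(exp_T), colnames(exp_T)]

# Fail early if genes or patients do not match across the three inputs.
stopifnot(identical(dimnames(exp_T), dimnames(exp_N)),
          identical(dimnames(exp_T), dimnames(exp_G)))
```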
The values specified for "initial_rho_S", "initial_rho_G", and "initial_rho_N" must all be positive. If the three initial proportions do not sum to 1, they are normalized automatically so that the sum is 1.
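The normalization described here amounts to dividing each initial value by their total. The helper below, "normalize_initials", is a hypothetical function written for this illustration and is not part of DisHet; it only mirrors the documented behaviour.

```r
# Hypothetical helper mirroring the documented behaviour; not DisHet code.
normalize_initials <- function(rho_S, rho_G, rho_N) {
  init <- c(S = rho_S, G = rho_G, N = rho_N)
  stopifnot(all(init > 0))  # all three initial values must be positive
  init / sum(init)          # rescale so the proportions sum to 1
}

# Initials given as 1, 8, 1 are rescaled to S = 0.1, G = 0.8, N = 0.1.
res <- normalize_initials(1, 8, 1)
```

Because of this rescaling, only the ratios between the three initial values matter.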
Examples
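# Load the example data shipped with the package.
load(system.file("example/example_data.RData", package="DisHet"))
# Keep the first 200 genes to shorten the run.
exp_T <- exp_T[1:200,]
exp_N <- exp_N[1:200,]
exp_G <- exp_G[1:200,]
# Short chain for demonstration; nothing is written to disk.
rho <- DisHet(exp_T, exp_N, exp_G, save=FALSE, n_cycle=500, mean_last=50)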