Description

An automatic cell type detection and assignment algorithm for single cell RNA-Seq (scRNA-Seq) and CyTOF/FACS data. SCINA assigns cell type identities to a pool of cells profiled by scRNA-Seq or CyTOF/FACS, using prior knowledge of signatures, such as genes or protein symbols that are highly or lowly expressed in each cell type.


Usage

install.packages("SCINA")
library(SCINA)
# Load the example expression matrix and signature list shipped with the package
load(system.file('extdata','example_expmat.RData', package = "SCINA"))
load(system.file('extdata','example_signatures.RData', package = "SCINA"))
exp=exp_test$exp_data
# Assign cell types, then plot the results as a heatmap
results=SCINA(exp,signatures,max_iter=100,convergence_n=10,convergence_rate=0.99,sensitivity_cutoff=0.9)
plotheat.SCINA(exp,results,signatures)


Arguments

exp A normalized expression matrix. Columns correspond to cells or samples, rows correspond to genes or protein symbols.
signatures A list containing multiple signature vectors. Each signature vector represents prior knowledge for one cell type and contains gene names or protein symbols.
max_iter An integer > 0. Default is 100. Max iterations allowed for the EM algorithm.
convergence_n An integer > 0. Default is 10. Stop the EM algorithm if, during the last n rounds of iterations, the cell type assignment stays stable above the convergence_rate.
convergence_rate A float between 0 and 1. Default is 0.99. Fraction of cells for which the type assignment must remain stable over the last n rounds.
sensitivity_cutoff A float between 0 and 1. Default is 1. The cutoff used to remove signatures whose cell types the SCINA algorithm deems absent from the data.
rm_overlap A binary value. Default is 1 (TRUE), meaning that symbols shared between signature lists are removed. If 0 (FALSE), different cell types are allowed to share the same symbols.
allow_unknown A binary value. Default is 1 (TRUE). If 0 (FALSE), no cell will be assigned to the 'unknown' category.
results An output object returned from SCINA.
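
The stopping rule described by convergence_n and convergence_rate can be read as "stop once a large enough fraction of cells has kept the same label for the last n rounds". The sketch below illustrates that reading in base R. It is an assumption about the rule, not SCINA's internal code, and has_converged and label_history are hypothetical names introduced only for this example.

```r
# Hypothetical helper, not part of SCINA: decide whether label assignments
# have been stable enough over the last n iterations.
# label_history: character matrix, one row per cell, one column per iteration.
has_converged <- function(label_history, n = 10, rate = 0.99) {
  total <- ncol(label_history)
  if (total < n) return(FALSE)  # not enough rounds recorded yet
  recent <- label_history[, seq(total - n + 1, total), drop = FALSE]
  # A cell counts as stable if its label is identical in all of the last n rounds.
  stable <- apply(recent, 1, function(x) length(unique(x)) == 1)
  mean(stable) >= rate
}

# Toy history: 4 cells, 3 iterations; only the fourth cell changes its label.
history <- rbind(
  c("T", "T", "T"),
  c("B", "B", "B"),
  c("T", "T", "T"),
  c("B", "T", "T")
)
strict  <- has_converged(history, n = 3, rate = 0.99)  # 3 of 4 cells stable
lenient <- has_converged(history, n = 2, rate = 0.99)  # all cells stable in the last 2 rounds
```

Under this reading, a larger convergence_n or a higher convergence_rate makes the rule stricter, so the EM algorithm tends to run for more iterations before stopping (up to max_iter).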

Details

For efficiency of data transfer and computation, the user is encouraged to upload only the subset of the gene expression matrix that contains the genes appearing in the signature list.
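
One way to build such a subset in base R is sketched below. The matrix, gene names and signature list are toy values invented for illustration, and stripping the 'low_' prefix before matching assumes that prefixed symbols refer to the same row names as their unprefixed form.

```r
# Toy expression matrix (rows = genes, columns = cells); values are made up.
exp_toy <- matrix(
  c(5, 0, 1,
    0, 4, 2,
    3, 3, 3,
    1, 0, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("CD3E", "CD19", "ACTB", "CD14"),
                  c("cell1", "cell2", "cell3"))
)

# Toy signature list; "low_" marks a symbol expected to be lowly expressed.
sig_toy <- list(
  T_cell   = c("CD3E", "low_CD19"),
  monocyte = c("CD14")
)

# Collect every symbol used in any signature, dropping the "low_" prefix
# so the names match the row names of the expression matrix.
sig_symbols <- unique(sub("^low_", "", unlist(sig_toy)))

# Keep only the rows whose gene appears in at least one signature.
exp_subset <- exp_toy[rownames(exp_toy) %in% sig_symbols, , drop = FALSE]
```

Here the housekeeping row "ACTB" is dropped because no signature refers to it, while "CD19" is kept because it appears as 'low_CD19'.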

If a cell type is identified by the low expression level of symbol X, specify that symbol as 'low_X' in its signature.

Guidance for 'low_X' (taking scRNA-Seq as an example):

  • There are 4 cell types: the first expresses gene A highly, and the other three express it lowly. Then it is better to specify A as the high marker for cell type 1; it is not a good idea to specify A as the low expression marker for cell types 2, 3 and 4.
  • There are 4 cell types: the first expresses gene A lowly, and the other three express it highly. Then it is better to specify A as the low marker for cell type 1; it is not a good idea to specify A as the high expression marker for cell types 2, 3 and 4.
  • There are 4 cell types: the first expresses gene A lowly, the second and third express it moderately, and the last expresses it highly. Then it is better to specify A as the low marker for cell type 1 and as the high expression marker for cell type 4.
  • The same specification can be applied to protein markers in CyTOF data.
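
The third scenario above could be written as a signature list along the following lines. This is a sketch only: the cell type names and the extra markers "B", "C", "D" and "E" are hypothetical placeholders, and the split into high and low markers is shown just to make the 'low_' convention explicit.

```r
# Hypothetical signatures: gene "A" is low in type 1, moderate in types 2 and 3,
# and high in type 4. "B", "C", "D", "E" are made-up additional markers.
signatures_toy <- list(
  type1 = c("low_A", "B"),  # A is a low marker for type 1
  type2 = c("C"),           # moderate A: A is not listed
  type3 = c("D"),           # moderate A: A is not listed
  type4 = c("A", "E")       # A is a high marker for type 4
)

# Split each signature into high and low markers using the "low_" prefix.
is_low <- lapply(signatures_toy, function(s) grepl("^low_", s))
low_markers  <- Map(function(s, l) sub("^low_", "", s[l]), signatures_toy, is_low)
high_markers <- Map(function(s, l) s[!l], signatures_toy, is_low)
```

Gene A is attached only to the two cell types at the extremes of its expression range, which matches the guidance in the list above.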

A smaller sensitivity_cutoff causes more signatures to be removed; a value of 1 means that no signature is removed.