Chapter 5 Normalization
The main function is `r_norm_dataset`, which normalizes a dataset according to customizable templates. Depending on the chosen template, some parameters are used and others are ignored.
5.1 Parameters
Here are the parameters:
parameter | short description | class | required | default | description |
---|---|---|---|---|---|
datasetName | object name | character | required | | |
template | template to apply | character | optional | "none" | one of: "logarithm", "scaling", "norm_quantiles", "combat", "rnaseqcount", "unlog_rnaseqcount", "unlog_metagenomics_2", "metagenomics_1", "metagenomics_2", "microarray", "compositional" |
log_base | base of the logarithm | character | optional | "2" | one of: "2", "e", "10" |
prior.count | offset of the logarithm | numeric | optional | 0.5 | |
offset | simple offset | numeric | optional | 0.5 | |
filter_type | type of filter on variables | character | optional | "absolute" | one of: "max", "absolute", "relative" |
filter_threshold | threshold for the filters | numeric | optional | 5 | |
normalization | type of normalization | character | optional | "CLR" | one of: "CLR", "ILR", "CSS" |
scale_reduce | when scaling, should the data be reduced? | logical | optional | FALSE | |
unlog_base | base of the logarithm when unlogging | character | optional | same as log_base | |
unlog_prior.count | offset when unlogging | numeric | optional | 0.5 | |
datasetName_batches | dataset name in which to find the batch variable | character | optional | NULL | |
varName_batches | name of the batch variable | character | optional | NULL | |
has_log | should cpm+log be applied? | logical | optional | TRUE | |
has_filter | should a filter be applied? | logical | optional | TRUE | |
has_TMM | should TMM factors be computed? | logical | optional | TRUE | |
has_TMMwsp | should TMMwsp factors be computed? | logical | optional | FALSE | |
datasetName_colours | dataset name in which to find the variable used to colour plots | character | optional | NULL | |
varName_colours | name of the variable used to colour plots | character | optional | NULL | |
The function returns a list of two objects (the dataset and its analysis) and three tables (two data views and one string holding the name of the data object). When `template` is "none", no objects are returned.
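The two simplest templates can be pictured with a few lines of base R. This sketch is only an illustration under stated assumptions. It supposes that logarithm computes log(x + prior.count) in the base given by `log_base`. It also supposes that scaling centres each variable and divides by its standard deviation only when `scale_reduce` is TRUE. The helpers `log_transform()` and `scale_transform()` are hypothetical names and are not functions of the package.

```r
# Hypothetical helpers illustrating the "logarithm" and "scaling" templates;
# they are NOT the package's implementation.
# Assumption: samples are rows and variables are columns of a numeric matrix.

# log_base is a string, as in r_norm_dataset: "2", "e" or "10".
log_transform <- function(x, log_base = "2", prior.count = 0.5) {
  base <- switch(as.character(log_base),
                 "2"  = 2,
                 "e"  = exp(1),
                 "10" = 10,
                 stop("log_base must be one of '2', 'e', '10'"))
  # add the offset first so that zero counts stay finite
  log(x + prior.count, base = base)
}

# Centre each variable; if scale_reduce is TRUE, also divide by its
# standard deviation (i.e. "reduce" it to unit variance).
scale_transform <- function(x, scale_reduce = FALSE) {
  out <- scale(x, center = TRUE, scale = scale_reduce)
  attr(out, "scaled:center") <- NULL  # drop bookkeeping attributes
  attr(out, "scaled:scale") <- NULL
  out
}

counts <- matrix(c(0, 6, 14, 30), nrow = 2)
log_transform(counts, log_base = "2", prior.count = 2)
scale_transform(counts, scale_reduce = TRUE)
```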
5.2 Outputs of the function
In the example below, the `proteins` dataset can only be processed with the templates logarithm, combat and rnaseqcount.
```r
res <- r_norm_dataset(datasetName = "proteins", template = "logarithm", prior.count = 10)
names(res)
## [1] "Table" "Graphical" "Object"
```
Name of the produced object:
5.3 Different templates and options
Depending on the nature of the selected dataset and on its attributes `logt`, `normalized` and `norm_factors`, different templates are proposed, following the logic in the table below. The column `norm_factors` denotes the presence or absence of normalization factors (which can be saved as attributes in the dataset object). Note that the natures rna-count and metabolite-compo are treated in the same way as metagenomics-count and metagenomics-compo, respectively. The nature general stands for any nature not mentioned here.
nature | logt | normalized | norm_factors | templates |
---|---|---|---|---|
general | no | no | FALSE | logarithm, scaling, norm_quantiles, combat |
general | no | yes | FALSE | logarithm, scaling |
general | yes | no | FALSE | scaling, norm_quantiles, combat |
general | yes | yes | FALSE | scaling |
metagenomics-compo | no | no | FALSE | logarithm, compositional |
metagenomics-compo | no | yes | FALSE | logarithm |
metagenomics-compo | yes | no | FALSE | scaling |
metagenomics-compo | yes | yes | FALSE | scaling |
metagenomics-count | no | yes | TRUE | metagenomics_2 |
metagenomics-count | yes | yes | TRUE | scaling |
metagenomics-count | no | no | FALSE | logarithm, combat, metagenomics_1, metagenomics_2 |
metagenomics-count | no | yes | FALSE | logarithm, metagenomics_2 |
metagenomics-count | yes | no | FALSE | scaling, combat, unlog_metagenomics_2 |
metagenomics-count | yes | yes | FALSE | scaling |
rna-count | no | yes | TRUE | rnaseqcount |
rna-count | yes | yes | TRUE | scaling |
rna-count | no | no | FALSE | logarithm, combat, rnaseqcount |
rna-count | no | yes | FALSE | logarithm, rnaseqcount |
rna-count | yes | no | FALSE | scaling, combat, unlog_rnaseqcount |
rna-count | yes | yes | FALSE | scaling |
microarray | no | no | FALSE | logarithm, scaling, combat, microarray |
microarray | no | yes | FALSE | logarithm, scaling |
microarray | yes | no | FALSE | scaling, combat |
microarray | yes | yes | FALSE | scaling |
metabolite-compo | no | no | FALSE | logarithm, compositional |
metabolite-compo | no | yes | FALSE | logarithm |
metabolite-compo | yes | no | FALSE | scaling |
metabolite-compo | yes | yes | FALSE | scaling |
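The selection logic in this table can be written as a plain lookup. The sketch below is a hypothetical base-R helper, `available_templates()`, that transcribes the table row by row. It assumes that `logt`, `normalized` and `norm_factors` are passed as logical flags. It is an illustration and is not the package's own implementation.

```r
# Templates proposed for each (nature, logt, normalized, norm_factors)
# combination, transcribed from the table above.
# Row format: nature | logt | normalized | norm_factors | templates
template_rules <- c(
  "general|no|no|FALSE|logarithm,scaling,norm_quantiles,combat",
  "general|no|yes|FALSE|logarithm,scaling",
  "general|yes|no|FALSE|scaling,norm_quantiles,combat",
  "general|yes|yes|FALSE|scaling",
  "metagenomics-compo|no|no|FALSE|logarithm,compositional",
  "metagenomics-compo|no|yes|FALSE|logarithm",
  "metagenomics-compo|yes|no|FALSE|scaling",
  "metagenomics-compo|yes|yes|FALSE|scaling",
  "metagenomics-count|no|yes|TRUE|metagenomics_2",
  "metagenomics-count|yes|yes|TRUE|scaling",
  "metagenomics-count|no|no|FALSE|logarithm,combat,metagenomics_1,metagenomics_2",
  "metagenomics-count|no|yes|FALSE|logarithm,metagenomics_2",
  "metagenomics-count|yes|no|FALSE|scaling,combat,unlog_metagenomics_2",
  "metagenomics-count|yes|yes|FALSE|scaling",
  "rna-count|no|yes|TRUE|rnaseqcount",
  "rna-count|yes|yes|TRUE|scaling",
  "rna-count|no|no|FALSE|logarithm,combat,rnaseqcount",
  "rna-count|no|yes|FALSE|logarithm,rnaseqcount",
  "rna-count|yes|no|FALSE|scaling,combat,unlog_rnaseqcount",
  "rna-count|yes|yes|FALSE|scaling",
  "microarray|no|no|FALSE|logarithm,scaling,combat,microarray",
  "microarray|no|yes|FALSE|logarithm,scaling",
  "microarray|yes|no|FALSE|scaling,combat",
  "microarray|yes|yes|FALSE|scaling"
)

# Hypothetical helper (not part of the package): returns the templates the
# table proposes, or character(0) if the combination is not listed.
available_templates <- function(nature, logt, normalized, norm_factors) {
  # metabolite-compo rows are identical to metagenomics-compo rows
  if (nature == "metabolite-compo") nature <- "metagenomics-compo"
  # "general" covers any nature not listed explicitly
  listed <- c("metagenomics-compo", "metagenomics-count", "rna-count", "microarray")
  if (!nature %in% listed) nature <- "general"
  key <- paste(nature,
               if (logt) "yes" else "no",
               if (normalized) "yes" else "no",
               if (norm_factors) "TRUE" else "FALSE",
               sep = "|")
  for (rule in strsplit(template_rules, "|", fixed = TRUE)) {
    if (paste(rule[1:4], collapse = "|") == key) {
      return(strsplit(rule[5], ",", fixed = TRUE)[[1]])
    }
  }
  character(0)
}

# returns c("logarithm", "combat", "rnaseqcount")
available_templates("rna-count", logt = FALSE, normalized = FALSE, norm_factors = FALSE)
```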
The table below shows how `logt` and `normalized` are changed by applying the templates above, for several parameter combinations. An empty cell means that the attribute is left unchanged. Note that rnaseqcount has the same effects as metagenomics_2, while unlog_rnaseqcount has the same effects as unlog_metagenomics_2. On top of modifying these attributes, the template metagenomics_1 also changes the data nature from metagenomics-count to metagenomics-compo.
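The centred log-ratio (CLR) normalization that metagenomics_1 can apply (`normalization = "CLR"`, with `offset`) can be sketched in base R as follows. This is an illustration and not the package code. It assumes samples are rows and that `offset` is added to every value before taking the natural logarithm. The filtering step is left out, and `clr_transform()` is a hypothetical name.

```r
# Hypothetical centred log-ratio (CLR) transform.
# Assumptions: samples in rows, variables in columns, natural log,
# 'offset' added to all values to avoid log(0).
clr_transform <- function(x, offset = 0.5) {
  logx <- log(x + offset)
  # subtract each sample's mean log value (the log of its geometric mean)
  sweep(logx, 1, rowMeans(logx), FUN = "-")
}

counts <- matrix(c(0, 9, 3,
                   1, 7, 0), nrow = 2, byrow = TRUE)
clr <- clr_transform(counts)
rowSums(clr)  # each row sums to zero, up to floating-point error
```

Each CLR-transformed sample carries only relative information, which fits the change of nature from metagenomics-count to metagenomics-compo.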
template | has_TMM + has_TMMwsp | has_log | normalization | normalized | logt |
---|---|---|---|---|---|
logarithm | | | | | yes |
scaling | | | | | |
norm_quantiles | | | | yes | |
combat | | | | yes | |
metagenomics_1 | | | CLR | yes | yes |
metagenomics_1 | | | CSS | yes | |
metagenomics_2 | TRUE+FALSE | TRUE | | yes | yes |
metagenomics_2 | FALSE+TRUE | TRUE | | yes | yes |
metagenomics_2 | FALSE+FALSE | TRUE | | | yes |
metagenomics_2 | TRUE+FALSE | FALSE | | yes | |
metagenomics_2 | FALSE+TRUE | FALSE | | yes | |
metagenomics_2 | FALSE+FALSE | FALSE | | | |
unlog_metagenomics_2 | TRUE+FALSE | TRUE | | yes | |
unlog_metagenomics_2 | FALSE+TRUE | TRUE | | yes | |
unlog_metagenomics_2 | FALSE+FALSE | TRUE | | | |
unlog_metagenomics_2 | TRUE+FALSE | FALSE | | yes | no |
unlog_metagenomics_2 | FALSE+TRUE | FALSE | | yes | no |
unlog_metagenomics_2 | FALSE+FALSE | FALSE | | | no |
rnaseqcount | TRUE+FALSE | TRUE | | yes | yes |
rnaseqcount | FALSE+TRUE | TRUE | | yes | yes |
rnaseqcount | FALSE+FALSE | TRUE | | | yes |
rnaseqcount | TRUE+FALSE | FALSE | | yes | |
rnaseqcount | FALSE+TRUE | FALSE | | yes | |
rnaseqcount | FALSE+FALSE | FALSE | | | |
unlog_rnaseqcount | TRUE+FALSE | TRUE | | yes | |
unlog_rnaseqcount | FALSE+TRUE | TRUE | | yes | |
unlog_rnaseqcount | FALSE+FALSE | TRUE | | | |
unlog_rnaseqcount | TRUE+FALSE | FALSE | | yes | no |
unlog_rnaseqcount | FALSE+TRUE | FALSE | | yes | no |
unlog_rnaseqcount | FALSE+FALSE | FALSE | | | no |
compositional | | | | yes | yes |
microarray | | | | yes | yes |
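The rnaseqcount / metagenomics_2 rows of this table appear to follow a simple rule. `normalized` becomes yes when TMM or TMMwsp factors are computed, and `logt` becomes yes when `has_log` is TRUE. For the unlog variants, `logt` becomes no when `has_log` is FALSE. The hypothetical helper below encodes that reading, with an empty string meaning "unchanged". It is an interpretation of the table, not code from the package.

```r
# Hypothetical reading of the table for rnaseqcount / metagenomics_2 and
# their unlog_ variants. "" means the attribute is left unchanged.
attribute_changes <- function(has_TMM, has_TMMwsp, has_log, unlog = FALSE) {
  # normalized is set whenever TMM or TMMwsp factors are computed
  normalized <- if (has_TMM || has_TMMwsp) "yes" else ""
  logt <- if (unlog) {
    # input is already logged: unchanged if re-logged, "no" otherwise
    if (has_log) "" else "no"
  } else {
    if (has_log) "yes" else ""
  }
  c(normalized = normalized, logt = logt)
}

attribute_changes(has_TMM = TRUE, has_TMMwsp = FALSE, has_log = FALSE, unlog = TRUE)
```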
5.3.1 Necessary parameters for each template
- “logarithm”: `datasetName`, `log_base` and `prior.count`
- “scaling”: `datasetName` and `scale_reduce`
- “norm_quantiles”: `datasetName`
- “combat”: `datasetName`, `datasetName_batches` and `varName_batches`
- “rnaseqcount”: `datasetName`, `filter_type`, `log_base`, `filter_threshold`, `prior.count`, `has_filter`, `has_TMM`, `has_TMMwsp` and `has_log`
- “unlog_rnaseqcount”: `datasetName`, `filter_type`, `log_base`, `unlog_base`, `filter_threshold`, `prior.count`, `unlog_prior.count`, `has_filter`, `has_TMM`, `has_TMMwsp` and `has_log`
- “unlog_metagenomics_2”: `datasetName`, `filter_type`, `log_base`, `unlog_base`, `filter_threshold`, `prior.count`, `unlog_prior.count`, `has_filter`, `has_TMM`, `has_TMMwsp` and `has_log`
- “metagenomics_1”: `datasetName`, `filter_type`, `normalization`, `offset`, `filter_threshold` and `has_filter`
- “metagenomics_2”: `datasetName`, `filter_type`, `log_base`, `filter_threshold`, `prior.count`, `has_filter`, `has_TMM`, `has_TMMwsp` and `has_log`
- “microarray”: `datasetName` and `filter_threshold`
- “compositional”: `datasetName` and `normalization`
`datasetName_colours` and `varName_colours` are always optional (never required) and are not used by the interface; they may be useful for future developments, or may be removed.
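Each template reads only some of the arguments. One way to spot arguments that will have no effect is to encode the list above as a named list. The sketch below does that in base R. `template_params` and `ignored_params()` are hypothetical names introduced for illustration and are not part of the package.

```r
# Parameters used by each template, transcribed from the list above.
template_params <- list(
  logarithm      = c("datasetName", "log_base", "prior.count"),
  scaling        = c("datasetName", "scale_reduce"),
  norm_quantiles = c("datasetName"),
  combat         = c("datasetName", "datasetName_batches", "varName_batches"),
  rnaseqcount    = c("datasetName", "filter_type", "log_base", "filter_threshold",
                     "prior.count", "has_filter", "has_TMM", "has_TMMwsp", "has_log"),
  unlog_rnaseqcount = c("datasetName", "filter_type", "log_base", "unlog_base",
                        "filter_threshold", "prior.count", "unlog_prior.count",
                        "has_filter", "has_TMM", "has_TMMwsp", "has_log"),
  metagenomics_1 = c("datasetName", "filter_type", "normalization", "offset",
                     "filter_threshold", "has_filter"),
  microarray     = c("datasetName", "filter_threshold"),
  compositional  = c("datasetName", "normalization")
)
# metagenomics_2 / unlog_metagenomics_2 use the same parameters as their
# rnaseqcount counterparts
template_params$metagenomics_2 <- template_params$rnaseqcount
template_params$unlog_metagenomics_2 <- template_params$unlog_rnaseqcount

# Hypothetical helper: which of the supplied arguments a template ignores.
ignored_params <- function(template, args) {
  used <- template_params[[template]]
  if (is.null(used)) stop("unknown template: ", template)
  # 'template' itself and the colour parameters are not checked here
  setdiff(names(args), c("template", used, "datasetName_colours", "varName_colours"))
}

# prior.count has no effect with the scaling template
ignored_params("scaling", list(datasetName = "proteins", scale_reduce = TRUE, prior.count = 10))
```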
5.3.2 Templates and parameters
Here is another view of the templates and their parameters, which also gives the parameter options and default values. Indications in purple refer to template or parameter names.
Note that the template rnaseqcount is identical to the template metagenomics_2, and that unlog_rnaseqcount is identical to unlog_metagenomics_2.