Chapter 5 Normalization

The main function is r_norm_dataset. It normalizes a dataset according to a chosen template, and each template uses only a subset of the parameters; the others are ignored.

5.1 Parameters

Here are the parameters:

param | short description | class | required | default | description
datasetName | object name | character | required | |
template | template to apply | character | optional | none | one of: "logarithm", "scaling", "norm_quantiles", "combat", "rnaseqcount", "unlog_rnaseqcount", "unlog_metagenomics_2", "metagenomics_1", "metagenomics_2", "microarray", "compositional"
log_base | base of the logarithm | character | optional | "2" | can be: "2", "e", "10"
prior.count | offset of the logarithm | numeric | optional | 0.5 |
offset | simple offset | numeric | optional | 0.5 |
filter_type | type of filter on variables | character | optional | "absolute" | can be: "max", "absolute", "relative"
filter_threshold | threshold for the filters | numeric | optional | 5 |
normalization | type of normalization | character | optional | "CLR" | can be: "CLR", "ILR", "CSS"
scale_reduce | when scaling, should the data be reduced? | boolean | optional | FALSE |
unlog_base | base of the logarithm when unlogging | character | optional | same as log_base |
unlog_prior.count | offset when unlogging | numeric | optional | 0.5 |
datasetName_batches | dataset name in which to find the batch variable | character | optional | NULL |
varName_batches | name of the batch variable | character | optional | NULL |
has_log | should cpm+log be applied? | boolean | optional | TRUE |
has_filter | should a filter be applied? | boolean | optional | TRUE |
has_TMM | should TMM factors be computed? | boolean | optional | TRUE |
has_TMMwsp | should TMMwsp factors be computed? | boolean | optional | FALSE |
datasetName_colours | dataset name in which to find the variable to colour plots | character | optional | NULL |
varName_colours | name of the variable to colour plots | character | optional | NULL |

Return value: a list of two objects (the dataset and its analysis) and three tables (two data views and one string, the name of the data object). When template is ‘none’, the objects are not returned.
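To illustrate how log_base and prior.count interact, here is a minimal sketch of a log transform with an offset. It assumes that the "offset of the logarithm" is simply added to the data before taking the log; log_with_offset is a hypothetical helper written for this illustration, not the package's implementation, and the count-based templates (which apply cpm+log) may handle prior.count differently.

```r
# Hypothetical helper, not part of the package: log-transform with an offset.
# Assumption: the transform is log_base(x + prior.count).
log_with_offset <- function(x, log_base = "2", prior.count = 0.5) {
  # log_base is a character string, as in the parameter table
  base <- switch(log_base,
                 "2" = 2,
                 "e" = exp(1),
                 "10" = 10,
                 stop("log_base must be one of \"2\", \"e\", \"10\""))
  log(x + prior.count, base = base)
}

values <- c(0, 1.5, 7.5)
log_with_offset(values, log_base = "2", prior.count = 0.5)
```

The offset keeps zeros finite: with prior.count = 0.5, a value of 0 is mapped to log2(0.5) instead of -Inf.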

5.2 Outputs of the function

The dataset used in the example below can only be processed with the templates logarithm, combat and rnaseqcount.

res <- r_norm_dataset(datasetName = "proteins", template = "logarithm", prior.count = 10)
names(res)
## [1] "Table"     "Graphical" "Object"

5.2.1 Table

The Table component only contains the name of the produced object.

res$Table
## $ObjectName
## [1] "normalized_1"

5.2.2 Graphs

Two graphs are always produced, whatever the template (here, boxplots before and after normalization).

names(res$Graphical)
## [1] "PlotBefore" "PlotAfter"
res$Graphical$PlotBefore
res$Graphical$PlotAfter

5.3 Different templates and options

Depending on the nature of the selected dataset, as well as on its attributes logt, normalized and norm_factors, different templates are proposed, following the logic in the table below. The column norm_factors denotes the presence / absence of normalization factors (that can be saved as attributes in the dataset object). Note that the natures rna-count and metabolite-compo are treated in the same way as metagenomics-count and metagenomics-compo, respectively. The nature general stands for any nature not mentioned here.

nature | logt | normalized | norm_factors | templates
general | no | no | FALSE | logarithm, scaling, norm_quantiles, combat
general | no | yes | FALSE | logarithm, scaling
general | yes | no | FALSE | scaling, norm_quantiles, combat
general | yes | yes | FALSE | scaling
metagenomics-compo | no | no | FALSE | logarithm, compositional
metagenomics-compo | no | yes | FALSE | logarithm
metagenomics-compo | yes | no | FALSE | scaling
metagenomics-compo | yes | yes | FALSE | scaling
metagenomics-count | no | yes | TRUE | metagenomics_2
metagenomics-count | yes | yes | TRUE | scaling
metagenomics-count | no | no | FALSE | logarithm, combat, metagenomics_1, metagenomics_2
metagenomics-count | no | yes | FALSE | logarithm, metagenomics_2
metagenomics-count | yes | no | FALSE | scaling, combat, unlog_metagenomics_2
metagenomics-count | yes | yes | FALSE | scaling
rna-count | no | yes | TRUE | rnaseqcount
rna-count | yes | yes | TRUE | scaling
rna-count | no | no | FALSE | logarithm, combat, rnaseqcount
rna-count | no | yes | FALSE | logarithm, rnaseqcount
rna-count | yes | no | FALSE | scaling, combat, unlog_rnaseqcount
rna-count | yes | yes | FALSE | scaling
microarray | no | no | FALSE | logarithm, scaling, combat, microarray
microarray | no | yes | FALSE | logarithm, scaling
microarray | yes | no | FALSE | scaling, combat
microarray | yes | yes | FALSE | scaling
metabolite-compo | no | no | FALSE | logarithm, compositional
metabolite-compo | no | yes | FALSE | logarithm
metabolite-compo | yes | no | FALSE | scaling
metabolite-compo | yes | yes | FALSE | scaling
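The selection logic of the table above can be sketched as a simple lookup. The code below transcribes the table into a data frame and returns the proposed templates for a given combination of attributes; template_table and proposed_templates are hypothetical names introduced for this illustration and are not part of the package.

```r
# Hypothetical lookup transcribed from the table above; not part of the package.
template_table <- read.table(text = "nature|logt|normalized|norm_factors|templates
general|no|no|FALSE|logarithm, scaling, norm_quantiles, combat
general|no|yes|FALSE|logarithm, scaling
general|yes|no|FALSE|scaling, norm_quantiles, combat
general|yes|yes|FALSE|scaling
metagenomics-compo|no|no|FALSE|logarithm, compositional
metagenomics-compo|no|yes|FALSE|logarithm
metagenomics-compo|yes|no|FALSE|scaling
metagenomics-compo|yes|yes|FALSE|scaling
metagenomics-count|no|yes|TRUE|metagenomics_2
metagenomics-count|yes|yes|TRUE|scaling
metagenomics-count|no|no|FALSE|logarithm, combat, metagenomics_1, metagenomics_2
metagenomics-count|no|yes|FALSE|logarithm, metagenomics_2
metagenomics-count|yes|no|FALSE|scaling, combat, unlog_metagenomics_2
metagenomics-count|yes|yes|FALSE|scaling
rna-count|no|yes|TRUE|rnaseqcount
rna-count|yes|yes|TRUE|scaling
rna-count|no|no|FALSE|logarithm, combat, rnaseqcount
rna-count|no|yes|FALSE|logarithm, rnaseqcount
rna-count|yes|no|FALSE|scaling, combat, unlog_rnaseqcount
rna-count|yes|yes|FALSE|scaling
microarray|no|no|FALSE|logarithm, scaling, combat, microarray
microarray|no|yes|FALSE|logarithm, scaling
microarray|yes|no|FALSE|scaling, combat
microarray|yes|yes|FALSE|scaling
metabolite-compo|no|no|FALSE|logarithm, compositional
metabolite-compo|no|yes|FALSE|logarithm
metabolite-compo|yes|no|FALSE|scaling
metabolite-compo|yes|yes|FALSE|scaling",
  header = TRUE, sep = "|", colClasses = "character", strip.white = TRUE)

proposed_templates <- function(nature, logt, normalized, norm_factors = FALSE) {
  # Any nature that is absent from the table is treated as "general"
  if (!nature %in% template_table$nature) nature <- "general"
  hit <- template_table[
    template_table$nature == nature &
      template_table$logt == logt &
      template_table$normalized == normalized &
      template_table$norm_factors == as.character(norm_factors), ]
  # Combinations not listed in the table yield no template
  if (nrow(hit) == 0) return(character(0))
  strsplit(hit$templates[1], ", ", fixed = TRUE)[[1]]
}

proposed_templates("rna-count", logt = "no", normalized = "no")
proposed_templates("some-other-nature", logt = "yes", normalized = "yes")
```

The second call uses a made-up nature name to show the fallback to the general rows.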

The table below shows how logt and normalized are changed by applying the templates above, along with some parameter combinations. Note that rnaseqcount has the same effects as metagenomics_2, while unlog_rnaseqcount has the same effects as unlog_metagenomics_2. On top of modifying these attributes, the template metagenomics_1 also modifies the data nature: it turns from metagenomics-count to metagenomics-compo.

template | has_TMM + has_TMMwsp | has_log | normalization | normalized | logt
logarithm | | | | | yes
scaling | | | | |
norm_quantiles | | | | yes |
combat | | | | yes |
metagenomics_1 | | | CLR | yes | yes
metagenomics_1 | | | CSS | yes |
metagenomics_2 | TRUE+FALSE | TRUE | | yes | yes
metagenomics_2 | FALSE+TRUE | TRUE | | yes | yes
metagenomics_2 | FALSE+FALSE | TRUE | | | yes
metagenomics_2 | TRUE+FALSE | FALSE | | yes |
metagenomics_2 | FALSE+TRUE | FALSE | | yes |
metagenomics_2 | FALSE+FALSE | FALSE | | |
unlog_metagenomics_2 | TRUE+FALSE | TRUE | | yes |
unlog_metagenomics_2 | FALSE+TRUE | TRUE | | yes |
unlog_metagenomics_2 | FALSE+FALSE | TRUE | | |
unlog_metagenomics_2 | TRUE+FALSE | FALSE | | yes | no
unlog_metagenomics_2 | FALSE+TRUE | FALSE | | yes | no
unlog_metagenomics_2 | FALSE+FALSE | FALSE | | | no
rnaseqcount | TRUE+FALSE | TRUE | | yes | yes
rnaseqcount | FALSE+TRUE | TRUE | | yes | yes
rnaseqcount | FALSE+FALSE | TRUE | | | yes
rnaseqcount | TRUE+FALSE | FALSE | | yes |
rnaseqcount | FALSE+TRUE | FALSE | | yes |
rnaseqcount | FALSE+FALSE | FALSE | | |
unlog_rnaseqcount | TRUE+FALSE | TRUE | | yes |
unlog_rnaseqcount | FALSE+TRUE | TRUE | | yes |
unlog_rnaseqcount | FALSE+FALSE | TRUE | | |
unlog_rnaseqcount | TRUE+FALSE | FALSE | | yes | no
unlog_rnaseqcount | FALSE+TRUE | FALSE | | yes | no
unlog_rnaseqcount | FALSE+FALSE | FALSE | | | no
compositional | | | | yes | yes
microarray | | | | yes | yes
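For the count templates (metagenomics_2, rnaseqcount and their unlog_ variants), the table above can be read as two rules: the dataset is marked as normalized as soon as TMM or TMMwsp factors are computed, and logt ends up equal to has_log. The sketch below encodes that reading. attributes_after is a hypothetical helper, not a package function, and the case where has_TMM and has_TMMwsp are both TRUE is not listed in the table, so its handling here is an assumption.

```r
# Hypothetical helper, not part of the package: attribute values after applying
# metagenomics_2 / rnaseqcount or their unlog_ variants, as read from the table.
attributes_after <- function(has_TMM, has_TMMwsp, has_log,
                             normalized_before = FALSE) {
  list(
    # normalized switches to TRUE when either kind of factor is computed,
    # otherwise it keeps its previous value
    normalized = normalized_before || has_TMM || has_TMMwsp,
    # the non-unlog templates start from logt = no and the unlog_ variants from
    # logt = yes, so in both cases the final value matches has_log
    logt = has_log
  )
}

attributes_after(has_TMM = TRUE, has_TMMwsp = FALSE, has_log = TRUE)
attributes_after(has_TMM = FALSE, has_TMMwsp = FALSE, has_log = FALSE)
```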

5.3.1 Necessary parameters for each template

  • “logarithm”: datasetName, log_base and prior.count
  • “scaling”: datasetName and scale_reduce
  • “norm_quantiles”: datasetName
  • “combat”: datasetName, datasetName_batches, varName_batches
  • “rnaseqcount”: datasetName, filter_type, log_base, filter_threshold, prior.count, has_filter, has_TMM, has_TMMwsp, has_log
  • “unlog_rnaseqcount”: datasetName, filter_type, log_base, unlog_base, filter_threshold, prior.count, unlog_prior.count, has_filter, has_TMM, has_TMMwsp, has_log
  • “unlog_metagenomics_2”: datasetName, filter_type, log_base, unlog_base, filter_threshold, prior.count, unlog_prior.count, has_filter, has_TMM, has_TMMwsp, has_log
  • “metagenomics_1”: datasetName, filter_type, normalization, offset, filter_threshold, has_filter
  • “metagenomics_2”: datasetName, filter_type, log_base, filter_threshold, prior.count, has_filter, has_TMM, has_TMMwsp, has_log
  • “microarray”: datasetName, filter_threshold
  • “compositional”: datasetName, normalization

datasetName_colours and varName_colours are never required and are not currently used by the interface. They may prove useful for future developments, or may be removed.
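The list of parameters per template can also be kept as a small lookup, for instance to check which parameters a planned call leaves to their defaults. This is a minimal sketch: required_params and missing_params are hypothetical names introduced here, not package functions, and "batch" is a made-up variable name.

```r
# Hypothetical helper, not part of the package: parameters used by each template,
# transcribed from the list above.
count_params <- c("datasetName", "filter_type", "log_base", "filter_threshold",
                  "prior.count", "has_filter", "has_TMM", "has_TMMwsp", "has_log")
unlog_params <- c(count_params, "unlog_base", "unlog_prior.count")

required_params <- list(
  logarithm            = c("datasetName", "log_base", "prior.count"),
  scaling              = c("datasetName", "scale_reduce"),
  norm_quantiles       = "datasetName",
  combat               = c("datasetName", "datasetName_batches", "varName_batches"),
  rnaseqcount          = count_params,
  unlog_rnaseqcount    = unlog_params,
  unlog_metagenomics_2 = unlog_params,
  metagenomics_1       = c("datasetName", "filter_type", "normalization",
                           "offset", "filter_threshold", "has_filter"),
  metagenomics_2       = count_params,
  microarray           = c("datasetName", "filter_threshold"),
  compositional        = c("datasetName", "normalization")
)

# Parameters of a template that are not explicitly supplied in a call
missing_params <- function(template, ...) {
  supplied <- names(list(...))
  setdiff(required_params[[template]], supplied)
}

missing_params("combat", datasetName = "proteins", varName_batches = "batch")
```

Most parameters have a default, so a parameter reported here is not necessarily an error. For combat, however, datasetName_batches and varName_batches default to NULL and therefore have to be supplied.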

5.3.2 Templates and parameters

Here is another view on the templates and their parameters, giving parameter options and default values on top. Indications in purple refer to template names or parameter names.



Note that the template rnaseqcount is identical to template metagenomics_2.


Note that the template unlog_rnaseqcount is identical to template unlog_metagenomics_2.