Chapter 7 Variates: r_univariate, r_bivariate, r_multivariate, r_univariate_dataset

proteins <- read.table("../forge/backend/R/data/protein.csv", sep = " ", 
                       quote = '\"', dec = ".", row.names = 1)
clinical <- read.table("../forge/backend/R/data/clinical.csv", sep = ",", 
                       quote = '\"', dec = ".", row.names = 1)

7.1 Univariate

Analysis on one variable from one dataset.

The function r_univariate takes two inputs:

  • dataset: the name of the dataset, as a character string. The dataset object itself can also be passed directly, without the quotes, for internal use in R.
  • varname: the name of the variable as character.

The outputs depend on the type of the variable: factor (i.e. not numerical) or numerical.
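The two points above (a dataset given by name or as an object, and outputs that depend on the variable type) can be sketched in base R. The helpers `resolve_dataset` and `variable_kind` are hypothetical names introduced for illustration only; the actual internals of r_univariate are not shown in this chapter.

```r
# Hypothetical helper (not part of the documented API): accept either the
# name of a data frame or the data frame itself.
resolve_dataset <- function(dataset, envir = parent.frame()) {
  if (is.character(dataset)) get(dataset, envir = envir) else dataset
}

# Hypothetical helper: classify a column as "numerical" or "factor",
# treating everything that is not numeric as a factor.
variable_kind <- function(dataset, varname) {
  caller <- parent.frame()
  x <- resolve_dataset(dataset, envir = caller)[[varname]]
  if (is.numeric(x)) "numerical" else "factor"
}

# Invented toy data, unrelated to the clinical dataset.
toy <- data.frame(age = c(61, 54, 70),
                  gender = c("female", "male", "female"))

kind_age    <- variable_kind("toy", "age")    # dataset given by name
kind_gender <- variable_kind(toy, "gender")   # dataset given as an object
```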

7.1.1 Factor variables

Example for a factor variable:

out <- r_univariate("clinical", "patient.clinical_cqcf.consent_or_death_status")
names(out)
## [1] "Table"     "Graphical"

The Table component of output in JSON as passed to the interface:

out_TableOnly <- out
out_TableOnly$Graphical <- NULL
jsonview::json_tree_view(out_TableOnly, scroll = TRUE)
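The chunk above drops the Graphical component by assigning NULL to it. A minimal sketch of that idiom on an invented list (`toy_out` is a stand-in whose structure is only assumed from the names printed above):

```r
# Toy stand-in for the returned list; the contents are invented.
toy_out <- list(Table = data.frame(level = c("a", "b"), n = c(3, 5)),
                Graphical = list(Barplot = "plot object placeholder"))

toy_table_only <- toy_out
toy_table_only$Graphical <- NULL   # assigning NULL removes the element
remaining <- names(toy_table_only)
```

The original `toy_out` is left untouched, because R copies the list on modification.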

The Graphical component:

out$Graphical
## $Barplot
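The counts behind a bar plot of a factor variable can be sketched with base R as follows. The data are invented, and the column names `level`, `n` and `percent` are assumptions; the exact layout of the Table component is not documented here.

```r
# Invented factor with one missing value.
status <- factor(c("consented", "deceased", "consented", "consented", NA))

# Count each level, keeping missing values as their own category.
counts <- table(status, useNA = "ifany")

freq <- data.frame(level   = names(counts),
                   n       = as.vector(counts),
                   percent = round(100 * as.vector(counts) / sum(counts), 1))
```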

7.1.2 Numerical variables

Example for a numerical variable:

# examples univariate
out <- r_univariate(clinical, "patient.samples.sample.2.portions.portion.analytes.analyte.aliquots.aliquot.quantity")
names(out)
## [1] "Table"     "Graphical"

The Table component of the output in JSON as passed to the interface:

out_TableOnly <- out
out_TableOnly$Graphical <- NULL
jsonview::json_tree_view(out_TableOnly, scroll = TRUE)

The Graphical component:

names(out$Graphical)
## [1] "Boxplot"    "Histogram"  "Density"    "Violin"     "Stripchart"
out$Graphical$Boxplot
out$Graphical$Histogram

out$Graphical$Density

out$Graphical$Violin

out$Graphical$Stripchart
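The quantities that box plots, histograms and density plots are built from can be computed with base R. This is only a sketch on simulated data, under the assumption that the plots follow their usual definitions; it does not reproduce the plotting code of r_univariate.

```r
set.seed(1)
# Simulated numerical variable with one missing value.
quantity <- c(rnorm(50, mean = 10, sd = 2), NA)

# Five-number summary used by a box plot (missing values are dropped).
box <- boxplot.stats(quantity)$stats

# Histogram bins and counts, computed without drawing anything.
h <- hist(quantity, plot = FALSE)

# Kernel density estimate; missing values must be removed explicitly.
d <- density(quantity, na.rm = TRUE)
```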

7.2 Bivariate

Cross-analysis on two variables, from one or two datasets.

The function r_bivariate takes four inputs:

  • dataset1: the name of the first dataset, as a character string. The dataset object itself can also be passed directly, without the quotes, for internal use in R.
  • varname1: the name of the first variable, as a character string.
  • dataset2: the name of the second dataset, as a character string. The dataset object itself can also be passed directly, without the quotes, for internal use in R.
  • varname2: the name of the second variable, as a character string.
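When the two variables come from different datasets, their observations have to be paired somehow. The chapter does not say how r_bivariate does this; the sketch below assumes, purely for illustration, that rows are matched on shared row names (the datasets above are read with row.names = 1). All data are invented.

```r
toy_proteins <- data.frame(AR = c(0.2, -0.1, 0.5),
                           row.names = c("s1", "s2", "s3"))
toy_clinical <- data.frame(gender = c("male", "female", "female"),
                           row.names = c("s3", "s1", "s4"))

# Assumption: keep only the samples present in both datasets.
common <- intersect(rownames(toy_proteins), rownames(toy_clinical))

paired <- data.frame(AR     = toy_proteins[common, "AR"],
                     gender = toy_clinical[common, "gender"],
                     row.names = common)
```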

The outputs depend on the types of the two variables: factor (i.e. not numerical) or numerical.
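The three cases covered in the subsections below can be told apart with a small base R check. `bivariate_case` is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper: name the case from the number of numeric inputs.
bivariate_case <- function(x, y) {
  n_numeric <- sum(is.numeric(x), is.numeric(y))
  c("factor - factor", "numeric - factor", "numeric - numeric")[n_numeric + 1]
}

case_ff <- bivariate_case(c("a", "b"), factor(c("x", "y")))
case_nf <- bivariate_case(c(1.5, 2.0), c("a", "b"))
case_nn <- bivariate_case(c(1.5, 2.0), c(3L, 4L))
```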

7.2.1 Factor - factor

Example in the case of two factor variables:

out <- r_bivariate("clinical", "patient.clinical_cqcf.consent_or_death_status", "clinical", "patient.gender")
names(out)
## [1] "Table"     "Graphical"

The Table component of the output in JSON as passed to the interface:

out_TableOnly <- out
out_TableOnly$Graphical <- NULL
jsonview::json_tree_view(out_TableOnly, scroll = TRUE)

The Graphical component:

names(out$Graphical)
## [1] "Barplot_base"  "Barplot_fill"  "Barplot_dodge"
out$Graphical$Barplot_base
out$Graphical$Barplot_fill
out$Graphical$Barplot_dodge

Second example with two factor variables:

out <- r_bivariate("clinical", "patient.samples.sample.portions.portion.analytes.analyte.3.analyte_type", "clinical", "patient.gender")
names(out)
## [1] "Table"     "Graphical"

The Table component of output in JSON as passed to the interface:

out_TableOnly <- out
out_TableOnly$Graphical <- NULL
jsonview::json_tree_view(out_TableOnly, scroll = TRUE)

The Graphical component:

names(out$Graphical)
## [1] "Barplot_base"  "Barplot_fill"  "Barplot_dodge"
out$Graphical$Barplot_base
out$Graphical$Barplot_fill
out$Graphical$Barplot_dodge
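For two factor variables, the underlying numbers are a contingency table. The sketch below uses invented data and base R only. The link to the plot names is an assumption: a "fill"-style stacked bar chart usually shows proportions within each bar, which is what `prop.table` computes here, while "base" and "dodge" variants would show the raw counts stacked or side by side.

```r
gender <- c("male", "female", "female", "male", "female")
status <- c("a", "a", "b", "b", "a")

# Cross-tabulation of the two factors (raw counts).
counts <- table(status, gender)

# Proportions within each gender, so every column sums to 1.
props <- prop.table(counts, margin = 2)
```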

7.2.2 Numeric - factor

Example in the case of one numeric and one factor variable:

out <- r_bivariate("clinical", "patient.samples.sample.2.portions.portion.analytes.analyte.aliquots.aliquot.quantity", "clinical", "patient.gender")
names(out)
## [1] "Table"     "Graphical"

The Table component of the output in JSON as passed to the interface:

out_TableOnly <- out
out_TableOnly$Graphical <- NULL
jsonview::json_tree_view(out_TableOnly, scroll = TRUE)

The Graphical component:

names(out$Graphical)
## [1] "Stripchart" "Boxplot"    "Violin"     "Density"
out$Graphical$Stripchart
out$Graphical$Boxplot
out$Graphical$Violin
out$Graphical$Density
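A numeric variable crossed with a factor is typically summarised per group. The sketch below computes such per-group statistics in base R on invented data; which statistics r_bivariate actually reports is not stated in this chapter, so the choice of n, mean and median is an assumption.

```r
quantity <- c(5.1, 6.3, 4.8, 7.0, 6.1, 5.5)
gender   <- factor(c("female", "male", "female", "male", "male", "female"))

# Split the numeric values by factor level, then summarise each group.
groups <- split(quantity, gender)
group_stats <- t(sapply(groups, function(v) {
  c(n = length(v), mean = mean(v), median = median(v))
}))
```

The result is a matrix with one row per factor level and one column per statistic.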

Second example with one numeric and one factor variable:

out <- r_bivariate("clinical", "patient.samples.sample.2.portions.portion.analytes.analyte.aliquots.aliquot.quantity", "clinical", "patient.clinical_cqcf.consent_or_death_status")
names(out)
## [1] "Table"     "Graphical"

The Table component of the output in JSON as passed to the interface:

out_TableOnly <- out
out_TableOnly$Graphical <- NULL
jsonview::json_tree_view(out_TableOnly, scroll = TRUE)

The Graphical component:

names(out$Graphical)
## [1] "Stripchart" "Boxplot"    "Violin"     "Density"    "TukeyPlot"
out$Graphical$Stripchart
out$Graphical$Boxplot
out$Graphical$Violin
out$Graphical$Density
out$Graphical$TukeyPlot
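This second example returns an extra TukeyPlot component that the first one (with patient.gender) did not. The name suggests Tukey's honest significant difference procedure for pairwise comparisons between more than two groups, but that is an inference from the output names, not something the chapter states. Under that assumption, a base R sketch on simulated data looks like this:

```r
set.seed(42)
# Simulated factor with three levels and a numeric response.
status   <- factor(rep(c("a", "b", "c"), each = 10))
quantity <- rnorm(30, mean = c(5, 6, 8)[as.integer(status)])

# One-way analysis of variance, then all pairwise comparisons.
fit   <- aov(quantity ~ status)
tukey <- TukeyHSD(fit)

# One row per pair of levels, with the difference and its interval.
pairwise <- tukey$status
```

Calling `plot(tukey)` would draw the confidence intervals of the pairwise differences.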

7.2.3 Numeric - numeric

Example in the case of two numeric variables:

out <- r_bivariate("clinical",
                   "patient.samples.sample.2.portions.portion.analytes.analyte.aliquots.aliquot.quantity", 
                   "clinical", "patient.day_of_form_completion")
names(out)
## [1] "Table"     "Graphical"

The Table component of the output in JSON as passed to the interface:

out_TableOnly <- out
out_TableOnly$Graphical <- NULL
jsonview::json_tree_view(out_TableOnly, scroll = TRUE)

The Graphical component:

names(out$Graphical)
## [1] "Scatterplot"
out$Graphical$Scatterplot
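For two numeric variables, a scatterplot is commonly accompanied by correlation coefficients or a linear fit. The chapter does not list what the Table component contains in this case, so the following base R sketch on simulated data is only an illustration of typical numeric-numeric summaries.

```r
set.seed(7)
x <- runif(40, min = 1, max = 30)
y <- 2 + 0.3 * x + rnorm(40)

# Linear and rank-based correlation, ignoring incomplete pairs.
pearson  <- cor(x, y, use = "complete.obs")
spearman <- cor(x, y, method = "spearman", use = "complete.obs")

# Least-squares line that could be overlaid on the scatterplot.
fit <- lm(y ~ x)
line_coef <- coef(fit)
```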

7.3 Multivariate Dotplot

Here the output is a graph.

The function r_multivariate_dotplot takes up to 10 arguments.

Four are mandatory:

  • datasetxaxis: a character, the name of the dataset for the x-axis variable.
  • varxaxis: a character, the name of the variable for the x-axis.
  • datasetyaxis: a character, the name of the dataset for the y-axis variable.
  • varyaxis: a character, the name of the variable for the y-axis.

Six are optional:

  • datasetcolor: a character, the name of the dataset for the colour of points.
  • varcolor: a character, the name of the variable for the colour of points.
  • datasetshape: a character, the name of the dataset for the shape of points.
  • varshape: a character, the name of the variable for the shape of points.
  • datasetsize: a character, the name of the dataset for the size of points.
  • varsize: a character, the name of the variable for the size of points.

All the variables can be either numerical or categorical.
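One way to handle mandatory and optional mappings like these is to default the optional ones to NULL and drop those that were not supplied. The helper `build_dotplot_data` below is hypothetical and takes vectors rather than dataset and variable names, to keep the sketch short; it is not the implementation of r_multivariate_dotplot.

```r
# Hypothetical helper: gather up to five aesthetics into one data frame.
build_dotplot_data <- function(x, y, color = NULL, shape = NULL, size = NULL) {
  aesthetics <- list(x = x, y = y, color = color, shape = shape, size = size)
  # Drop the optional mappings that were not supplied.
  aesthetics <- Filter(Negate(is.null), aesthetics)
  as.data.frame(aesthetics, stringsAsFactors = FALSE)
}

# Invented values; numerical and categorical variables can be mixed.
plot_data <- build_dotplot_data(x = c(0.2, -0.1, 0.5),
                                y = c(1.1, 0.9, 1.4),
                                shape = c("male", "female", "female"))
used <- names(plot_data)
```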

Example:

out <- r_multivariate_dotplot(datasetxaxis = "proteins", varxaxis = "AR",
                    datasetyaxis = "proteins", varyaxis = "Akt",
                    datasetcolor = "proteins", varcolor = "C.Raf",
                    datasetshape = "clinical", varshape = "patient.gender",
                    datasetsize = "proteins", varsize = "Bak")
names(out)
## [1] "Graphical"
names(out$Graphical)
## [1] "Dotplot"
out$Graphical$Dotplot

7.4 Univariate on a dataset

The function r_univariate_dataset performs univariate analysis on all variables of a dataset.

It handles numerical and categorical variables separately.

The function takes two arguments:

  • datasetName: the name of the dataset.
  • scale: a boolean, FALSE by default, indicating whether the numerical variables should be scaled for the plots.
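The chapter does not specify what kind of scaling the scale argument applies. Assuming the usual centring and division by the standard deviation, base R's `scale()` gives the idea: variables measured on very different ranges become comparable on one plot. The data are invented.

```r
toy <- data.frame(a = c(1, 2, 3), b = c(10, 200, 3000))

# Centre each column on 0 and rescale it to unit standard deviation.
scaled <- as.data.frame(scale(toy))

col_means <- colMeans(scaled)
col_sds   <- apply(scaled, 2, sd)
```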

The function returns an object in the global environment (Object component). It also returns at least one table (up to three) and at least one plot (up to two), with the plots rendered in plotly.

If there are too many variables, the plots use only the first ones (the first 150 numerical variables and the first 50 categorical variables).

Example:

out <- r_univariate_dataset(datasetName = "clinical",
                            scale = TRUE)
names(out)
## [1] "Graphical" "Table"     "Object"
names(out$Graphical)
## [1] "plotNum"   "plotCateg"
names(out$Table)
## [1] "numSummary"   "numNormTests" "catSummary"
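The separate handling of numerical and categorical variables, and the caps of 150 and 50 variables for the plots, can be sketched in base R as follows. The toy data frame and the variable names are invented, and the real function may select the variables differently.

```r
toy <- data.frame(id     = 1:4,
                  score  = c(0.5, 1.2, 0.9, 2.1),
                  gender = c("male", "female", "female", "male"),
                  stage  = factor(c("I", "II", "II", "III")))

# Flag the numeric columns; everything else is treated as categorical.
is_num <- vapply(toy, is.numeric, logical(1))

# Keep at most the first 150 numerical and first 50 categorical variables.
num_vars <- head(names(toy)[is_num], 150)
cat_vars <- head(names(toy)[!is_num], 50)
```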

The Table component of the output in JSON as passed to the interface:

out_TableOnly <- out
out_TableOnly$Graphical <- NULL
out_TableOnly$Object <- NULL
jsonview::json_tree_view(out_TableOnly, scroll = TRUE)

Content of the “Graphical” component:

names(out$Graphical)
## [1] "plotNum"   "plotCateg"
out$Graphical$plotNum
out$Graphical$plotCateg
rm(list=ls())