Chapter 2 Formats for the interface

This chapter shows and documents the different types of outputs returned by the R functions to display something on the interface.

This document starts with the initialization of a few variables used throughout this document to illustrate the formats:

proteins <- read.table("../forge/backend/R/data/protein.csv", sep = " ", 
                       quote = '\"', dec = ".", row.names = 1)
clinical <- read.table("../forge/backend/R/data/clinical.csv", sep = ",", 
                       quote = '\"', dec = ".", row.names = 1)

2.1 Global output format from R to the interface

The elements returned from R to the interface are always in JSON format (produced by the function r_tojson).

They always have the same structure:

  • on the R side, a list with one or several sub-lists called Table, Graphical, or Object.

Example:

OutputTest <- list(
    "Table" = list("Table1" =  NA), 
    "Graphical" = list("Emptyplot" = NA))
jsonview::json_tree_view(OutputTest, height = "250px")

Note that the r_tojson function won’t be called directly by the interface, but is called indirectly by r_wrapp.

The Table component of the list must contain all the elements printed on screen that contain text or numerical values. The Graphical component contains all the graph outputs.

The output of an R function can contain a third component, Object, which is used to pass objects to the environment for R-side internal use only. See the Workspace chapter for more information.
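To illustrate this convention, here is a minimal sketch of a check that an output list contains only the allowed components. The helper check_output_structure() is hypothetical and is not part of the package. It only tests the names and types of the top-level components, not the content of each element.

```r
# Hypothetical helper (not part of the package): checks that a function output
# follows the Table / Graphical / Object convention before it is serialised.
check_output_structure <- function(output) {
  allowed <- c("Table", "Graphical", "Object")
  if (!is.list(output) || length(output) == 0) return(FALSE)
  # Every top-level component must have one of the allowed names...
  if (is.null(names(output)) || !all(names(output) %in% allowed)) return(FALSE)
  # ...and must itself be a list of elements
  all(vapply(output, is.list, logical(1)))
}

# Usage on the example structure shown above
OutputTest <- list("Table" = list("Table1" = NA),
                   "Graphical" = list("Emptyplot" = NA))
check_output_structure(OutputTest)               # TRUE
check_output_structure(list("Plot" = list(NA)))  # FALSE: unknown component
```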

2.2 General information on datasets

General information on the dataset is provided under the JSON type called GeneralInfo. It is returned, for example, by the function r_summary_description.

It is used to display information on datasets like this:

The output format is:

jsonview::json_tree_view(out$Table$GeneralInformation, scroll = TRUE)

Here is the R code used by the function r_create_datadesc to produce this output:

dataset_general_info <- data.frame("nrow" = nrow(dataset), 
                                   "ncol" = ncol(dataset), 
                                   "nbmissing" = nb_missings, 
                                   "propmissing" = prop_missing,
                                   "nbnum" = nb_numerical, 
                                   "nbcat" = nb_factors,
                                   "nblogic" = nb_logical, 
                                   "nbothers" = nb_others)
fields <- data.frame("field" = c("nrow", "ncol", "nbmissing", "propmissing",
                                 "nbnum", "nbcat", "nblogic", "nbothers"),
                     "label" = c("# Samples (rows)",
                                 "# Variables (columns)",
                                 "# missing values",
                                 "Proportion of missing values",
                                 "numeric variables",
                                 "categorical variables", 
                                 "logic variables", 
                                 "variables with other types"),
                     "labelShort" = c("# rows", "# col.", "# missing",
                                      "% missing", "# numeric", "# cat.",
                                      "# logic", "# others"),
                     "type" = rep("numeric", 8))
dataset_general_info <- list("type" = "GeneralInfo",
                             "title" = "General information on dataset",
                             "data" = dataset_general_info,
                             "fields" = fields)
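The excerpt above assumes that nb_missings, prop_missing, nb_numerical, nb_factors, nb_logical and nb_others have already been computed. The following sketch shows one way to obtain them on a toy data.frame. Two choices here are assumptions: each column is classified by its first class, and character columns are counted as categorical. r_create_datadesc may proceed differently.

```r
# Toy dataset (hypothetical): one numeric, one factor, one logical and one
# Date column, with three missing values in total
dataset <- data.frame(num  = c(1, NA, 3),
                      cat  = factor(c("x", "y", NA)),
                      lgl  = c(TRUE, FALSE, TRUE),
                      date = as.Date(c("2020-01-01", NA, "2020-01-03")))

# First class of each column (e.g. "numeric", "factor", "Date")
classes <- vapply(dataset, function(x) class(x)[1], character(1))

nb_missings  <- sum(is.na(dataset))
prop_missing <- nb_missings / (nrow(dataset) * ncol(dataset))
nb_numerical <- sum(classes %in% c("numeric", "integer"))
nb_factors   <- sum(classes %in% c("factor", "character"))  # assumption: character = categorical
nb_logical   <- sum(classes == "logical")
nb_others    <- ncol(dataset) - nb_numerical - nb_factors - nb_logical  # here: the Date column
```

On this toy dataset, there are 3 missing values out of 12 cells (proportion 0.25) and one column of each kind.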

2.3 DataView: visualisation of datasets

The DataView type returns an extract of the dataset itself, restricted to the selected rows and columns.

It is produced by two functions: r_import (with the mode preview = TRUE) and r_summary_description (which also returns information at the dataset and variable levels).

Here is an example of the produced output:

The output format is:

jsonview::json_tree_view(out$Table$DataView, scroll = TRUE)

Here is the R code used to produce this output, from the function r_create_dataview:

df_preview <- dataset[rstart:rend, cstart:cend, drop = FALSE]
colnames(df_preview) <- paste0("var", cstart:cend)

# Variable description
df_var_summary_all <- sapply(dataset, r_describe_variable)
df_var_summary <- t(df_var_summary_all[ ,cstart:cend, drop = FALSE])
rownames(df_var_summary) <- NULL
fields <- data.frame("id" = paste0("var", cstart:cend),
                     "label" = colnames(dataset)[cstart:cend],
                     "class" = df_var_summary[, 1],
                     "nbmissing" = as.numeric(df_var_summary[, 2]),
                     "propmissing" = as.numeric(df_var_summary[, 3]))

df_preview <- list("type" = "DataView",
                   "title" = "View of an extract of dataset",
                   "data" = df_preview,
                   "fields" = fields)
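Here is a self-contained sketch of the same steps on a toy dataset. describe_variable_sketch() is a hypothetical stand-in for r_describe_variable(). It is assumed to return the class of a variable, its number of missing values and its proportion of missing values, in that order. This sketch also clamps rend and cend to the dataset dimensions, which the excerpt above does not show.

```r
# Toy dataset (hypothetical)
dataset <- data.frame(age    = c(34, NA, 51, 47),
                      sex    = factor(c("F", "M", NA, "F")),
                      smoker = c(TRUE, FALSE, FALSE, NA))

# Hypothetical stand-in for r_describe_variable(): class, number and
# proportion of missing values (combined with c(), so all become character)
describe_variable_sketch <- function(x) {
  c(class(x)[1], sum(is.na(x)), mean(is.na(x)))
}

# Requested extract: rows 1 to 3, columns 2 to 3, clamped to the dataset size
rstart <- 1; rend <- min(3, nrow(dataset))
cstart <- 2; cend <- min(3, ncol(dataset))

df_preview <- dataset[rstart:rend, cstart:cend, drop = FALSE]
colnames(df_preview) <- paste0("var", cstart:cend)

# One row per selected variable, one column per descriptor
df_var_summary <- t(sapply(dataset, describe_variable_sketch)[, cstart:cend, drop = FALSE])
rownames(df_var_summary) <- NULL
fields <- data.frame("id"          = paste0("var", cstart:cend),
                     "label"       = colnames(dataset)[cstart:cend],
                     "class"       = df_var_summary[, 1],
                     "nbmissing"   = as.numeric(df_var_summary[, 2]),
                     "propmissing" = as.numeric(df_var_summary[, 3]))

DataView <- list("type" = "DataView",
                 "title" = "View of an extract of dataset",
                 "data" = df_preview,
                 "fields" = fields)
```

Here fields describes the two selected columns (var2 and var3). Each has one missing value out of four, so its proportion of missing values is 0.25.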

2.4 Graph formats

Several graph formats are accepted as inputs for r_tojson:

  • plotly: for interactive graphs
  • ggplot: for static graphs, converted to interactive graphs by ggplotly in r_tojson
  • png: for static graphs
  • Venn: for Venn diagrams
  • Upset: for upset plots (upset.js library)

In the end, r_tojson outputs four plot types: plotly, png, Venn, and UpsetJS (ggplot graphs being converted to plotly).

The objects in output$Graphical are passed to the function r_tojson_graph() by r_tojson(). This function converts the graphs to JSON for the interface.

On the stats/R side, the Graphical component of the output is expected to look like this:

Graphical <- list("nameplot1" = plot1, "nameplot2" = plot2)
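The dispatch that r_tojson_graph() performs on each element of Graphical is not detailed here. The sketch below shows one way it could be done, assuming objects are recognised by their class (ggplot, plotly) or by a type field (png). graph_kind() is hypothetical and is not the actual r_tojson_graph() code. Venn and UpsetJS objects are left out because this chapter does not document their R-side representation.

```r
# Hypothetical sketch (not the actual r_tojson_graph() code): how the kind of
# each element of output$Graphical could be recognised before conversion
graph_kind <- function(g) {
  if (inherits(g, "plotly")) return("plotly")
  if (inherits(g, "ggplot")) return("plotly")  # would be converted with ggplotly()
  if (is.list(g) && identical(g$type, "png")) return("png")
  "unknown"
}

# Fake objects built with structure(), so that ggplot2 is not needed here
Graphical <- list("Barplot" = structure(list(), class = c("gg", "ggplot")),
                  "Heatmap" = list("type" = "png", "path" = "images/heatmap.png"))
vapply(Graphical, graph_kind, character(1))  # Barplot: "plotly", Heatmap: "png"
```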

See the dedicated sub-sections for more details.

2.4.1 Plotly

Here, the plots can directly be objects produced by ggplot, plotly, or ggplotly.

No additional information or attributes need to be passed.

For example, the r_univariate function contains the following code:

##### Graphical outputs #####
Plot <- ggplot(df.variable) + theme_bw() + ggtitle(varname)
Barplot <- Plot + geom_bar(aes(x = variable)) +
  theme(axis.title.x = element_blank())

Output <- list(
  Table = list(Frequency = Frequency),
  Graphical = list(Barplot = Barplot)
)

The resulting plot is:

out_cat <- r_univariate("clinical", "patient.gender")
plotly::ggplotly(out_cat$Graphical$Barplot)

The graph is converted to JSON like this:

out_catOnlyGraph <- out_cat
out_catOnlyGraph$Table <- NULL
graphJson <- r_tojson(out_catOnlyGraph)
jsonview::json_tree_view(graphJson, scroll = TRUE)

2.4.2 png

An example of code to produce a graph in a png file can be found in the r_heatmap function:

# Creation of the images subdirectory in the working directory
tmpdir <- file.path(getwd(), "images")
if (!dir.exists(tmpdir)) dir.create(tmpdir, recursive = TRUE)

# Creation of the tmp file for the png
tmpfile <- tempfile(pattern = "heatmap", fileext = ".png", tmpdir = tmpdir)

# Producing the plot
png(filename = tmpfile, width = 950, height = 950)
  plot(c(1, 2, 3)) # Fake plot, just for example
dev.off()

# Output construction
Output <- list("Graphical" = list("ExamplePlot" = list("type" = "png",
                                                       "path" = tmpfile)))

The result produced for the interface is:

exPng <- r_tojson(Output)
jsonview::json_tree_view(exPng, scroll=TRUE, height = "250px")

2.4.3 Venn

TODO

2.4.4 UpsetJS

TODO

2.5 Table formats

Several types of tables have been created:

  • BasicTable: the common case for printing tables on the interface.
  • SummaryTable: for tables containing summaries of datasets (note: for now the format is the same as BasicTable except for the name, but it was created so that information on NAs can later be handled differently in this kind of table).
  • CrossTable: for contingency tables between two categorical variables.
  • Criterion: to print a single value on screen.

2.5.1 BasicTable

The BasicTable type is for example used in the r_univariate function, in the case of a categorical variable (it’s a frequency table in that case).

Here is the R code used in r_univariate to produce a BasicTable:

tabFreq <- table(variable, useNA = "ifany")
dfFrequency <- as.data.frame(tabFreq)
colnames(dfFrequency) <- c("label", "freq")
dfFrequency$percent <- paste0(round(100 * dfFrequency$freq / sum(tabFreq), 2), " %") 
rownames(dfFrequency) <- NULL
descFreq <- data.frame("field" = c("label", "freq", "percent"),
                       "label" = c("Label", "Frequency", "Percentage"),
                       "labelShort" = c("Label", "Frequency", "Percentage"),
                       "type" = c("string", "numeric", "string"))

titleFreq <- paste0("Summary of ", varname, " in ", dataset)
Frequency <- list("type" = "BasicTable",
                  "title" = titleFreq,
                  "data" = dfFrequency,
                  "fields" = descFreq) 
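To see what the data component looks like without loading the clinical dataset, you can run the first lines of the code above on a toy factor. The values below are hypothetical and include one missing value. stringsAsFactors = FALSE is added so that the labels stay character strings.

```r
# Toy categorical variable (hypothetical), with one missing value
variable <- factor(c("female", "female", "male", NA))

tabFreq <- table(variable, useNA = "ifany")
# stringsAsFactors = FALSE keeps the labels as character strings
dfFrequency <- as.data.frame(tabFreq, stringsAsFactors = FALSE)
colnames(dfFrequency) <- c("label", "freq")
dfFrequency$percent <- paste0(round(100 * dfFrequency$freq / sum(tabFreq), 2), " %")
rownames(dfFrequency) <- NULL
dfFrequency
```

The resulting data.frame has three rows (female, male and NA). Their frequencies are 2, 1 and 1, and their percentages are "50 %", "25 %" and "25 %". Note that at this stage the missing category keeps an NA label.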

Here is the desired output:

to_show <- out_cat$Table$Frequency$data
colnames(to_show) <- out_cat$Table$Frequency$fields$labelShort
knitr::kable(to_show)
| Label  | Frequency | Percentage |
|:-------|----------:|:-----------|
| female |       977 | 98.79 %    |
| male   |        12 | 1.21 %     |

The corresponding output format is:

jsonview::json_tree_view(out_cat$Table$Frequency, scroll = TRUE)

The labelShort field is optional. It is used to print shorter column titles on the interface. The full column title (label) is shown when the user hovers over the column title.

2.5.2 SummaryTable

The SummaryTable format is used for example in the r_univariate function, in the case of a numerical variable.

Here is the R code used in r_univariate to produce a SummaryTable:

dfSummary <- r_summary_var(variable)
descSummary <- data.frame("field" = c("min",
                                      "quart1",
                                      "median",
                                      "mean",
                                      "quart3",
                                      "max",
                                      "sd",
                                      "nbna",
                                      "nbnona",
                                      "nbunique"),
                          "label" = c("Minimum",
                                      "1st quartile",
                                      "Median",
                                      "Mean",
                                      "3rd quartile",
                                      "Maximum",
                                      "Standard deviation",
                                      "Number missing",
                                      "Number non-missing",
                                      "Number of unique values"),
                          "labelShort" = c("Min.",
                                           "1st Qu.",
                                           "Median",
                                           "Mean",
                                           "3rd Qu.",
                                           "Max.",
                                           "Sd",
                                           "Missing",
                                           "Non-missing",
                                           "Uniques"),
                          "type" = rep("numeric", 10))
rownames(dfSummary) <- NULL
titleSummary <- paste0("Summary of ", varname, " in ", dataset)
Summary <- list("type" = "SummaryTable",
                "title" = titleSummary,
                "data" = dfSummary,
                "fields" = descSummary) 
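The function r_summary_var() is not shown in this chapter. Here is a hypothetical sketch of it. It assumes the function returns a one-row data.frame whose column names match the field column of descSummary. It also assumes, as an additional assumption, that the number of unique values is computed on non-missing values only.

```r
# Hypothetical reimplementation of r_summary_var(): one-row data.frame whose
# column names match the "field" column of descSummary
summary_var_sketch <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  data.frame("min"      = min(x, na.rm = TRUE),
             "quart1"   = q[1],
             "median"   = q[2],
             "mean"     = mean(x, na.rm = TRUE),
             "quart3"   = q[3],
             "max"      = max(x, na.rm = TRUE),
             "sd"       = sd(x, na.rm = TRUE),
             "nbna"     = sum(is.na(x)),
             "nbnona"   = sum(!is.na(x)),
             "nbunique" = length(unique(x[!is.na(x)])))
}

dfSummary <- summary_var_sketch(c(1, 2, 3, 4, NA))
```

For c(1, 2, 3, 4, NA), this gives quartiles of 1.75, 2.5 and 3.25 (default quantile type), one missing value and four unique values.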

Here is the desired output:

out_num <- r_univariate("proteins", "ATM")
to_show <- out_num$Table$Summary$data
colnames(to_show) <- out_num$Table$Summary$fields$labelShort
knitr::kable(to_show)
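| Min.      | 1st Qu.    | Median    | Mean       | 3rd Qu.   | Max.     | Sd        | Missing | Non-missing | Uniques |
|----------:|-----------:|----------:|-----------:|----------:|---------:|----------:|--------:|------------:|--------:|
| -2.336929 | -0.3103791 | 0.0054033 | -0.0233721 | 0.3274099 | 1.524847 | 0.5415193 |       0 |         379 |     379 |

The corresponding output format is:

jsonview::json_tree_view(out_num$Table$Summary, scroll = TRUE)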

2.5.3 CrossTable

The CrossTable format is used, for example, in the r_bivariate function, in the case of two categorical variables.

Here is the code used by the function fct_cross_Table in the r_bivariate function to produce a CrossTable type:

dfTable <- data.frame("label" = rownames(Table), Table)
colnames(dfTable)[2:ncol(dfTable)] <- paste0("label", 1:(ncol(dfTable) - 1))
rownames(dfTable) <- NULL
# Replacing NA's by missing
dfTable$label <- ifelse(dfTable$label == "NA.", "missing", dfTable$label)

descTable <- data.frame("field" = colnames(dfTable),
                        "label" = c("", colnames(Table)),
                        "type" = c("string", rep("numeric", ncol(Table))))
descTable$label <- ifelse(is.na(descTable$label), "missing", descTable$label)

titleTable <- "Cross table of variables"

Cross_table <- list("type" = "CrossTable", 
                    "title" = titleTable, 
                    "rowVariable" = varnames[1],
                    "colVariable" = varnames[2],
                    "data" = dfTable, 
                    "fields" = descTable)
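The class of Table is not shown here. Note that applying data.frame() directly to an R table object gives the long format (one row per cell), not the wide layout that CrossTable expects. The sketch below uses hypothetical toy variables. It builds the wide data component from a table computed with useNA = "ifany" and replaces NA labels by "missing", as the code above intends to do.

```r
# Toy categorical variables (hypothetical), with missing values
gender <- factor(c("female", "female", "male", NA, "male"))
status <- factor(c("consented", "deceased", "consented", "consented", NA))
Table <- table(gender, status, useNA = "ifany")

# Row and column labels, with NA replaced by "missing"
row_labels <- rownames(Table)
row_labels[is.na(row_labels)] <- "missing"
col_labels <- colnames(Table)
col_labels[is.na(col_labels)] <- "missing"

# Plain count matrix (filled column by column, like the table), so that
# data.frame() keeps the wide layout instead of the long one used for tables
counts <- matrix(as.vector(Table), nrow = nrow(Table))

dfTable <- data.frame("label" = row_labels, counts)
colnames(dfTable) <- c("label", paste0("label", seq_len(ncol(counts))))
descTable <- data.frame("field" = colnames(dfTable),
                        "label" = c("", col_labels),
                        "type"  = c("string", rep("numeric", ncol(counts))))

Cross_table <- list("type" = "CrossTable",
                    "title" = "Cross table of variables",
                    "rowVariable" = "gender",
                    "colVariable" = "status",
                    "data" = dfTable,
                    "fields" = descTable)
```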

Example of a cross table to show on screen:

variabletest1 <- "patient.gender"
variabletest2 <- "patient.clinical_cqcf.consent_or_death_status"
out_cat_cat <- r_bivariate("clinical", variabletest1, "clinical", variabletest2)
to_show <- out_cat_cat$Table$Cross_table$data
colnames(to_show) <- out_cat_cat$Table$Cross_table$fields$label

sketch = htmltools::withTags(table(
   class = 'display',
   thead(
      tr(
         th(rowspan = 2, variabletest1),
         th(colspan = ncol(to_show)-1, variabletest2)
      ),
      tr(
         lapply(names(to_show)[-1], th)
      )
   )
))

DT::datatable(to_show, container = sketch, rownames = FALSE, 
              options = list("paging" = FALSE, "ordering" = FALSE, 
                             "bInfo" = FALSE, "searching" = FALSE))
jsonview::json_tree_view(out_cat_cat$Table$Cross_table, scroll = TRUE)

2.5.4 Output of a single value

Single values are output directly, with no special type. This case appears in r_clustering, which outputs a quality criterion (Broken Stick) named BS:

out_clustering <- r_clustering("proteins", method = "hac")
out_clustering$Table$BS
## [1] 10
jsonview::json_tree_view(out_clustering$Table$BS, height = "150px")

2.5.5 Test formats

Two formats for tests have been designed:

  • BasicTest: to display a statistical test (or several tests in one table).
  • PostHocTests: to display post-hoc tests. It differs from the BasicTest format in that it has a group column.

2.5.5.1 BasicTest

It is used in the following functions:

  • r_univariate, in the case of a numerical variable
  • r_bivariate

Here is the code to produce a test output, from the r_bivariate function:

dfChisq <- list(
  "statistic" = Result_chisq$statistic,
  "pvalue" = Result_chisq$p.value,
  "conclusion" = chisqConclusion,
  "stars" = r_stars(Result_chisq$p.value),
  "labelTest" = Result_chisq$method,
  "labelStatistic" = "Statistic",
  "labelPvalue" = "P-value")

titleTest <- paste0("Test of the independence between ", varnames[1], " and ", varnames[2])
Chisq_test <- list("type" = "BasicTest", 
                   "title" = titleTest, 
                   "data" = dfChisq)
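The helper r_stars() is used above but not shown. Here is a minimal sketch, assuming the usual thresholds of 0.05, 0.01 and 0.001 for one, two and three stars. These thresholds are consistent with the examples of this chapter: p-values below 0.001 get three stars and p-values above 0.05 get none. The sketch also builds a BasicTest from a chi-squared test on a toy 2x2 table. The counts and the conclusion sentences are placeholders.

```r
# Hypothetical stand-in for r_stars(): 3, 2, 1 or 0 stars for p-values below
# 0.001, 0.01, 0.05 or above (assumed thresholds)
r_stars_sketch <- function(p) 3L - findInterval(p, c(0.001, 0.01, 0.05))

# Toy 2x2 contingency table (hypothetical counts)
counts <- matrix(c(20, 5, 10, 15), nrow = 2)
Result_chisq <- chisq.test(counts)

# Placeholder conclusion sentences
chisqConclusion <- if (Result_chisq$p.value < 0.05) {
  "The two variables are significantly dependent (risk: 5%)."
} else {
  "No significant dependence between the two variables (risk: 5%)."
}

dfChisq <- list(
  "statistic" = unname(Result_chisq$statistic),
  "pvalue" = Result_chisq$p.value,
  "conclusion" = chisqConclusion,
  "stars" = r_stars_sketch(Result_chisq$p.value),
  "labelTest" = Result_chisq$method,
  "labelStatistic" = "Statistic",
  "labelPvalue" = "P-value")

Chisq_test <- list("type" = "BasicTest",
                   "title" = "Test of the independence between var1 and var2",
                   "data" = dfChisq)

r_stars_sketch(c(0.0001627, 0.005, 0.02, 0.9))  # 3 2 1 0
```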

The example below has only one row, but some cases have several (the data component is a data.frame with one or more rows). It shows the normality test returned by r_univariate for the numerical variable ATM:

data <- out_num$Table$NormalityTest$data
to_show <- data.frame(data)[,c("stars", "conclusion", "statistic", "pvalue")]
to_show$stars <- strrep("*", to_show$stars)  # vectorized: works for one or several rows
rownames(to_show) <- NULL
colnames(to_show) <- c("Signif.", data$labelTest, data$labelStatistic, data$labelPvalue)
knitr::kable(to_show)
| Signif. | Shapiro-Wilk normality test | Statistic | P-value |
|:--------|:----------------------------|----------:|--------:|
| ***     | The distribution of ATM significantly deviates from a normal distribution (risk: 5%). | 0.9826694 | 0.0001627 |

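The first column comes from the stars field (a number), which the interface converts into the corresponding number of colored stars.

The JSON structure of a BasicTest, shown here for the chi-squared test of the cross-table example (out_cat_cat), is: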

jsonview::json_tree_view(out_cat_cat$Table$Chisq_test, scroll = TRUE)

2.5.5.2 PostHocTests

Post-hoc tests are used by the r_bivariate function in the case of one numerical and one categorical variable, when the main test is significant and the categorical variable has more than 2 levels.

Examples of code producing a PostHocTests type can be found in the functions fct_PostHoc_KW and fct_PostHoc_Anova of the r_bivariate.R file. The excerpt below builds the Tukey post-hoc tests that follow an ANOVA:

dfTukey <- data.frame("group" = rownames(res_thsd$Factor),
                      res_thsd$Factor[, c("diff", "p adj")],
                      "conclusion" = conclusions,
                      "bconclusion" = bconclusions,
                      "stars" = sapply(res_thsd$Factor[, "p adj"], r_stars))
rownames(dfTukey) <- NULL
colnames(dfTukey) <- c("group", "stat", "pval", "conclusion", "bconclusion", "stars")

descTukey <- data.frame("field" = c("group", "stat", "pval", "conclusion",
                                    "bconclusion", "stars"),
                        "label" = c("Groups", "Difference in means", "P-value",
                                    "Conclusion", "Boolean conclusion", "Significance"),
                        "labelShort" = c("Groups", "Diff. in means", "P-value",
                                         "Conclusion", "Boolean", "Signif."),
                        "type" = c("string", "numeric", "numeric", "string", "boolean", "numeric"))
titleTukey <- paste("Tukey Honest Difference in Means (post-hoc ANOVA) tests of",
                    varnames[1], "between pairwise groups of", varnames[2])
Posthoc_ANOVA <- list("type" = "PostHocTests",
                      "title" = titleTukey,
                      "data" = dfTukey,
                      "fields" = descTukey)
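The sketch below runs the same construction on toy data with three groups, using TukeyHSD() from the stats package. Several parts are assumptions and not the package's exact code: the conclusion sentences, the 5% risk used for bconclusion, and r_stars_sketch(), a stand-in for r_stars() with the usual 0.05 / 0.01 / 0.001 thresholds.

```r
# Hypothetical stand-in for r_stars() (assumed 0.05 / 0.01 / 0.001 thresholds)
r_stars_sketch <- function(p) 3L - findInterval(p, c(0.001, 0.01, 0.05))

# Toy data (hypothetical): one numerical variable measured in three groups.
# TukeyHSD() names its results after the model term, hence "Factor".
dat <- data.frame(y      = c(1, 2, 3, 2, 3, 4, 10, 11, 12),
                  Factor = factor(rep(c("a", "b", "c"), each = 3)))
res_thsd <- TukeyHSD(aov(y ~ Factor, data = dat))

pvals <- res_thsd$Factor[, "p adj"]
bconclusions <- unname(pvals < 0.05)  # assumed 5% risk
conclusions <- ifelse(bconclusions,
                      "Difference in means is significant.",
                      "Difference in means is not significant.")

dfTukey <- data.frame("group" = rownames(res_thsd$Factor),
                      res_thsd$Factor[, c("diff", "p adj")],
                      "conclusion" = conclusions,
                      "bconclusion" = bconclusions,
                      "stars" = r_stars_sketch(pvals))
rownames(dfTukey) <- NULL
colnames(dfTukey) <- c("group", "stat", "pval", "conclusion", "bconclusion", "stars")
```

With these toy values, the pairs are b-a, c-a and c-b, with differences in means of 1, 9 and 8. Only the two pairs involving group c are flagged as significant.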

Here is an example:

variabletest1 <- "patient.day_of_form_completion"
variabletest2 <- "patient.clinical_cqcf.consent_or_death_status"
out_num_cat <- r_bivariate("clinical", variabletest1, "clinical", variabletest2)
to_show <- out_num_cat$Table$Posthoc_ANOVA$data
to_show$stars <- sapply(to_show$stars, function(x) paste(rep("*", x), collapse=""))
colnames(to_show) <- out_num_cat$Table$Posthoc_ANOVA$fields$labelShort
to_show <- to_show[,-match("Boolean", colnames(to_show))]
knitr::kable(to_show)
| Groups | Diff. in means | P-value | Conclusion | Signif. |
|:-------|---------------:|--------:|:-----------|:--------|
| deceased-consented | -8.654737 | 0.0000000 | Difference in means of patient.day_of_form_completion is significant between the two groups deceased-consented. | *** |
| waiver-consented | 3.503158 | 0.9172299 | Difference in means of patient.day_of_form_completion is not significant between the two groups waiver-consented. | |
| waiver-deceased | 12.157895 | 0.3640209 | Difference in means of patient.day_of_form_completion is not significant between the two groups waiver-deceased. | |

The corresponding output format is:

jsonview::json_tree_view(out_num_cat$Table$Posthoc_ANOVA, scroll = TRUE)