Chapter 2 Formats for the interface
This chapter documents the output formats produced by the functions that print something on the interface.
It starts with the initialization of a few variables used throughout this document to illustrate the formats:
proteins <- read.table("../forge/backend/R/data/protein.csv", sep = " ",
                       quote = '\"', dec = ".", row.names = 1)
clinical <- read.table("../forge/backend/R/data/clinical.csv", sep = ",",
                       quote = '\"', dec = ".", row.names = 1)
2.1 Global output format from R to the interface
The elements returned from R to the interface will always be in JSON format (function r_tojson).
They always have the same structure:
- on the R side, a list with one or several sub-lists called Table, Graphical, or Object.
Example:
OutputTest <- list(
  "Table" = list("Table1" = NA),
  "Graphical" = list("Emptyplot" = NA))
jsonview::json_tree_view(OutputTest, height = "250px")
Note that the r_tojson function won’t be called directly by the interface, but is called indirectly by r_wrapp.
The Table component of the list must contain all the elements printed on screen that contain text or numerical values.
The Graphical component contains all the graph outputs.
The output of an R function can contain another component: the Object component is used to pass objects to the environment, for R-side internal use only. See the Workspace chapter for more information.
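As a minimal sketch of the structure described in this section, the helper below checks that an output list only uses the three component names Table, Graphical and Object. The function check_output_components is hypothetical: it is introduced here for illustration only and is not claimed to exist in the package.

```r
# Hypothetical helper (not part of the package): checks that an output list
# only uses the top-level components described in this section.
check_output_components <- function(output) {
  allowed <- c("Table", "Graphical", "Object")
  is.list(output) && !is.null(names(output)) && all(names(output) %in% allowed)
}

OutputTest <- list(
  "Table" = list("Table1" = NA),
  "Graphical" = list("Emptyplot" = NA))

valid_output   <- check_output_components(OutputTest)
invalid_output <- check_output_components(list("Tables" = list()))
```

Here valid_output is TRUE, while invalid_output is FALSE because "Tables" is not one of the three expected component names.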
2.2 General information on datasets
General information on the dataset is provided under the JSON type called GeneralInfo. It is produced, for example, by the function r_summary_description.
It is used to display information on datasets like this:
And the output format is this one:
jsonview::json_tree_view(out$Table$GeneralInformation, scroll = TRUE)
Here is the R code used by the function r_create_datadesc to produce this output:
dataset_general_info <- data.frame("nrow" = nrow(dataset),
                                   "ncol" = ncol(dataset),
                                   "nbmissing" = nb_missings,
                                   "propmissing" = prop_missing,
                                   "nbnum" = nb_numerical,
                                   "nbcat" = nb_factors,
                                   "nblogic" = nb_logical,
                                   "nbothers" = nb_others)
fields <- data.frame("field" = c("nrow", "ncol", "nbmissing", "propmissing",
                                 "nbnum", "nbcat", "nblogic", "nbothers"),
                     "label" = c("# Samples (rows)",
                                 "# Variables (columns)",
                                 "# missing values",
                                 "Proportion of missing values",
                                 "numeric variables",
                                 "categorical variables",
                                 "logic variables",
                                 "variables with other types"),
                     "labelShort" = c("# rows", "# col.", "# missing",
                                      "% missing", "# numeric", "# cat.",
                                      "# logic", "# others"),
                     "type" = rep("numeric", 8))
dataset_general_info <- list("type" = "GeneralInfo",
                             "title" = "General information on dataset",
                             "data" = dataset_general_info,
                             "fields" = fields)
2.3 DataView: visualisation of datasets
Dataset visualisation is used to return the dataset itself, for the selected rows and columns.
It is included in two functions, r_import (with the mode preview = TRUE) and r_summary_description (that also returns information on the data at the dataset and variable levels).
Here is an example of the produced output:
The output format is this one:
jsonview::json_tree_view(out$Table$DataView, scroll = TRUE)
Here is the R code used to produce this output, from the function r_create_dataview:
df_preview <- dataset[rstart:rend, cstart:cend, drop = FALSE]
colnames(df_preview) <- paste0("var", cstart:cend)
# Variable description
df_var_summary_all <- sapply(dataset, r_describe_variable)
df_var_summary <- t(df_var_summary_all[ ,cstart:cend, drop = FALSE])
rownames(df_var_summary) <- NULL
fields <- data.frame("id" = paste0("var", cstart:cend),
                     "label" = colnames(dataset)[cstart:cend],
                     "class" = df_var_summary[, 1],
                     "nbmissing" = as.numeric(df_var_summary[, 2]),
                     "propmissing" = as.numeric(df_var_summary[, 3]))
df_preview <- list("type" = "DataView",
                   "title" = "View of an extract of dataset",
                   "data" = df_preview,
                   "fields" = fields)
2.4 Graph formats
Several graph formats are accepted as inputs for r_tojson:
- plotly: for interactive graphs
- ggplot: for static graphs converted to interactive graphs by ggplotly in r_tojson
- png: for static graphs
- Venn: for Venn diagrams
- Upset: for upset plots (upset.js library)
Finally, r_tojson outputs 4 different plot types: plotly, png, Venn, and UpsetJS.
The objects in output$Graphical will be passed to the function r_tojson_graph() by r_tojson().
This function converts the graphics to pass them to the interface in JSON.
On the stats/R side, the Graphical component of the output is supposed to be like this:
Graphical <- list("nameplot1" = plot1, "nameplot2" = plot2)
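The sketch below illustrates how the elements of a Graphical list could be told apart by format. It is only an assumption about one possible approach: graph_format is a hypothetical function, not the package's r_tojson_graph, and the plot objects are mock objects that only carry the class names usually found on plotly and ggplot objects. The png path is a placeholder.

```r
# Hypothetical classifier (not the package's r_tojson_graph): returns the
# format of one element of the Graphical list.
graph_format <- function(graph) {
  if (inherits(graph, "plotly")) return("plotly")
  if (inherits(graph, "ggplot")) return("ggplot")
  # Plain lists such as list("type" = "png", "path" = ...) declare their type
  if (is.list(graph) && is.character(graph[["type"]])) return(graph[["type"]])
  "unknown"
}

# Mock objects carrying only the class attributes (no real plot is built)
Graphical <- list(
  "nameplot1" = structure(list(), class = c("plotly", "htmlwidget")),
  "nameplot2" = structure(list(), class = c("gg", "ggplot")),
  "nameplot3" = list("type" = "png", "path" = "images/example.png"))

formats <- vapply(Graphical, graph_format, character(1))
```

The result is a named character vector with one format per plot name, which keeps the plot names available for the interface.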
See the dedicated sub-sections for more details.
2.4.1 PlotLy
Here the plots can directly be a plot produced by ggplot, by plotly, or by ggplotly.
No additional information or attribute needs to be passed.
For example, the code of the r_univariate function is:
##### Graphical outputs #####
Plot <- ggplot(df.variable) + theme_bw() + ggtitle(varname)
Barplot <- Plot + geom_bar(aes(x = variable)) +
  theme(axis.title.x = element_blank())
Output <- list(
  Table = list(Frequency = Frequency),
  Graphical = list(Barplot = Barplot)
)
The produced plot is this one:
out_cat <- r_univariate("clinical", "patient.gender")
plotly::ggplotly(out_cat$Graphical$Barplot)
The graph is passed into JSON like this:
out_catOnlyGraph <- out_cat
out_catOnlyGraph$Table <- NULL
graphJson <- r_tojson(out_catOnlyGraph)
jsonview::json_tree_view(graphJson, scroll = TRUE)
2.4.2 png
An example of code to produce a graph in a png file can be found in the r_heatmap function:
# Creation of the images subdirectory in the working directory
tmpdir <- paste0(getwd(), "/images")
if(!dir.exists(tmpdir)) dir.create(tmpdir, recursive = TRUE)
# Creation of the tmp file for the png
tmpfile <- tempfile(pattern="heatmap", fileext = ".png", tmpdir = tmpdir)

# Producing the plot
png(filename = tmpfile, width = 950, height = 950)
plot(c(1,2,3)) # Fake plot, just for example
dev.off()
# Output construction
Output <- list("Graphical" = list("ExamplePlot" = list("type" = "png", "path" = tmpfile)))
And the produced result for the interface is this one:
exPng <- r_tojson(Output)
jsonview::json_tree_view(exPng, scroll=TRUE, height = "250px")
2.5 Table formats
Several table types have been created:
- BasicTable: the common case to print tables on the interface
- SummaryTable: to print tables including summaries of datasets (note: for now the format is the same as BasicTable, except for the name, but it has been created so that information on NAs may later be handled differently in this kind of table).
- CrossTable: to print contingency tables between two categorical variables.
- Criterion: to print a single value on screen.
2.5.1 BasicTable
The BasicTable type is for example used in the r_univariate function, in the case of a categorical variable (it’s a frequency table in that case).
Here is the R code used in r_univariate to produce a BasicTable:
tabFreq <- table(variable, useNA = "ifany")
dfFrequency <- as.data.frame(tabFreq)
colnames(dfFrequency) <- c("label", "freq")
dfFrequency$percent <- paste0(round(100 * dfFrequency$freq / sum(tabFreq), 2), " %")
rownames(dfFrequency) <- NULL
descFreq <- data.frame("field" = c("label", "freq", "percent"),
                       "label" = c("Label", "Frequency", "Percentage"),
                       "labelShort" = c("Label", "Frequency", "Percentage"),
                       "type" = c("string", "numeric", "string"))
titleFreq <- paste0("Summary of ", varname, " in ", dataset)
Frequency <- list("type" = "BasicTable",
                  "title" = titleFreq,
                  "data" = dfFrequency,
                  "fields" = descFreq)
Here is the desired output:
to_show <- out_cat$Table$Frequency$data
colnames(to_show) <- out_cat$Table$Frequency$fields$labelShort
knitr::kable(to_show)
Label | Frequency | Percentage |
---|---|---|
female | 977 | 98.79 % |
male | 12 | 1.21 % |
And the corresponding format output is this one:
jsonview::json_tree_view(out_cat$Table$Frequency, scroll = TRUE)
The labelShort field is optional. It is used to print shorter column titles on the interface.
The full column title (label) is shown when the user hovers over the column title.
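Since labelShort is optional, a consumer of the fields description needs a fallback. The sketch below shows one possible fallback rule on the R side, using a hypothetical helper named display_titles. How the interface itself handles a missing labelShort is not documented here, so this rule is an assumption.

```r
# Hypothetical helper: returns the titles to display, using labelShort when
# it is present and falling back to label otherwise (assumed rule).
display_titles <- function(fields) {
  short <- fields[["labelShort"]]
  if (is.null(short)) as.character(fields[["label"]]) else as.character(short)
}

with_short <- data.frame("field" = c("label", "freq"),
                         "label" = c("Label", "Frequency"),
                         "labelShort" = c("Lab.", "Freq."),
                         stringsAsFactors = FALSE)
without_short <- with_short[, c("field", "label")]

titles_short <- display_titles(with_short)
titles_full  <- display_titles(without_short)
```

In this example titles_short contains the abbreviated titles and titles_full contains the full labels.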
2.5.2 SummaryTable
The SummaryTable format is used for example in the r_univariate function, in the case of a numerical variable.
Here is the R code used in r_univariate to produce a SummaryTable:
dfSummary <- r_summary_var(variable)
descSummary <- data.frame("field" = c("min",
                                      "quart1",
                                      "median",
                                      "mean",
                                      "quart3",
                                      "max",
                                      "sd",
                                      "nbna",
                                      "nbnona",
                                      "nbunique"),
                          "label" = c("Minimum",
                                      "1st quartile",
                                      "Median",
                                      "Mean",
                                      "3rd quartile",
                                      "Maximum",
                                      "Standard deviation",
                                      "Number missing",
                                      "Number non-missing",
                                      "Number of unique values"),
                          "labelShort" = c("Min.",
                                           "1st Qu.",
                                           "Median",
                                           "Mean",
                                           "3rd Qu.",
                                           "Max.",
                                           "Sd",
                                           "Missing",
                                           "Non-missing",
                                           "Uniques"),
                          "type" = rep("numeric", 10))
rownames(dfSummary) <- NULL
titleSummary <- paste0("Summary of ", varname, " in ", dataset)
Summary <- list("type" = "SummaryTable",
                "title" = titleSummary,
                "data" = dfSummary,
                "fields" = descSummary)
Here is the desired output:
out_num <- r_univariate("proteins", "ATM")
to_show <- out_num$Table$Summary$data
colnames(to_show) <- out_num$Table$Summary$fields$labelShort
knitr::kable(to_show)
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Sd | Missing | Non-missing | Uniques |
---|---|---|---|---|---|---|---|---|---|
-2.336929 | -0.3103791 | 0.0054033 | -0.0233721 | 0.3274099 | 1.524847 | 0.5415193 | 0 | 379 | 379 |
jsonview::json_tree_view(out_num$Table$Summary, scroll = TRUE)
2.5.3 CrossTable
The CrossTable format is used for example in the r_bivariate function, in the case of two categorical variables.
Here is the code used by the function fct_cross_Table in the r_bivariate function to produce a CrossTable type:
dfTable <- data.frame("label" = rownames(Table), Table)
colnames(dfTable)[2:ncol(dfTable)] <- paste0("label", 1:(ncol(dfTable) - 1))
rownames(dfTable) <- NULL
# Replacing NA's by missing
dfTable$label <- ifelse(dfTable$label == "NA.", "missing", dfTable$label)

descTable <- data.frame("field" = colnames(dfTable),
                        "label" = c("", colnames(Table)),
                        "type" = c("string", rep("numeric", ncol(Table))))
descTable$label <- ifelse(is.na(descTable$label), "missing", descTable$label)

titleTable <- "Cross table of variables"

Cross_table <- list("type" = "CrossTable",
                    "title" = titleTable,
                    "rowVariable" = varnames[1],
                    "colVariable" = varnames[2],
                    "data" = dfTable,
                    "fields" = descTable)
Example of a cross table to show on screen:
variabletest1 <- "patient.gender"
variabletest2 <- "patient.clinical_cqcf.consent_or_death_status"
out_cat_cat <- r_bivariate("clinical", variabletest1, "clinical", variabletest2)
to_show <- out_cat_cat$Table$Cross_table$data
colnames(to_show) <- out_cat_cat$Table$Cross_table$fields$label
sketch = htmltools::withTags(table(
  class = 'display',
  thead(
    tr(
      th(rowspan = 2, variabletest1),
      th(colspan = ncol(to_show)-1, variabletest2)
    ),
    tr(
      lapply(names(to_show)[-1], th)
    )
  )
))
DT::datatable(to_show, container = sketch, rownames = FALSE,
              options = list("paging" = FALSE, "ordering" = FALSE,
                             "bInfo" = FALSE, "searching" = FALSE))
jsonview::json_tree_view(out_cat_cat$Table$Cross_table, scroll = TRUE)
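As a self-contained sketch of the CrossTable construction shown in this section, the code below builds the same structure from two toy vectors. The variables gender and status and their values are invented for illustration, and the contingency table is converted with as.data.frame.matrix, which is an assumption about the shape of the Table object used by fct_cross_Table.

```r
# Toy categorical variables, for illustration only
gender <- c("female", "male", "female", "female", "male")
status <- c("consented", "consented", "deceased", "consented", "deceased")
varnames <- c("gender", "status")

# Contingency table in wide format (assumed shape of the Table object)
Table <- as.data.frame.matrix(table(gender, status))

dfTable <- data.frame("label" = rownames(Table), Table)
colnames(dfTable)[2:ncol(dfTable)] <- paste0("label", 1:(ncol(dfTable) - 1))
rownames(dfTable) <- NULL

# Description of the columns: the first one holds the row labels
descTable <- data.frame("field" = colnames(dfTable),
                        "label" = c("", colnames(Table)),
                        "type" = c("string", rep("numeric", ncol(Table))))

Cross_table <- list("type" = "CrossTable",
                    "title" = "Cross table of variables",
                    "rowVariable" = varnames[1],
                    "colVariable" = varnames[2],
                    "data" = dfTable,
                    "fields" = descTable)
```

The data columns are renamed label1, label2, and so on, so that the real level names only appear in the label column of fields.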
2.5.4 Output of a single value
Single values are output directly (with no special type). This case is introduced in r_clustering to output a quality criterion (Broken Stick). It is named BS in this case and is produced as follows:
out_clustering <- r_clustering("proteins", method = "hac")
out_clustering$Table$BS
## [1] 10
jsonview::json_tree_view(out_clustering$Table$BS, height = "150px")
2.5.5 Test formats
Two formats for tests have been designed:
- BasicTest to display a statistical test (or several tests in one table).
- PostHocTests to display post-hoc tests. It differs from the BasicTest format since it has a group column.
2.5.5.1 BasicTest
It is used in the following functions:
- r_univariate in the case of a numerical variable
- r_bivariate
Here is the code to produce a test output, from the r_bivariate function:
dfChisq <- list(
  "statistic" = Result_chisq$statistic,
  "pvalue" = Result_chisq$p.value,
  "conclusion" = chisqConclusion,
  "stars" = r_stars(Result_chisq$p.value),
  "labelTest" = Result_chisq$method,
  "labelStatistic" = "Statistic",
  "labelPvalue" = "P-value")
titleTest <- paste0("Test of the independence between ", varnames[1], " and ", varnames[2])
Chisq_test <- list("type" = "BasicTest",
                   "title" = titleTest,
                   "data" = dfChisq)
Here is an example with only one line, but some cases can have several lines (the data component is a data.frame with one or more lines).
data <- out_num$Table$NormalityTest$data
to_show <- data.frame(data)[,c("stars", "conclusion", "statistic", "pvalue")]
to_show$stars <- paste(rep("*", to_show$stars), collapse="")
rownames(to_show) <- NULL
colnames(to_show) <- c("Signif.", data$labelTest, data$labelStatistic, data$labelPvalue)
knitr::kable(to_show)
Signif. | Shapiro-Wilk normality test | Statistic | P-value |
---|---|---|---|
*** | The distribution of ATM significantly deviates from a normal distribution (risk: 5%). | 0.9826694 | 0.0001627 |
The first column is passed by the stars field (a number) and transformed by the interface to the number of colored stars to display.
jsonview::json_tree_view(out_cat_cat$Table$Chisq_test, scroll = TRUE)
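The sketch below illustrates how a p-value could be turned into a number of stars and then into a string of asterisks, as done in the example tables of this chapter. The function stars_from_pvalue is a hypothetical stand-in for r_stars, and the thresholds 0.05, 0.01 and 0.001 are the conventional ones, assumed here because the actual thresholds are not documented in this chapter.

```r
# Hypothetical stand-in for r_stars: number of thresholds the p-value is
# below (thresholds are an assumption, not taken from the package)
stars_from_pvalue <- function(pvalue) {
  sum(pvalue < c(0.05, 0.01, 0.001))
}

# Converts a number of stars into the string shown in the example tables
stars_to_string <- function(n) {
  paste(rep("*", n), collapse = "")
}

pvalues  <- c(0.0001627, 0.03, 0.9172299)
nb_stars <- sapply(pvalues, stars_from_pvalue)
shown    <- sapply(nb_stars, stars_to_string)
```

Applying the string conversion element by element with sapply, rather than calling paste once on the whole column, keeps the result correct when the table has several lines.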
2.5.5.2 PostHocTests
Post-hoc tests are used by the r_bivariate function in the case of one numerical and one categorical variable, if the main test is positive and if the number of levels of the categorical variable is larger than 2.
An example of code to produce a PostHocTests type can be found in the functions fct_PostHoc_KW or fct_PostHoc_Anova in the r_bivariate.R file.
dfTukey <- data.frame("group" = rownames(res_thsd$Factor),
                      res_thsd$Factor[ ,c("diff", "p adj")],
                      "conclusion" = conclusions,
                      "bconclusion" = bconclusions,
                      "stars" = sapply(res_thsd$Factor[,"p adj"], r_stars))
rownames(dfTukey) <- NULL
colnames(dfTukey) <- c("group", "stat", "pval", "conclusion", "bconclusion", "stars")
descTukey <- data.frame("field" = c("group", "stat", "pval", "conclusion",
                                    "bconclusion", "stars"),
                        "label" = c("Groups", "Difference in means", "P-value",
                                    "Conclusion", "Boolean conclusion", "Significance"),
                        "labelShort" = c("Groups", "Diff. in means", "P-value",
                                         "Conclusion", "Boolean", "Signif."),
                        "type" = c("string", "numeric", "numeric", "string", "boolean", "numeric"))
titleTukey <- paste("Tukey Honest Difference in Means (post-hoc ANOVA) tests of",
                    varnames[1], "between pairwise groups of", varnames[2])
Posthoc_ANOVA <- list("type" = "PostHocTests",
                      "title" = titleTukey,
                      "data" = dfTukey,
                      "fields" = descTukey)
Here is an example:
variabletest1 <- "patient.day_of_form_completion"
variabletest2 <- "patient.clinical_cqcf.consent_or_death_status"
out_num_cat <- r_bivariate("clinical", variabletest1, "clinical", variabletest2)
to_show <- out_num_cat$Table$Posthoc_ANOVA$data
to_show$stars <- sapply(to_show$stars, function(x) paste(rep("*", x), collapse=""))
colnames(to_show) <- out_num_cat$Table$Posthoc_ANOVA$fields$labelShort
to_show <- to_show[,-match("Boolean", colnames(to_show))]
knitr::kable(to_show)
Groups | Diff. in means | P-value | Conclusion | Signif. |
---|---|---|---|---|
deceased-consented | -8.654737 | 0.0000000 | Difference in means of patient.day_of_form_completion is significant between the two groups deceased-consented. | *** |
waiver-consented | 3.503158 | 0.9172299 | Difference in means of patient.day_of_form_completion is not significant between the two groups waiver-consented. | |
waiver-deceased | 12.157895 | 0.3640209 | Difference in means of patient.day_of_form_completion is not significant between the two groups waiver-deceased. | |
And the corresponding output format:
jsonview::json_tree_view(out_num_cat$Table$Posthoc_ANOVA, scroll = TRUE)
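As a runnable sketch of the PostHocTests data construction described in this section, the code below runs TukeyHSD from base R on simulated data and assembles the same six columns. The simulated data, the 5% risk level, the wording of the conclusions and the star thresholds are all assumptions made for illustration; they are not taken from fct_PostHoc_Anova.

```r
# Simulated data with three groups, for illustration only
set.seed(1)
toy <- data.frame(value = c(rnorm(10, 0), rnorm(10, 0.2), rnorm(10, 3)),
                  Factor = factor(rep(c("a", "b", "c"), each = 10)))

# Tukey post-hoc tests after a one-way ANOVA
res_thsd <- TukeyHSD(aov(value ~ Factor, data = toy))
pvals <- res_thsd$Factor[, "p adj"]

# Assumed decision rule (risk: 5%) and assumed star thresholds
bconclusions <- pvals < 0.05
conclusions <- ifelse(bconclusions, "significant", "not significant")
stars <- sapply(pvals, function(p) sum(p < c(0.05, 0.01, 0.001)))

dfTukey <- data.frame("group" = rownames(res_thsd$Factor),
                      res_thsd$Factor[, c("diff", "p adj")],
                      "conclusion" = conclusions,
                      "bconclusion" = bconclusions,
                      "stars" = stars)
rownames(dfTukey) <- NULL
colnames(dfTukey) <- c("group", "stat", "pval", "conclusion", "bconclusion", "stars")
```

With three groups, TukeyHSD produces one line per pair of groups, so dfTukey has three lines, each identified by its group column.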