Chapter 15 Combining datasets

15.1 Introduction

This page describes how to use the function r_combine_datasets to create a new combined object, which is the first step of any combination analysis. In particular, this function is the first step of the MFA and PLS analyses.

15.1.1 Used datasets

The workflow will be illustrated on the protein dataset, the clinical dataset and the mRNA dataset. These datasets do not have exactly the same rows.

input <- "../forge/backend/R/data/protein.csv"
r_wrapp("r_import", input, data.name = "proteins", sep = " ", row.names = 1)

input <- "../forge/backend/R/data/clinical.csv"
r_wrapp("r_import", input, data.name = "clinical", row.names = 1)

input <- "../forge/backend/R/data/mrna.csv"
r_wrapp("r_import", input, data.name = "mrna", row.names = 1)
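
Because the datasets do not have exactly the same rows, it can be useful to check how their row names overlap before combining them. The following is a minimal sketch in base R on small made-up data frames (not the actual protein, mRNA and clinical data); all object names and sample identifiers are hypothetical and it does not use r_wrapp.

```r
# Toy stand-ins for the imported datasets (hypothetical values and sample IDs)
proteins_toy <- data.frame(p1 = c(1.2, 0.8, 1.5, 0.9),
                           row.names = c("S1", "S2", "S3", "S4"))
mrna_toy     <- data.frame(g1 = c(5.1, 4.7, 6.0),
                           row.names = c("S2", "S3", "S5"))
clinical_toy <- data.frame(age = c(54, 61, 47, 70),
                           row.names = c("S1", "S2", "S3", "S5"))

toy_list <- list(proteins = proteins_toy,
                 mrna     = mrna_toy,
                 clinical = clinical_toy)

# Row names present in every dataset
common_rows <- Reduce(intersect, lapply(toy_list, rownames))

# Row names present in at least one dataset
all_rows <- Reduce(union, lapply(toy_list, rownames))

print(common_rows)
print(all_rows)
```

Comparing the lengths of common_rows and all_rows gives a quick idea of how many individuals would be lost if only shared rows were kept.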

15.2 Function call and options

The function r_combine_datasets has the following options:

  • datasetNames: list (required, no default) with the names (character) of the datasets used as inputs.

  • userName: character (optional; if not provided, the default name of the output object is used) that specifies the name given by the user to the complex object returned by the function.

out_combine <- r_wrapp("r_combine_datasets", 
                       list("proteins", "mrna", "clinical"))

15.2.1 State of the workspace after the function call

After the function call, the R workspace contains the following objects. The combined object stores information on its parent datasets, ordered alphabetically:

print(names(object_db))
## [1] "proteins"     "clinical"     "mrna"         "combinedDF_1"
jsonview::json_tree_view(
    jsonlite::toJSON(graph_db, pretty = TRUE, auto_unbox = TRUE), 
    scroll = TRUE
)

15.3 Output of the function

In addition to the created object (whose name is also returned in the entry ObjectName of the output of the r_wrapp call), the function returns descriptive statistics and plots that are to be displayed to the user.

15.3.1 Returned tables

Two tables are returned that provide descriptive statistics before and, if it was performed, after the filtering step, which removes all rows that are not common to all datasets. This information is provided in the entries dataInfoBefore and dataInfoAfter of the output of the r_wrapp call (dataInfoAfter is not always provided).

jsonview::json_tree_view(
    jsonlite::toJSON(combinedDF_1$Table$dataInfoBefore,
                     pretty = TRUE, auto_unbox = TRUE), 
    scroll = TRUE
)
jsonview::json_tree_view(
    jsonlite::toJSON(combinedDF_1$Table$dataInfoAfter,
                     pretty = TRUE, auto_unbox = TRUE), 
    scroll = TRUE
)
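
To make the filtering step more concrete, here is a minimal base R sketch that keeps only the rows shared by all datasets and summarizes dimensions before and after. It runs on made-up data, the helper describe_datasets is hypothetical, and the resulting tables are not meant to reproduce the actual structure of dataInfoBefore and dataInfoAfter.

```r
# Toy datasets with partially overlapping row names (hypothetical data)
toy_list <- list(
  proteins = data.frame(p1 = c(1.2, 0.8, 1.5), row.names = c("S1", "S2", "S3")),
  mrna     = data.frame(g1 = c(5.1, 4.7, 6.0), row.names = c("S2", "S3", "S4")),
  clinical = data.frame(age = c(54, 61), row.names = c("S2", "S3"))
)

# Hypothetical helper: number of rows and columns of each dataset
describe_datasets <- function(datasets) {
  data.frame(
    dataset = names(datasets),
    nrows   = vapply(datasets, nrow, integer(1), USE.NAMES = FALSE),
    ncols   = vapply(datasets, ncol, integer(1), USE.NAMES = FALSE)
  )
}

info_before <- describe_datasets(toy_list)

# Keep only the rows shared by all datasets, in the same order everywhere
common_rows   <- Reduce(intersect, lapply(toy_list, rownames))
filtered_list <- lapply(toy_list, function(df) df[common_rows, , drop = FALSE])

info_after <- describe_datasets(filtered_list)

print(info_before)
print(info_after)
```

Using drop = FALSE keeps each filtered dataset a data frame even when it has a single column.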

15.3.2 Returned plots

Two plots are returned that provide information on the individuals shared between the datasets. The upset plot is provided in the entry UpsetPlot in the output of the r_wrapp call. It is a list meant for JSON conversion. Here is a truncated version:

jsonview::json_tree_view(
    jsonlite::toJSON(list(type = combinedDF_1$Graphical$UpsetPlot$type,
                          data = combinedDF_1$Graphical$UpsetPlot$data[1:10]),
                     pretty = TRUE, auto_unbox = TRUE), 
    scroll = TRUE
)
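
As an illustration of the kind of information an upset plot summarizes, the sketch below counts how many individuals belong to each exact combination of datasets. It uses base R and hypothetical sample identifiers, and it is not the computation performed by r_combine_datasets, whose internals are not described here.

```r
# Hypothetical row names of three datasets
row_sets <- list(
  proteins = c("S1", "S2", "S3", "S4"),
  mrna     = c("S2", "S3", "S5"),
  clinical = c("S1", "S2", "S3", "S5")
)

all_rows <- sort(unique(unlist(row_sets)))

# Logical membership matrix: one row per individual, one column per dataset
membership <- vapply(row_sets, function(ids) all_rows %in% ids,
                     logical(length(all_rows)))
rownames(membership) <- all_rows

# Label each individual by the exact combination of datasets that contain it
combination <- apply(membership, 1, function(m) {
  paste(colnames(membership)[m], collapse = " & ")
})

# Number of individuals per combination, largest first
intersection_sizes <- sort(table(combination), decreasing = TRUE)
print(intersection_sizes)
```

The same counts correspond to the regions of a Venn diagram built on these three sets.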

The Venn diagram is given in the entry VennPlot in the output of the r_wrapp call. It is a list meant for JSON conversion:

jsonview::json_tree_view(
    jsonlite::toJSON(combinedDF_1$Graphical$VennPlot,
                     pretty = TRUE, auto_unbox = TRUE), 
    scroll = TRUE
)