Chapter 4 Dataset edition
The main function is r_edit_dataset, which provides several ways to modify a dataset:
- transpose (action: "transpose")
- change the dataset nature (action: "set_dataset_nature" + dataset_nature)
- subset rows (action: "subset" + row_names)
- subset columns (action: "subset" + column_names)
- change column types (action: "set_columns_type" + columns_type)
- change row names (action: "set_rownames" + row_names)
- rename the categories of a factor variable (action: "recode_categories" + new_categories)
- reorder the categories of a factor variable (action: "reorder_categories" + new_categories)
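As a rough mental model, several of these actions correspond to ordinary base R operations on a data frame. The sketch below is not the package's implementation: `apply_edit()` is a hypothetical helper, the attribute name `dataset_nature` is an assumption, and only some of the actions are covered.

```r
# Hypothetical helper (not part of the package): rough base R equivalents
# of some r_edit_dataset actions, applied to a plain data.frame.
apply_edit <- function(df, action, row_names = NULL, column_names = NULL,
                       columns_type = NULL, dataset_nature = NULL) {
  switch(action,
    # transpose: t() returns a matrix, so convert it back to a data.frame
    transpose = as.data.frame(t(df)),
    # set_dataset_nature: store the nature as an attribute (name assumed)
    set_dataset_nature = {
      attr(df, "dataset_nature") <- dataset_nature
      df
    },
    # subset: keep the requested rows and/or columns (all of them if NULL)
    subset = {
      rows <- if (is.null(row_names)) rownames(df) else row_names
      cols <- if (is.null(column_names)) colnames(df) else column_names
      df[rows, cols, drop = FALSE]
    },
    # set_columns_type: convert the listed columns to numeric or character
    set_columns_type = {
      conv <- if (columns_type == "numerical") as.numeric else as.character
      df[column_names] <- lapply(df[column_names], conv)
      df
    },
    stop("action not covered by this sketch: ", action)
  )
}

df <- data.frame(a = 1:3, b = c("4", "5", "6"), row.names = c("x", "y", "z"))
sub <- apply_edit(df, "subset", row_names = c("x", "z"), column_names = "a")
typed <- apply_edit(df, "set_columns_type", column_names = "b",
                    columns_type = "numerical")
flipped <- apply_edit(df, "transpose")
```

Note that subsetting rows and subsetting columns share the same "subset" action; only the accompanying argument (row_names or column_names) differs.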
4.1 Parameters
The function has 8 arguments (the combination of parameters is detailed above).
param | short desc | class | required | default | description |
---|---|---|---|---|---|
datasetName | Dataset name | character | required | NULL | The name of a dataset to edit. |
action | action to perform | character | optional | NULL | An action to perform among “transpose”, “subset”, “set_dataset_nature”, “set_columns_type”, “set_rownames”, “recode_categories” and “reorder_categories” |
dataset_nature | new dataset nature in attributes | character | optional | NULL | A dataset nature (e.g. “generic”, “SNP data”), identical to the options offered when importing a new dataset. Only used if action is set to “set_dataset_nature”. |
columns_type | new columns class | character | optional | NULL | A data type among “numerical” and “character”. Only used if action is set to “set_columns_type”. |
new_categories | simple list of categories | list | optional | NULL | A list of categories, e.g. list(‘a’, ‘a’, ‘b’, NA). Only used if action is set to “recode_categories” or “reorder_categories”. |
row_names | vector of row names | character | optional | NULL | Vector of row names of the dataset, used e.g. when action is set to “subset”. |
column_names | vector of column names | character | optional | NULL | Vector of column names of the dataset, used e.g. when action is set to “subset”, “set_columns_type” or “set_rownames”. |
retrieve | name of an edited object to retrieve | character | optional | NULL | When the name of an edited dataset is given, the function returns the corresponding objects and tables in their current state. |
When retrieve is given, the other parameters are ignored. When retrieve is ‘current’, the objects and tables related to the current analysis on datasetName are returned. This is useful in the interface when switching from one dataset to another.
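The table only gives list(‘a’, ‘a’, ‘b’, NA) as an example for new_categories. One plausible reading, which is an assumption here, is that the list is matched by position to the existing levels: repeating a label merges levels, and NA turns a level into missing values. Under that assumption, the two category actions can be sketched in base R as follows (both helpers are hypothetical).

```r
# Hypothetical helpers; the positional semantics of new_categories is assumed.
recode_categories_sketch <- function(f, new_categories) {
  new <- as.character(unlist(new_categories))
  stopifnot(length(new) == nlevels(f))
  # the i-th new label replaces the i-th current level; NA gives missing values
  factor(new[as.integer(f)])
}

reorder_categories_sketch <- function(f, new_categories) {
  # same values, levels listed in the requested order
  factor(f, levels = as.character(unlist(new_categories)))
}

f <- factor(c("low", "mid", "high", "none", "mid"),
            levels = c("low", "mid", "high", "none"))
recoded <- recode_categories_sketch(f, list("a", "a", "b", NA))
reordered <- reorder_categories_sketch(f, list("none", "high", "mid", "low"))
levels(recoded)    # "a" "b"
levels(reordered)  # "none" "high" "mid" "low"
```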
The r_edit_dataset function returns a list of objects containing the modified data and possibly an analysis, as well as a list of tables containing a dataview, an edition history, and the object name of the modified data.
# import the protein dataset
input <- "../forge/backend/R/data/protein.csv"
r_wrapp("r_import", input = input, data.name = "proteins", header = TRUE,
        sep = " ", quote = "\"", dec = ".")
## {"Messages":{"type":"notification","data":[{"type":"warning","text":"a 'row.names' column has been found, setting it to row labels."}]}}
4.2 Example: subset the first 5 columns
out_edition <- r_edit_dataset(datasetName = "proteins",
                              action = "subset",
                              column_names = colnames(proteins)[1:5])
Two outputs are produced: Object and Table.
names(out_edition)
## [1] "Object" "Table"
The Object component contains two objects:
names(out_edition$Object)
## [1] "editor" "edited"
- editor: the internal edition object, that contains the instructions and the history of the edition.
- edited: the new dataset.
The Table component contains five objects:
names(out_edition$Table)
## [1] "Colnames" "Rownames" "ObjectName" "HistoryEdition"
## [5] "CatVarNames"
The DataView and HistoryEdition objects are displayed in the interface.
out_edition$Table$ObjectName is the name of the created dataset (the name it has in the R environment when the function is called through r_wrapp).
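To make the nesting explicit, here is a mock object with the same component names as the output printed above. It is only a navigation aid: `mock_edit_output()` is hypothetical, the contents are placeholders, and reading CatVarNames as the names of factor columns is an assumption.

```r
# Hypothetical mock of the r_edit_dataset output, built only from the
# component names printed above; all contents are placeholders.
mock_edit_output <- function(edited, object_name, history) {
  list(
    Object = list(editor = list(history = history), edited = edited),
    Table = list(
      Colnames = colnames(edited),
      Rownames = rownames(edited),
      ObjectName = object_name,
      HistoryEdition = history,
      # assumption: names of the factor (categorical) columns
      CatVarNames = names(Filter(is.factor, edited))
    )
  )
}

out <- mock_edit_output(head(mtcars[, 1:5]), "edited_1", list("subset"))
out$Table$ObjectName  # "edited_1"
```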
4.3 Perform several actions
We use r_wrapp to reproduce the interface behaviour:
out_edition <- r_wrapp(funcName = "r_edit_dataset",
                       datasetName = "proteins",
                       action = "subset",
                       column_names = colnames(proteins)[1:5])
Two new objects are created in the environment:
- the analysis (editor_1)
- the edited dataset (edited_1), hidden for now.
The editor analysis contains the list of successive actions (here only 1) performed on the dataset.
jsonview::json_tree_view(editor_1, scroll = TRUE)
The edited dataset contains the resulting dataset:
knitr::kable(head(edited_1))
| | 14.3.3_epsilon | 4E.BP1 | 4E.BP1_pS65 | 4E.BP1_pT37 | 4E.BP1_pT70 |
---|---|---|---|---|---|
A0SH | -0.2146377 | -0.0348712 | -0.1492393 | -0.1141046 | 0.1883281 |
A0SJ | 0.1343556 | 0.2398837 | -0.1730644 | -0.8782864 | -0.1662046 |
A0SK | 0.2186520 | 2.3489380 | 0.4798254 | 1.2803679 | 0.8254676 |
A0SO | -0.1124943 | 0.2786066 | 0.1776554 | 0.6728072 | 0.4581671 |
A04N | 0.0021150 | 0.7086549 | -0.2063496 | 0.6498476 | 0.1838435 |
A04P | 0.0717803 | -0.1426144 | 0.2463977 | 0.6771736 | 0.1320518 |
Now we perform a second action on the same dataset:
out_edition <- r_wrapp(funcName = "r_edit_dataset",
                       datasetName = "proteins",
                       action = "subset",
                       row_names = rownames(proteins)[1:5])
jsonview::json_tree_view(editor_1, scroll = TRUE)
The editor analysis now contains two successive actions performed on the dataset.
The edited dataset has been replaced with the newly created dataset.
knitr::kable(edited_1)
| | 14.3.3_epsilon | 4E.BP1 | 4E.BP1_pS65 | 4E.BP1_pT37 | 4E.BP1_pT70 |
---|---|---|---|---|---|
A0SH | -0.2146377 | -0.0348712 | -0.1492393 | -0.1141046 | 0.1883281 |
A0SJ | 0.1343556 | 0.2398837 | -0.1730644 | -0.8782864 | -0.1662046 |
A0SK | 0.2186520 | 2.3489380 | 0.4798254 | 1.2803679 | 0.8254676 |
A0SO | -0.1124943 | 0.2786066 | 0.1776554 | 0.6728072 | 0.4581671 |
A04N | 0.0021150 | 0.7086549 | -0.2063496 | 0.6498476 | 0.1838435 |
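Conceptually, the editor behaves like an ordered list of actions that can be replayed on the original dataset to rebuild the edited one. The sketch below illustrates this idea for subset actions only; the action format and the helpers `new_editor()`, `add_action()` and `replay()` are assumptions, not the package's internal representation.

```r
# Hypothetical sketch: an editor as an ordered list of subset actions.
new_editor <- function(source) list(source = source, actions = list())

add_action <- function(editor, action) {
  editor$actions <- c(editor$actions, list(action))
  editor
}

# Apply every recorded action, in order, starting from the original data
replay <- function(editor, data) {
  Reduce(function(df, a) {
    rows <- if (is.null(a$row_names)) rownames(df) else a$row_names
    cols <- if (is.null(a$column_names)) colnames(df) else a$column_names
    df[rows, cols, drop = FALSE]
  }, editor$actions, data)
}

toy <- as.data.frame(matrix(seq_len(80) / 10, nrow = 10,
  dimnames = list(paste0("S", 1:10), paste0("P", 1:8))))
toy_editor <- new_editor("toy")
toy_editor <- add_action(toy_editor, list(column_names = colnames(toy)[1:5]))
toy_editor <- add_action(toy_editor, list(row_names = rownames(toy)[1:5]))
toy_edited <- replay(toy_editor, toy)
dim(toy_edited)  # 5 5
```

Keeping the actions, rather than only the result, is what makes an edition history possible; whether the package replays actions or stores intermediate results is not stated here.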
4.4 Extract the dataset
When the edition is done, the resulting dataset can be saved by calling the r_extract_dataset function.
res <- r_wrapp("r_extract_dataset", datasetName = "edited_1", userName = "test")
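The effect of extraction on the workspace can be pictured with the sketch below: the hidden edited dataset is copied under a permanent name and its editor is flagged as finished, so it is no longer treated as the current edition. The function name, the `finished` flag and the editor_/edited_ naming rule are assumptions made for illustration; r_extract_dataset itself may work differently.

```r
# Hypothetical sketch of extraction inside a workspace environment.
extract_dataset_sketch <- function(env, edited_name, new_name) {
  # copy the hidden edited dataset under a permanent name
  assign(new_name, get(edited_name, envir = env), envir = env)
  # mark the matching editor (editor_N for edited_N) as finished
  editor_name <- sub("^edited_", "editor_", edited_name)
  ed <- get(editor_name, envir = env)
  ed$finished <- TRUE
  assign(editor_name, ed, envir = env)
  invisible(new_name)
}

ws <- new.env()
assign("edited_1", data.frame(x = 1:3), envir = ws)
assign("editor_1", list(actions = list("subset"), finished = FALSE), envir = ws)
extract_dataset_sketch(ws, "edited_1", "proteins_subset")
```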
4.5 Retrieve the current actions performed on a dataset
Selecting a dataset on the edition screen calls r_edit_dataset with retrieve = “current” to get the analysis that is not “finished” (i.e. not extracted yet), if there is one, together with all the actions already performed.
# Not run
out_retrieve <- r_edit_dataset(
  datasetName = "proteins",
  retrieve = "current")
It has the same structure as any r_edit_dataset output: a list of length 2 (Object and Table).
- The Object component is the edited dataset (if there is no edition yet, the current state of the dataset).
- The Table component contains a DataView, the ObjectName (the name of the resulting dataset) and the history.
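The lookup behind retrieve = "current" can be sketched as a search for an editor on the given dataset that has not been extracted yet. This is a hypothetical illustration: `find_current_edition()`, the `source` and `finished` fields and the editor_/edited_ naming rule are assumptions.

```r
# Hypothetical sketch: find the unfinished edition of a dataset, if any.
find_current_edition <- function(env, dataset_name) {
  for (nm in ls(env, pattern = "^editor_")) {
    ed <- get(nm, envir = env)
    if (identical(ed$source, dataset_name) && !isTRUE(ed$finished)) {
      # return the name of the matching hidden edited dataset
      return(sub("^editor_", "edited_", nm))
    }
  }
  NULL  # no unfinished edition: the dataset is shown in its current state
}

ws <- new.env()
assign("editor_1", list(source = "proteins", finished = TRUE), envir = ws)
assign("editor_2", list(source = "proteins", finished = FALSE), envir = ws)
find_current_edition(ws, "proteins")  # "edited_2"
```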
When an edition object is retrieved from the workspace screen (by clicking on “more”), the function retrieves the current edition analysis from the name of the produced dataset:
# Not run
out_retrieve <- r_edit_dataset(
  datasetName = "proteins",
  retrieve = "edited_1")
4.6 Edit a retrieved dataset
It is possible to edit an already extracted dataset or, more precisely, to start a new edition based on an extracted dataset. Immediately after retrieval, any action creates a new editor object, associated with a hidden edited object. If, at the time of the action, there is a current edition on another (not yet extracted) dataset, that edition is deleted. This ensures that there is always at most one edition in progress for a given original dataset.
To achieve this, an attribute called edit_head is added to the original dataset at the time of retrieval. It points to the dataset on which the next action should be performed. This attribute is updated at every call of r_edit_dataset, so that it points to an already extracted dataset only right after a retrieval. After any other action, it is set to NULL (which removes it), so that the regular workflow is preserved.
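The edit_head mechanism can be sketched with base R attributes: retrieval sets the attribute, the next action reads it to choose its input, and any later action removes it. The helper names and the way the target is looked up are hypothetical; only the attribute name edit_head comes from the text above.

```r
# Hypothetical helpers around the edit_head attribute.
mark_retrieved <- function(original, extracted_name) {
  # on retrieval, point to the extracted dataset the next action should use
  attr(original, "edit_head") <- extracted_name
  original
}

dataset_to_edit <- function(original, original_name) {
  target <- attr(original, "edit_head")
  if (is.null(target)) original_name else target
}

clear_edit_head <- function(original) {
  attr(original, "edit_head") <- NULL  # assigning NULL removes the attribute
  original
}

proteins_toy <- data.frame(x = 1:3)
proteins_toy <- mark_retrieved(proteins_toy, "edited_1")
dataset_to_edit(proteins_toy, "proteins_toy")  # "edited_1"
proteins_toy <- clear_edit_head(proteins_toy)
dataset_to_edit(proteins_toy, "proteins_toy")  # "proteins_toy"
```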