R/02-harmo_process_harmonization.R
as_dataschema.Rd
Checks if an object is a valid DataSchema and returns it with the appropriate Rmonize::class attribute. This function mainly helps validate inputs within other functions of the package but could be used separately to ensure that an object has an appropriate structure.
as_dataschema(object, as_dataschema_mlstr = FALSE)
object: A potential DataSchema object to be coerced.
as_dataschema_mlstr: Whether the output DataSchema should be coerced with specific format restrictions for compatibility with other Maelstrom Research software. FALSE by default.
A list of data frame(s) named 'Variables' and (if any) 'Categories', with Rmonize::class 'dataschema'.
A DataSchema is the list of core variables to generate across datasets and related metadata. A DataSchema object is a list of data frames with elements named 'Variables' (required) and 'Categories' (if any). The 'Variables' element must contain at least the name column, and the 'Categories' element must contain at least the variable and name columns to be usable in any function. In 'Variables' the name column must also have unique entries, and in 'Categories' the combination of variable and name columns must also be unique.
The object may be specifically formatted to be compatible with additional Maelstrom Research software, in particular Opal environments.
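The structural rules described above can be illustrated with a minimal base R sketch. The helper check_dataschema_structure() is hypothetical and is not part of Rmonize; it only mirrors the documented requirements (required columns and uniqueness) and does not reproduce what as_dataschema() does internally, such as setting the Rmonize::class attribute.

```r
# Hypothetical helper (an assumption for illustration, not an Rmonize function):
# returns a character vector of problems; an empty vector means the object
# satisfies the structural rules stated in this page.
check_dataschema_structure <- function(x) {
  if (!is.list(x) || !is.data.frame(x[["Variables"]])) {
    return("'Variables' must be a data frame element of the list")
  }
  problems <- character(0)

  # 'Variables' needs a 'name' column with unique entries.
  vars <- x[["Variables"]]
  if (!"name" %in% names(vars)) {
    problems <- c(problems, "'Variables' needs a 'name' column")
  } else if (anyDuplicated(vars[["name"]]) > 0) {
    problems <- c(problems, "'name' entries in 'Variables' must be unique")
  }

  # 'Categories' is optional; if present, the 'variable' and 'name'
  # columns must exist and their combination must be unique.
  cats <- x[["Categories"]]
  if (!is.null(cats)) {
    if (!is.data.frame(cats)) {
      problems <- c(problems, "'Categories' must be a data frame")
    } else if (!all(c("variable", "name") %in% names(cats))) {
      problems <- c(problems, "'Categories' needs 'variable' and 'name' columns")
    } else if (anyDuplicated(cats[, c("variable", "name")]) > 0) {
      problems <- c(problems, "'variable'/'name' pairs in 'Categories' must be unique")
    }
  }
  problems
}

# A small hand-built list that follows the rules.
valid <- list(
  Variables = data.frame(name = c("adm_study", "sdc_sex")),
  Categories = data.frame(
    variable = c("adm_study", "adm_study", "sdc_sex"),
    name     = c("PARIS", "TOKYO", "1")
  )
)

# The same list with one category row repeated, which breaks uniqueness.
invalid <- valid
invalid$Categories <- rbind(invalid$Categories, invalid$Categories[1, ])

valid_problems   <- check_dataschema_structure(valid)
invalid_problems <- check_dataschema_structure(invalid)
```

A list that passes these checks should be a reasonable candidate to hand to as_dataschema(), which remains the authoritative validation.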
{
# Use Rmonize_DEMO to run examples.
library(dplyr)
glimpse(as_dataschema(Rmonize_DEMO$`dataschema - final`))
}
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> List of 2
#> $ Variables : tibble [13 × 5] (S3: tbl_df/tbl/data.frame)
#> ..$ name : chr [1:13] "adm_unique_id" "adm_study" "adm_year_dce" "sdc_age" ...
#> ..$ typeof : chr [1:13] "character" "character" "character" "integer" ...
#> ..$ index : chr [1:13] "1" "2" "3" "4" ...
#> ..$ label:en : chr [1:13] "Unique identification code of the participant." "Indicator of the survey study." "Indicator of the survey data collection event." "Participant's age at time of data collection event." ...
#> ..$ valueType: chr [1:13] "text" "text" "text" "integer" ...
#> $ Categories: tibble [16 × 5] (S3: tbl_df/tbl/data.frame)
#> ..$ variable: chr [1:16] "adm_study" "adm_study" "adm_study" "sdc_sex" ...
#> ..$ name : chr [1:16] "MELBOURNE" "PARIS" "TOKYO" "1" ...
#> ..$ labels : chr [1:16] "MELBOURNE" "PARIS" "TOKYO" "1" ...
#> ..$ label:en: chr [1:16] "MELBOURNE" "PARIS" "TOKYO" "Male" ...
#> ..$ missing : chr [1:16] "0" "0" "0" "0" ...
#> - attr(*, "madshapR::class")= chr "data_dict"
#> - attr(*, "Rmonize::class")= chr "dataschema"