R/02-harmo_process_harmonization.R
as_dataschema.Rd
Checks that the input object is a valid DataSchema and coerces it by setting the appropriate harmonizR::class attribute. This function is mainly used to validate input inside other functions of the package, but it can also be called directly to check whether an object is valid for use in a function.
as_dataschema(object, as_dataschema_mlstr = FALSE)
A potential DataSchema (list of tibbles) to be coerced.
Whether the output DataSchema should keep the minimal DataSchema structure (FALSE) or carry additional attributes that enable extra capabilities for Maelstrom and integrated workflows, such as Opal environments (TRUE). FALSE by default.
A list of tibble(s) named 'Variables' and 'Categories' (if any), which are the two elements of the DataSchema.
A DataSchema defines the harmonized variables to be generated and represents the metadata of an associated harmonized dossier. It must be a list of data frame-like objects with elements named 'Variables' (required) and 'Categories' (if any). The 'Variables' element must contain at least the 'name' column, and the 'Categories' element must contain at least the 'variable' and 'name' columns to be usable in any function. To be considered a minimal workable DataSchema, the 'name' column in 'Variables' must also have unique and non-null entries, and the combination of the 'variable' and 'name' columns in 'Categories' must also be unique.
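The structural rules above can be checked by hand. The following is a minimal sketch in base R, not the package's own validation code: the helper `is_minimal_dataschema()` and the toy objects `schema` and `bad` are hypothetical names introduced for illustration, plain data frames stand in for tibbles, and "non-null" is assumed here to mean "not NA".

```r
# Hypothetical helper: checks the minimal DataSchema rules described above.
# Illustration only; this is not the validation done by as_dataschema().
is_minimal_dataschema <- function(object) {
  # Must be a list holding a data frame-like 'Variables' element
  if (!is.list(object) || !is.data.frame(object[["Variables"]])) return(FALSE)
  vars <- object[["Variables"]]

  # 'Variables' needs a 'name' column with unique, non-missing entries
  # (assumption: "non-null" is read as "not NA")
  if (!"name" %in% names(vars)) return(FALSE)
  if (anyNA(vars$name) || anyDuplicated(vars$name) > 0) return(FALSE)

  # 'Categories' is optional
  cats <- object[["Categories"]]
  if (is.null(cats)) return(TRUE)
  if (!is.data.frame(cats)) return(FALSE)

  # 'Categories' needs 'variable' and 'name', unique in combination
  if (!all(c("variable", "name") %in% names(cats))) return(FALSE)
  anyDuplicated(cats[, c("variable", "name")]) == 0
}

# Toy DataSchema-like object with both elements
schema <- list(
  Variables = data.frame(name = c("sdc_age", "sdc_gender")),
  Categories = data.frame(
    variable = c("sdc_gender", "sdc_gender"),
    name     = c("1", "2")
  )
)
is_minimal_dataschema(schema)
#> [1] TRUE

# Duplicating a variable name breaks the uniqueness rule
bad <- schema
bad$Variables <- data.frame(name = c("sdc_age", "sdc_age"))
is_minimal_dataschema(bad)
#> [1] FALSE
```

An object that passes such a check is only a candidate: `as_dataschema()` remains the reference for what the package accepts.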
{
# You can use our demonstration files to run examples
as_dataschema(DEMO_files_harmo$`dataschema - final`)
}
#> $Variables
#> # A tibble: 13 × 9
#>    name           typeof    index `label:en`     valueType unit  `Mlstr_area::1`
#>    <chr>          <chr>     <chr> <chr>          <chr>     <chr> <chr>
#>  1 adm_unique_id  character 1     Unique identi… text      NA    ADM
#>  2 adm_study      character 2     Indicator of … text      NA    ADM
#>  3 adm_year_dce   character 3     Indicator of … text      NA    ADM
#>  4 sdc_age        integer   4     Participant's… integer   years SDC
#>  5 sdc_gender     integer   5     Gender of the… integer   NA    SDC
#>  6 phy_height     double    6     participant's… decimal   cm    PME
#>  7 phy_weight     double    7     participant's… decimal   kg    PME
#>  8 phy_bmi        double    8     participant's… decimal   kg/m  PME
#>  9 rep_preg_ever  integer   9     whether the p… integer   NA    REP
#> 10 rep_preg_curr  integer   10    whether the p… integer   NA    REP
#> 11 lsb_smo_ever   integer   11    whether the p… integer   NA    LSB
#> 12 lsb_smo_curr   integer   12    whether the p… integer   NA    LSB
#> 13 lsb_smo_status integer   13    participant s… integer   NA    LSB
#> # ℹ 2 more variables: `Mlstr_area::1.term` <chr>, `Mlstr_area::1.scale` <chr>
#>
#> $Categories
#> # A tibble: 13 × 5
#>    variable       name  labels `label:en`                           missing
#>    <chr>          <chr> <chr>  <chr>                                <chr>
#>  1 sdc_gender     1     1      Male                                 0
#>  2 sdc_gender     2     2      Female                               0
#>  3 rep_preg_ever  0     0      never pregnant                       0
#>  4 rep_preg_ever  1     1      pregnant once or more                0
#>  5 rep_preg_curr  0     0      currently pregnant                   0
#>  6 rep_preg_curr  1     1      not currently pregnant               0
#>  7 lsb_smo_ever   0     0      never smoked                         0
#>  8 lsb_smo_ever   1     1      smoked one pack of cigarette or more 0
#>  9 lsb_smo_curr   0     0      currently smoker                     0
#> 10 lsb_smo_curr   1     1      not currently smoker                 0
#> 11 lsb_smo_status 0     0      never smoker                         0
#> 12 lsb_smo_status 1     1      former smoker                        0
#> 13 lsb_smo_status 2     2      current smoker                       0
#>
#> attr(,"madshapR::class")
#> [1] "data_dict"
#> attr(,"harmonizR::class")
#> [1] "Dataschema"
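The two attributes printed at the end of the output can be read back with base R's `attr()`. The sketch below uses a hand-built stand-in rather than the real return value of `as_dataschema()`: the `mock` object is hypothetical, and only the attribute names and values are taken from the output above.

```r
# Hypothetical stand-in carrying the same attributes as the printed output
mock <- structure(
  list(Variables = data.frame(name = "sdc_age")),
  `madshapR::class`  = "data_dict",
  `harmonizR::class` = "Dataschema"
)

# Read one attribute by its full name
attr(mock, "harmonizR::class")
#> [1] "Dataschema"

# Test for the class tag before passing the object on
identical(attr(mock, "harmonizR::class"), "Dataschema")
#> [1] TRUE
```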