R/02-harmo_process_harmonization.R
dataschema_extract.Rd
Creates the DataSchema in the Maelstrom Research format (with 'Variables' and 'Categories' in separate tibbles and standard columns in each) from the data processing elements.
dataschema_extract(data_proc_elem)
data_proc_elem: A tibble identifying the input data processing elements.
A list of tibble(s): 'Variables' and, if any, 'Categories'. Together these are the two elements of the DataSchema.
A data processing element contains the rules and metadata that will be used
to perform harmonization of input datasets in accordance with the DataSchema.
It must be a data frame or data frame extension (e.g. a tibble) and it must
contain certain columns that take part in the process, including
dataschema_variable, ss_table, ss_variables, Mlstr_harmo::rule_category and
Mlstr_harmo::algorithm. The first processing element is mandatory: it must have
"id_creation" in Mlstr_harmo::rule_category, followed by the name of the column
taken as the identifier of each dataset, to initiate the harmonization process.
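The column requirements described above can be checked before calling the function. The following is a minimal sketch in base R, not the package's own validation: the object is hand-built, every value in it is invented for illustration, the second column is assumed to be spelled ss_table, and placing the identifier column name in Mlstr_harmo::algorithm is an assumption about where "the name of the column" goes.

```r
# Hypothetical data processing element; all values are invented.
# check.names = FALSE keeps the "::" in the column names intact.
data_proc_elem <- data.frame(
  dataschema_variable          = c("adm_unique_id", "sdc_age"),
  ss_table                     = c("study_1", "study_1"),
  ss_variables                 = c("part_id", "age"),
  `Mlstr_harmo::rule_category` = c("id_creation", "direct_mapping"),
  `Mlstr_harmo::algorithm`     = c("part_id", "age"),  # assumed placement
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Columns named in the description above (ss_table spelling is assumed).
required_cols <- c(
  "dataschema_variable", "ss_table", "ss_variables",
  "Mlstr_harmo::rule_category", "Mlstr_harmo::algorithm"
)

# Names of required columns that are absent from the object.
missing_cols <- setdiff(required_cols, names(data_proc_elem))

# The first processing element is expected to be the id creation rule.
first_rule <- data_proc_elem[["Mlstr_harmo::rule_category"]][1]
starts_with_id_creation <- identical(first_rule, "id_creation")
```

If missing_cols is empty and starts_with_id_creation is TRUE, the object at least has the shape the description asks for; whether the package accepts it still depends on its own checks.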
{
# You can use our demonstration files to run examples
dataschema_extract(
data_proc_elem = DEMO_files_harmo$`data_processing_elements - final`)
}
#> $Variables
#> # A tibble: 13 × 3
#>    name            label           valueType
#>    <chr>           <chr>           <chr>
#>  1 adm_unique_id   adm_unique_id   text
#>  2 adm_study       adm_study       text
#>  3 adm_year_dce    adm_year_dce    text
#>  4 sdc_age         sdc_age         integer
#>  5 sdc_gender      sdc_gender      integer
#>  6 phy_height      phy_height      decimal
#>  7 phy_weight      phy_weight      decimal
#>  8 phy_bmi         phy_bmi         decimal
#>  9 rep_preg_ever   rep_preg_ever   integer
#> 10 rep_preg_curr   rep_preg_curr   integer
#> 11 lsb_smo_ever    lsb_smo_ever    integer
#> 12 lsb_smo_curr    lsb_smo_curr    integer
#> 13 lsb_smo_status  lsb_smo_status  integer
#>
#> attr(,"madshapR::class")
#> [1] "data_dict_mlstr"
#> attr(,"harmonizR::class")
#> [1] "Dataschema_mlstr"
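Because 'Categories' is only present when there is something to report, code that consumes the result may want to test for it before use. The sketch below uses a hand-built list as a stand-in for the returned value (it does not call the package, and the two rows are copied from the output above purely for illustration).

```r
# Stand-in for the returned list; 'Categories' is deliberately absent,
# as in the demonstration output above.
dataschema <- list(
  Variables = data.frame(
    name      = c("sdc_age", "phy_bmi"),
    label     = c("sdc_age", "phy_bmi"),
    valueType = c("integer", "decimal"),
    stringsAsFactors = FALSE
  )
)

# 'Categories' is optional, so check for it before using it.
has_categories <- !is.null(dataschema[["Categories"]])

# Example query: names of the variables declared as decimal.
is_decimal   <- dataschema$Variables$valueType == "decimal"
decimal_vars <- dataschema$Variables$name[is_decimal]
```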