R/02-harmo_process_harmonization.R
as_harmonized_dossier.Rd
Validates that the input object is a harmonized dossier and coerces it by setting the appropriate harmonizR::class attribute. This function is mainly used to validate input within other functions of the package, but it can also be called directly to check whether an object is valid for use in a function.
as_harmonized_dossier(
object,
dataschema = NULL,
data_proc_elem = NULL,
harmonized_col_id = NULL
)
object: A potential harmonized dossier to be coerced.
dataschema: A list of tibble(s) representing the metadata of an associated harmonized dossier.
data_proc_elem: A tibble identifying the input data processing elements.
harmonized_col_id: A character string giving the name of the column present in every dataset as the identifier of the dataset.
Returns a list of tibble(s), each of them being a harmonized dataset.
A harmonized dossier must be a named list containing at least one data frame
or data frame extension (e.g. a tibble), each of them being a harmonized
dataset. It is generally the product of applying harmonization processing to
a dossier object. The name of each tibble will be used as the reference name
of the dataset. A harmonized dossier has four attributes:
harmonizR::class, which is "harmonized_dossier";
harmonizR::Dataschema (provided by user);
harmonizR::data processing elements;
harmonizR::harmonized_col_id (provided by user), which refers to the column
in each dataset that identifies each unique observation/dataset combination.
This id column name is the same across the dataset(s), the DataSchema and
the data processing elements (created by using 'id_creation'), and is used to
initiate the process of harmonization.
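The structure described above can be sketched in base R. This is only an illustration under stated assumptions: the dataset name dataset_A and its values are hypothetical, plain data frames stand in for tibbles, and the checks are not the validation that as_harmonized_dossier() itself performs.

```r
# Hypothetical illustration in base R; not the package's implementation.
# A named list holding one data frame, with an id column shared by all datasets.
dataset_A <- data.frame(
  adm_unique_id = c("377943", "497013"),
  stringsAsFactors = FALSE
)
dossier <- list(dataset_A = dataset_A)

# Two of the four attributes described above, set by hand for illustration.
attr(dossier, "harmonizR::class") <- "harmonized_dossier"
attr(dossier, "harmonizR::harmonized_col_id") <- "adm_unique_id"

# Structural checks: a named list of data frames...
is_named_list_of_df <- is.list(dossier) &&
  !is.null(names(dossier)) &&
  all(nzchar(names(dossier))) &&
  all(vapply(dossier, is.data.frame, logical(1)))

# ...in which every dataset contains the declared id column.
id_col <- attr(dossier, "harmonizR::harmonized_col_id")
id_in_every_dataset <- all(
  vapply(dossier, function(d) id_col %in% names(d), logical(1))
)
```

In practice the attributes are set by the package; the sketch only makes the expected shape of the object concrete.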
A DataSchema defines the harmonized variables to be generated and represents the metadata of an associated harmonized dossier. It must be a list of data frame-like objects with elements named 'Variables' (required) and 'Categories' (if any). The 'Variables' element must contain at least the 'name' column, and the 'Categories' element must contain at least the 'variable' and 'name' columns to be usable in any function. To be considered a minimum workable DataSchema, the 'name' column in 'Variables' must also have unique and non-null entries, and in 'Categories' the combination of the 'variable' and 'name' columns must also be unique.
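The minimum requirements listed above can be expressed as a small check. The helper is_min_dataschema() below is hypothetical and not part of harmonizR; it assumes that "non-null" means neither NA nor an empty string, and it uses plain data frames in place of tibbles.

```r
# Hypothetical helper, not part of harmonizR: tests the minimum
# DataSchema requirements described in the text.
is_min_dataschema <- function(ds) {
  # 'Variables' is required and must be data frame-like.
  if (!is.list(ds) || !is.data.frame(ds[["Variables"]])) return(FALSE)
  vars <- ds[["Variables"]]
  if (!"name" %in% names(vars)) return(FALSE)

  # 'name' must be unique and non-null (assumed: not NA, not empty).
  nm <- vars[["name"]]
  if (anyNA(nm) || any(!nzchar(nm)) || anyDuplicated(nm) > 0) return(FALSE)

  # 'Categories' is optional; if present, (variable, name) must be unique.
  cats <- ds[["Categories"]]
  if (!is.null(cats)) {
    if (!is.data.frame(cats)) return(FALSE)
    if (!all(c("variable", "name") %in% names(cats))) return(FALSE)
    if (anyDuplicated(cats[, c("variable", "name")]) > 0) return(FALSE)
  }
  TRUE
}

ds_ok <- list(
  Variables  = data.frame(name = c("adm_unique_id", "sex")),
  Categories = data.frame(variable = c("sex", "sex"), name = c("1", "2"))
)
ds_bad <- list(Variables = data.frame(name = c("sex", "sex")))

is_min_dataschema(ds_ok)   # TRUE
is_min_dataschema(ds_bad)  # FALSE: duplicated variable name
```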
A data processing element contains the rules and metadata that will be used
to perform harmonization of input datasets in accordance with the DataSchema.
It must be a data frame or data frame extension (e.g. a tibble) and it must
contain certain columns which participate in the process, including
dataschema_variable, ss_table, ss_variables, Mlstr_harmo::rule_category and
Mlstr_harmo::algorithm. The mandatory first processing element must be
"id_creation" in Mlstr_harmo::rule_category, followed by the name of the column
taken as identifier of each dataset to initiate the process of harmonization.
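As a rough sketch of the shape described above, the following base R snippet builds a one-row data processing element and checks the two stated constraints. The first three values mirror the example output on this page; the value placed in Mlstr_harmo::algorithm is a placeholder assumption, and the checks are illustrative rather than the package's own validation.

```r
# Hypothetical one-row data processing element, built with base R.
# check.names = FALSE keeps the "Mlstr_harmo::" column names intact.
dpe <- data.frame(
  dataschema_variable          = "adm_unique_id",
  ss_table                     = "dataset_MELBOURNE_1",
  ss_variables                 = "id",
  `Mlstr_harmo::rule_category` = "id_creation",
  `Mlstr_harmo::algorithm`     = "id_creation",  # placeholder value (assumption)
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Columns named in the text as participating in the process.
required_cols <- c(
  "dataschema_variable", "ss_table", "ss_variables",
  "Mlstr_harmo::rule_category", "Mlstr_harmo::algorithm"
)
has_required_cols <- all(required_cols %in% names(dpe))

# The first processing element must be the "id_creation" rule.
starts_with_id_creation <- identical(
  dpe[["Mlstr_harmo::rule_category"]][1], "id_creation"
)
```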
{
as_harmonized_dossier(DEMO_files_harmo$harmonized_dossier)
}
#> $dataset_MELBOURNE_1
#> # A tibble: 19 × 1
#> adm_unique_id
#> <chr>
#> 1 377943
#> 2 497013
#> 3 927676
#> 4 995667
#> 5 21829
#> 6 209432
#> 7 272983
#> 8 580632
#> 9 304624
#> 10 637551
#> 11 279817
#> 12 235415
#> 13 373673
#> 14 485098
#> 15 299427
#> 16 854073
#> 17 197666
#> 18 130327
#> 19 220050
#>
#> attr(,"harmonizR::class")
#> [1] "harmonized_dossier"
#> attr(,"harmonizR::Dataschema")
#> attr(,"harmonizR::Dataschema")$Variables
#> # A tibble: 1 × 6
#> name `label:en` valueType index `Mlstr_area::1` `Mlstr_area::1.term`
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 adm_unique_id Unique ide… text 1 ADM Identifiers
#>
#> attr(,"harmonizR::Dataschema")attr(,"madshapR::class")
#> [1] "data_dict_mlstr"
#> attr(,"harmonizR::Dataschema")attr(,"harmonizR::class")
#> [1] "Dataschema_mlstr"
#> attr(,"harmonizR::data processing elements")
#> # A tibble: 1 × 11
#> index dataschema_variable valueType ss_table ss_variables
#> * <dbl> <chr> <chr> <chr> <chr>
#> 1 1 adm_unique_id text dataset_MELBOURNE_1 id
#> # ℹ 6 more variables: `Mlstr_harmo::rule_category` <chr>,
#> # `Mlstr_harmo::algorithm` <chr>, `Mlstr_harmo::comment` <chr>,
#> # `Mlstr_harmo::status` <chr>, `harmonizR::r_script` <chr>,
#> # `Mlstr_harmo::status_detail` <chr>
#> attr(,"harmonizR::harmonized_col_id")
#> [1] "adm_unique_id"