Validate and coerce any object as a harmonized dossier (as_harmonized_dossier)

Validates that the input object is a valid harmonized dossier and coerces it by setting the appropriate harmonizR::class attribute. This function is mainly used to validate inputs within other functions of the package, but it can also be used to check whether an object is valid for use in a function.

as_harmonized_dossier(
  object,
  dataschema = NULL,
  data_proc_elem = NULL,
  harmonized_col_id = NULL
)

Arguments

object: A potential harmonized dossier to be coerced.
dataschema: A list of tibble(s) representing the metadata of an associated harmonized dossier.
data_proc_elem: A tibble identifying the input data processing elements.
harmonized_col_id: A character string giving the name of the column, present in every dataset, that serves as the identifier of the dataset.
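The requirement on harmonized_col_id can be illustrated with a short base R check. This is a minimal sketch, not the package's internal validation: check_id_column() is a hypothetical helper, and it assumes that "identifier" means the column exists in every dataset and holds unique, non-missing values within each dataset.

```r
# Hypothetical helper (not part of harmonizR): checks that `col_id` names a
# column present in every dataset, with unique, non-missing values in each.
check_id_column <- function(datasets, col_id) {
  stopifnot(is.list(datasets), is.character(col_id), length(col_id) == 1)
  for (ds_name in names(datasets)) {
    ds <- datasets[[ds_name]]
    if (!col_id %in% names(ds)) {
      stop("column '", col_id, "' is missing from dataset '", ds_name, "'")
    }
    ids <- ds[[col_id]]
    # Assumption: an identifier must be unique and non-missing per dataset
    if (anyNA(ids) || anyDuplicated(ids) > 0) {
      stop("column '", col_id, "' is not a unique identifier in '", ds_name, "'")
    }
  }
  invisible(TRUE)
}

datasets <- list(
  dataset_A = data.frame(adm_unique_id = c("377943", "497013")),
  dataset_B = data.frame(adm_unique_id = c("927676", "995667"))
)
check_id_column(datasets, "adm_unique_id")  # passes silently
```

Stopping with an error, rather than returning FALSE, mirrors the validate-or-fail behaviour that a coercion function typically needs before it sets any attributes.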

Value

A list of tibble(s), each of them being a harmonized dataset.

Details

A harmonized dossier must be a named list containing at least one data frame or data frame extension (e.g. a tibble), each of them being a harmonized dataset. It is generally the product of applying harmonization processing to a dossier object. The name of each tibble is used as the reference name of the dataset. A harmonized dossier has four attributes: harmonizR::class, which is "harmonized_dossier"; harmonizR::Dataschema (provided by the user); harmonizR::data processing elements; and harmonizR::harmonized_col_id (provided by the user), which names the column in each dataset that identifies each unique observation/dataset combination. This id column name is the same across the dataset(s), the DataSchema and the data processing elements (where it is created with 'id_creation'), and it is used to initiate the process of harmonization.
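To make the structure above concrete, the following sketch attaches the four attributes by hand to a named list of data frames in base R. It is an illustration only, not how the package builds dossiers: the attribute names are copied from the printed example output, the values are taken from that example, and the Mlstr_harmo::algorithm value is a placeholder. In practice, as_harmonized_dossier() should be used to validate and set these attributes.

```r
# Illustration only: attribute names copied from the printed example output.
# In practice, let as_harmonized_dossier() validate and set these attributes.
dossier <- list(
  dataset_MELBOURNE_1 = data.frame(adm_unique_id = c("377943", "497013"))
)
attr(dossier, "harmonizR::class") <- "harmonized_dossier"
attr(dossier, "harmonizR::Dataschema") <-
  list(Variables = data.frame(name = "adm_unique_id", valueType = "text"))
attr(dossier, "harmonizR::data processing elements") <- data.frame(
  dataschema_variable = "adm_unique_id",
  ss_table = "dataset_MELBOURNE_1",
  ss_variables = "id",
  `Mlstr_harmo::rule_category` = "id_creation",
  `Mlstr_harmo::algorithm` = NA_character_,  # placeholder only
  check.names = FALSE  # keep the "::" in column names
)
attr(dossier, "harmonizR::harmonized_col_id") <- "adm_unique_id"

# Read the attributes back, as they appear in printed output
attr(dossier, "harmonizR::class")
names(attributes(dossier))
```

Because the attribute names contain "::" and spaces, they must always be passed to attr() as quoted strings; check.names = FALSE serves the same purpose for the data frame's column names.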

A DataSchema defines the harmonized variables to be generated and represents the metadata of an associated harmonized dossier. It must be a list of data-frame-like objects with elements named 'Variables' (required) and 'Categories' (if any). The 'Variables' element must contain at least the 'name' column, and the 'Categories' element must contain at least the 'variable' and 'name' columns to be usable in any function. To be considered a minimum workable DataSchema, the 'name' column in 'Variables' must also have unique, non-null entries, and in 'Categories' the combination of the 'variable' and 'name' columns must also be unique.
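The "minimum workable DataSchema" rules can be expressed as a small predicate. This is a sketch under stated assumptions, not the package's own validator: is_minimal_dataschema() is a hypothetical helper, and "non-null" is interpreted here as neither NA nor an empty string.

```r
# Hypothetical helper (not part of harmonizR): returns TRUE if `dataschema`
# meets the minimum workable DataSchema rules described above.
is_minimal_dataschema <- function(dataschema) {
  if (!is.list(dataschema) || is.data.frame(dataschema)) return(FALSE)
  vars <- dataschema[["Variables"]]  # required element
  if (!is.data.frame(vars) || !"name" %in% names(vars)) return(FALSE)
  var_names <- as.character(vars[["name"]])
  # Assumption: "non-null" means neither NA nor an empty string
  if (anyNA(var_names) || any(var_names == "") || anyDuplicated(var_names) > 0) {
    return(FALSE)
  }
  cats <- dataschema[["Categories"]]  # optional element
  if (!is.null(cats)) {
    if (!is.data.frame(cats) || !all(c("variable", "name") %in% names(cats))) {
      return(FALSE)
    }
    # The (variable, name) combination must be unique
    if (anyDuplicated(cats[, c("variable", "name")]) > 0) return(FALSE)
  }
  TRUE
}

ok <- list(
  Variables = data.frame(name = c("sex", "age")),
  Categories = data.frame(variable = c("sex", "sex"), name = c("1", "2"))
)
is_minimal_dataschema(ok)
```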

A data processing element contains the rules and metadata used to harmonize input datasets in accordance with the DataSchema. It must be a data frame or data frame extension (e.g. a tibble), and it must contain the columns that participate in the process, including dataschema_variable, ss_table, ss_variables, Mlstr_harmo::rule_category and Mlstr_harmo::algorithm. The mandatory first processing element must have "id_creation" as its Mlstr_harmo::rule_category, followed by the name of the column taken as the identifier of each dataset, which initiates the process of harmonization.
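The column and ordering requirements above can be checked with a few lines of base R. This is a minimal sketch, not the package's implementation: check_data_proc_elem() is a hypothetical helper, it assumes the rows are already in processing order, and the Mlstr_harmo::algorithm value in the usage example is a placeholder rather than real rule syntax.

```r
# Hypothetical helper (not part of harmonizR): checks that the required
# columns exist and that the first element has rule category "id_creation".
check_data_proc_elem <- function(data_proc_elem) {
  if (!is.data.frame(data_proc_elem)) stop("data_proc_elem must be a data frame")
  required <- c("dataschema_variable", "ss_table", "ss_variables",
                "Mlstr_harmo::rule_category", "Mlstr_harmo::algorithm")
  missing_cols <- setdiff(required, names(data_proc_elem))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Assumption: rows are already in processing order
  first_rule <- as.character(data_proc_elem[["Mlstr_harmo::rule_category"]][1])
  if (nrow(data_proc_elem) == 0 || !isTRUE(first_rule == "id_creation")) {
    stop("the first processing element must have rule_category 'id_creation'")
  }
  invisible(TRUE)
}

dpe <- data.frame(
  dataschema_variable = "adm_unique_id",
  ss_table = "dataset_MELBOURNE_1",
  ss_variables = "id",
  `Mlstr_harmo::rule_category` = "id_creation",
  `Mlstr_harmo::algorithm` = NA_character_,  # placeholder only
  check.names = FALSE  # keep the "::" in column names
)
check_data_proc_elem(dpe)  # passes silently
```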

Examples

{
  as_harmonized_dossier(DEMO_files_harmo$harmonized_dossier)
}
#> $dataset_MELBOURNE_1
#> # A tibble: 19 × 1
#>    adm_unique_id
#>    <chr>        
#>  1 377943       
#>  2 497013       
#>  3 927676       
#>  4 995667       
#>  5 21829        
#>  6 209432       
#>  7 272983       
#>  8 580632       
#>  9 304624       
#> 10 637551       
#> 11 279817       
#> 12 235415       
#> 13 373673       
#> 14 485098       
#> 15 299427       
#> 16 854073       
#> 17 197666       
#> 18 130327       
#> 19 220050       
#> 
#> attr(,"harmonizR::class")
#> [1] "harmonized_dossier"
#> attr(,"harmonizR::Dataschema")
#> attr(,"harmonizR::Dataschema")$Variables
#> # A tibble: 1 × 6
#>   name          `label:en`  valueType index `Mlstr_area::1` `Mlstr_area::1.term`
#>   <chr>         <chr>       <chr>     <chr> <chr>           <chr>               
#> 1 adm_unique_id Unique ide… text      1     ADM             Identifiers         
#> 
#> attr(,"harmonizR::Dataschema")attr(,"madshapR::class")
#> [1] "data_dict_mlstr"
#> attr(,"harmonizR::Dataschema")attr(,"harmonizR::class")
#> [1] "Dataschema_mlstr"
#> attr(,"harmonizR::data processing elements")
#> # A tibble: 1 × 11
#>   index dataschema_variable valueType ss_table            ss_variables
#> * <dbl> <chr>               <chr>     <chr>               <chr>       
#> 1     1 adm_unique_id       text      dataset_MELBOURNE_1 id          
#> # ℹ 6 more variables: `Mlstr_harmo::rule_category` <chr>,
#> #   `Mlstr_harmo::algorithm` <chr>, `Mlstr_harmo::comment` <chr>,
#> #   `Mlstr_harmo::status` <chr>, `harmonizR::r_script` <chr>,
#> #   `Mlstr_harmo::status_detail` <chr>
#> attr(,"harmonizR::harmonized_col_id")
#> [1] "adm_unique_id"