Generate the pooled dataset from harmonized datasets in a dossier — pooled_harmonized_dataset

Generates the pooled harmonized dataset from the harmonized datasets in a dossier. The pooled dataset has two extra columns whose names can be provided by the user (the defaults are unique_col_dataset = 'harmonizR_dataset_name' and unique_col_id = 'harmonizR_unique_id'). The first column holds the name of each dataset, which is the name of the corresponding tibble in the dossier. The second column is built from the id column of each harmonized dataset and identifies each unique observation/dataset combination (the two values concatenated). These two columns are added so that no information is lost during the process. The pooled harmonized dataset comes with its data dictionary, which is the DataSchema of the harmonized dossier with the two extra columns added.

pooled_harmonized_dataset_create(
  harmonized_dossier,
  unique_col_dataset = "harmonizR_dataset_name",
  unique_col_id = "harmonizR_unique_id"
)

Arguments

harmonized_dossier: List of tibble(s), each of them being a harmonized dataset.
unique_col_dataset: A character string giving the name of the column that holds the name of each dataset.
unique_col_id: A character string giving the name of the column that identifies each observation in the pooled dataset. Its values are the concatenation of the id column value and the dataset name.

Value

A tibble, which is the pooled harmonized dataset from a harmonized dossier.
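
The role of the two extra columns can be illustrated with a small base R sketch. It is not the package's implementation: the helper pool_sketch(), the toy datasets, and in particular the order and separator used to build the concatenated id (dataset name, then "_", then id value) are assumptions made for illustration only.

```r
# Toy stand-in for a harmonized dossier: a named list of data frames that
# share the same id column. All names and values here are hypothetical.
toy_dossier <- list(
  dataset_A = data.frame(adm_unique_id = c("001", "002"), age = c(34, 51),
                         stringsAsFactors = FALSE),
  dataset_B = data.frame(adm_unique_id = c("001", "003"), age = c(47, 29),
                         stringsAsFactors = FALSE)
)

# Hypothetical helper (not part of harmonizR): stacks the datasets and adds
# one column with the dataset name and one with a concatenated unique id.
pool_sketch <- function(dossier,
                        id_col,
                        unique_col_dataset = "harmonizR_dataset_name",
                        unique_col_id = "harmonizR_unique_id") {
  pieces <- lapply(names(dossier), function(dataset_name) {
    dataset <- dossier[[dataset_name]]
    extra <- data.frame(
      rep(dataset_name, nrow(dataset)),
      # Assumed format: "<dataset name>_<id value>"
      paste(dataset_name, dataset[[id_col]], sep = "_"),
      stringsAsFactors = FALSE
    )
    names(extra) <- c(unique_col_dataset, unique_col_id)
    cbind(extra, dataset)
  })
  pooled <- do.call(rbind, pieces)
  rownames(pooled) <- NULL
  pooled
}

pooled <- pool_sketch(toy_dossier, id_col = "adm_unique_id")
print(pooled)
```

In this toy input the id "001" appears in both datasets, so the original id column alone no longer identifies a row after pooling, while the concatenated column still does.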

Details

A harmonized dossier must be a named list containing at least one data frame or data frame extension (e.g. a tibble), each of them being a harmonized dataset. It is generally the product of applying harmonization processing to a dossier object. The name of each tibble will be used as the reference name of the dataset. A harmonized dossier has four attributes: harmonizR::class, which is "harmonized_dossier"; harmonizR::Dataschema (provided by the user); harmonizR::data processing elements; and harmonizR::harmonized_col_id (provided by the user), which refers to the column in each dataset that identifies each unique observation/dataset combination. This id column name is the same across the dataset(s), the DataSchema and the data processing elements (created by using 'id_creation'), and is used to initiate the harmonization process.
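
The structural requirements above can be checked on a toy object with base R. This is only a sketch: the datasets are invented, and the attribute names are written literally as they appear in the text, so whether the package stores them under exactly these strings is an assumption. The DataSchema and data processing elements attributes are left out because their structure is not described here.

```r
# Toy object shaped like the description above (hypothetical data).
toy_dossier <- list(
  dataset_A = data.frame(adm_unique_id = c("001", "002"), age = c(34, 51)),
  dataset_B = data.frame(adm_unique_id = c("001", "003"), age = c(47, 29))
)

# Attribute names copied from the text; their exact spelling in the
# package is an assumption.
attr(toy_dossier, "harmonizR::class") <- "harmonized_dossier"
attr(toy_dossier, "harmonizR::harmonized_col_id") <- "adm_unique_id"

# A named list: every element has a non-empty name.
is_named_list <- is.list(toy_dossier) &&
  !is.null(names(toy_dossier)) &&
  all(names(toy_dossier) != "")

# Every element is a data frame (tibbles also pass is.data.frame()).
all_data_frames <- all(vapply(toy_dossier, is.data.frame, logical(1)))

# The declared id column exists in every dataset.
id_col <- attr(toy_dossier, "harmonizR::harmonized_col_id")
id_present <- all(vapply(
  toy_dossier,
  function(dataset) id_col %in% names(dataset),
  logical(1)
))

print(c(is_named_list, all_data_frames, id_present))
```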

Examples

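{

# Harmonized dossier shipped with the package demo files
harmonized_dossier <- DEMO_files_harmo$harmonized_dossier

# Pool all harmonized datasets, using the default names for the two extra columns
pooled_harmonized_dataset_create(
  harmonized_dossier,
  unique_col_dataset = 'harmonizR_dataset_name',
  unique_col_id = 'harmonizR_unique_id')

}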
#> # A tibble: 19 × 3
#>    harmonizR_dataset_name                      harmonizR_unique_id adm_unique_id
#>  * <chr+lbl>                                   <chr>               <chr>        
#>  1 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 377943       
#>  2 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 497013       
#>  3 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 927676       
#>  4 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 995667       
#>  5 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 21829        
#>  6 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 209432       
#>  7 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 272983       
#>  8 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 580632       
#>  9 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 304624       
#> 10 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 637551       
#> 11 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 279817       
#> 12 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 235415       
#> 13 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 373673       
#> 14 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 485098       
#> 15 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 299427       
#> 16 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 854073       
#> 17 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 197666       
#> 18 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 130327       
#> 19 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 220050