R/02-harmo_process_harmonization.R
pooled_harmonized_dataset_create.Rd
Generates the pooled harmonized dataset from the harmonized datasets in a dossier. The pooled dataset has two extra columns whose names can be provided by the user (the defaults are unique_col_dataset = 'harmonizR_dataset_name' and unique_col_id = 'harmonizR_unique_id'). The first column holds the name of each dataset, which is the name of the corresponding tibble in the dossier. The second column is built by concatenating the value of the id column of each harmonized dataset with the dataset name, so that it identifies each observation/dataset combination uniquely. These two columns are added so that no information is lost when the datasets are stacked together. The pooled harmonized dataset comes with its data dictionary, which is the DataSchema of the harmonized dossier with the two extra columns added.
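The stacking step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `pool_datasets_sketch()` and its `id_col` argument are hypothetical, every dataset is assumed to have the same columns (as it would after harmonization to one DataSchema), the dataset name is assumed to come first in the unique id, and the `"_"` separator is a guess. The sketch also skips the data dictionary and the labels that the real function attaches.

```r
# Hypothetical sketch (not harmonizR's implementation): stack the datasets of a
# named list and add a dataset-name column plus a concatenated unique id column.
# Assumes every dataset has the same columns (same DataSchema) and that the
# dataset name and id value are joined with "_" (the separator is an assumption).
pool_datasets_sketch <- function(dossier, id_col,
                                 unique_col_dataset = "harmonizR_dataset_name",
                                 unique_col_id = "harmonizR_unique_id") {
  stopifnot(is.list(dossier), length(dossier) >= 1, !is.null(names(dossier)))

  pieces <- lapply(names(dossier), function(ds_name) {
    ds <- as.data.frame(dossier[[ds_name]])
    dataset_names <- rep(ds_name, nrow(ds))
    # The two extra columns: dataset name, then "<dataset name>_<id value>"
    extra <- data.frame(
      dataset_names,
      paste(dataset_names, ds[[id_col]], sep = "_"),
      stringsAsFactors = FALSE
    )
    names(extra) <- c(unique_col_dataset, unique_col_id)
    cbind(extra, ds)
  })

  # rbind() matches the data frames by column name
  pooled <- do.call(rbind, pieces)
  rownames(pooled) <- NULL
  pooled
}

# Usage with a toy dossier of two datasets sharing the same columns
dossier <- list(
  dataset_A = data.frame(id = c("1", "2"), age = c(30, 41)),
  dataset_B = data.frame(id = "7", age = 55)
)
pooled <- pool_datasets_sketch(dossier, id_col = "id")
print(pooled)
```

Putting the dataset name into the identifier is what keeps rows distinguishable when two datasets happen to reuse the same id value.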
pooled_harmonized_dataset_create(
harmonized_dossier,
unique_col_dataset = "harmonizR_dataset_name",
unique_col_id = "harmonizR_unique_id"
)
harmonized_dossier: List of tibble(s), each of them being a harmonized dataset.
unique_col_dataset: A character string giving the name of the column that holds the name of each dataset.
unique_col_id: A character string giving the name of the unique identifier column, whose values are the concatenation of the id column value and the dataset name.
A tibble, which is the pooled harmonized dataset from a harmonized dossier.
A harmonized dossier must be a named list containing at least one data frame
or data frame extension (e.g. a tibble), each of them being a harmonized
dataset. It is generally the product of applying harmonization processing to
a dossier object. The name of each tibble is used as the reference name of
the dataset. A harmonized dossier has four attributes:
- harmonizR::class, which is "harmonized_dossier";
- harmonizR::Dataschema (provided by the user);
- harmonizR::data processing elements;
- harmonizR::harmonized_col_id (provided by the user), which names the column
  in each dataset that uniquely identifies each observation/dataset combination.
This id column name is the same across the dataset(s), the DataSchema and the
data processing elements (created by using 'id_creation'), and it is used to
initiate the harmonization process.
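The structural requirements listed above (a named list, at least one data frame or tibble, one shared id column) can be checked with a small base R helper. This is a hypothetical sketch, not a harmonizR function: `check_dossier_sketch()` and its `id_col` argument are illustrative names, and the helper deliberately ignores the four harmonizR attributes, checking only the list and column structure.

```r
# Hypothetical helper (not part of harmonizR): checks only the structural
# requirements stated above; it does not inspect the four attributes.
check_dossier_sketch <- function(dossier, id_col) {
  if (!is.list(dossier) || is.data.frame(dossier) || length(dossier) == 0) {
    stop("a dossier must be a non-empty list of data frames")
  }
  nms <- names(dossier)
  # Each dataset name is its reference name, so it must exist and be unique
  if (is.null(nms) || any(is.na(nms) | nms == "")) {
    stop("every dataset in the dossier must be named")
  }
  if (anyDuplicated(nms) > 0) {
    stop("dataset names must be unique")
  }
  for (nm in nms) {
    ds <- dossier[[nm]]
    # Tibbles inherit from data.frame, so is.data.frame() accepts them too
    if (!is.data.frame(ds)) {
      stop(sprintf("'%s' is not a data frame", nm))
    }
    # The same id column name must exist in every dataset
    if (!id_col %in% names(ds)) {
      stop(sprintf("'%s' lacks the id column '%s'", nm, id_col))
    }
  }
  invisible(TRUE)
}

# A named list passes; an unnamed list is rejected
ok <- check_dossier_sketch(list(dataset_A = data.frame(id = "1")), "id")
bad <- tryCatch(
  check_dossier_sketch(list(data.frame(id = "1")), "id"),
  error = function(e) conditionMessage(e)
)
print(bad)
```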
library(harmonizR)

harmonized_dossier <- DEMO_files_harmo$harmonized_dossier
pooled_harmonized_dataset_create(
  harmonized_dossier,
  unique_col_dataset = "harmonizR_dataset_name",
  unique_col_id = "harmonizR_unique_id")
#> # A tibble: 19 × 3
#> harmonizR_dataset_name harmonizR_unique_id adm_unique_id
#> * <chr+lbl> <chr> <chr>
#> 1 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 377943
#> 2 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 497013
#> 3 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 927676
#> 4 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 995667
#> 5 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 21829
#> 6 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 209432
#> 7 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 272983
#> 8 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 580632
#> 9 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 304624
#> 10 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 637551
#> 11 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 279817
#> 12 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 235415
#> 13 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 373673
#> 14 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 485098
#> 15 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 299427
#> 16 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 854073
#> 17 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 197666
#> 18 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 130327
#> 19 dataset_MELBOURNE_1 [harmonized dataset da… dataset_MELBOURNE_… 220050