Get started

Install the package

# To install the R package:
# install.packages('harmonizR')

library(harmonizR)

# If you need help with the package, use:
harmonizR_help()
madshapR_help()

Demo files

Demo files are available in the built-in DEMO_files_harmo object. They provide illustrative examples of the main inputs and are used to run the commands in this vignette.


names(DEMO_files_harmo)

# To see examples:
# View(DEMO_files_harmo$dd_TOKYO_format_maelstrom_tagged)   # A data dictionary
# View(DEMO_files_harmo$dataset_TOKYO)                      # A dataset
# View(DEMO_files_harmo$`data_processing_elements - final`) # The data processing elements
# View(DEMO_files_harmo$`dataschema - final`)               # The DataSchema

General objective of this demonstration

A general process with harmonizR involves the following steps:

  • Associate study-specific datasets with their data dictionaries.
  • Prepare a “dossier” of study-specific datasets and data dictionaries.
  • Harmonize a dossier of datasets using the DataSchema and data processing elements.
  • Review the dossier of harmonized datasets.

The process is demonstrated below with all inputs in valid formats and no errors, i.e., assuming that the content and structure of all inputs are compatible with harmonizR. Error-checking and other manipulations of inputs will be covered in other vignettes.

Prepare the study-specific inputs

If study-specific data dictionaries are available, they can be associated with their corresponding datasets using the function ‘data_dict_apply’. If no data dictionary is provided for a dataset, a minimal one is created automatically to meet the technical requirements of the functions that follow. The study-specific datasets and their associated data dictionaries are then grouped into a dossier.

# Associate the metadata from each data dictionary with its dataset
dataset_MELBOURNE_1 <- data_dict_apply(
  dataset = DEMO_files_harmo$dataset_MELBOURNE_1,
  data_dict = DEMO_files_harmo$dd_MELBOURNE_1_format_maelstrom)

dataset_MELBOURNE_2 <- data_dict_apply(
  dataset = DEMO_files_harmo$dataset_MELBOURNE_2,
  data_dict = DEMO_files_harmo$dd_MELBOURNE_2_format_maelstrom)

dataset_PARIS <- data_dict_apply(
  dataset = DEMO_files_harmo$dataset_PARIS,
  data_dict = DEMO_files_harmo$dd_PARIS_format_maelstrom)

dataset_TOKYO <- data_dict_apply(
  dataset = DEMO_files_harmo$dataset_TOKYO,
  data_dict = DEMO_files_harmo$dd_TOKYO_format_maelstrom_tagged)
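
To give an idea of what a "minimal" data dictionary might contain, here is a sketch in base R that lists each column of a dataset with its R class. The helper minimal_data_dict(), the column r_class, and the toy dataset are all invented for illustration; the dictionary that the package generates may be structured differently.

```r
# Hypothetical helper (not part of harmonizR): build a bare-bones variable
# table from a data frame, with one row per column.
minimal_data_dict <- function(dataset) {
  data.frame(
    name = names(dataset),
    # First class of each column, e.g. "character", "integer", "numeric"
    r_class = unname(vapply(dataset, function(col) class(col)[1], character(1))),
    stringsAsFactors = FALSE
  )
}

# Toy dataset, invented for illustration only.
toy_dataset <- data.frame(
  part_id = c("A01", "A02", "A03"),
  age     = c(34L, 51L, 47L),
  height  = c(1.62, 1.80, 1.75),
  stringsAsFactors = FALSE
)

toy_dd <- minimal_data_dict(toy_dataset)
print(toy_dd)
```

The point is only that a dictionary can be derived from the dataset itself when none is supplied: variable names and types are always recoverable, whereas labels and category definitions are not.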

Generate harmonized data

First, prepare the main inputs (dossier, DataSchema, and data processing elements):


# Group the datasets in a dossier object
# NB: the names of the datasets must match the values in the column ss_table
# of the data processing elements
dossier <- dossier_create(
  dataset_list = list(
    dataset_MELBOURNE_1, 
    dataset_MELBOURNE_2, 
    dataset_PARIS, 
    dataset_TOKYO))

dataschema <- DEMO_files_harmo$`dataschema - final`
data_proc_elem <- DEMO_files_harmo$`data_processing_elements - final`
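
The requirement that dataset names match the column ss_table can be checked before harmonizing. The sketch below uses base R only and toy stand-ins invented for illustration (toy_dossier, toy_proc_elem, and the column rule_id are not part of the demo files); it assumes the dossier behaves like a named list and the data processing elements like a data frame.

```r
# Toy stand-ins, invented for illustration; they are not the demo objects.
toy_dossier <- list(
  dataset_A = data.frame(id = 1:2),
  dataset_B = data.frame(id = 3:4)
)
toy_proc_elem <- data.frame(
  rule_id  = 1:3,                                      # hypothetical column
  ss_table = c("dataset_A", "dataset_B", "dataset_C"), # column named in the text
  stringsAsFactors = FALSE
)

# Tables referenced in ss_table that are absent from the dossier
missing_in_dossier <- setdiff(unique(toy_proc_elem$ss_table), names(toy_dossier))

# Datasets in the dossier that no processing rule refers to
unused_in_rules <- setdiff(names(toy_dossier), unique(toy_proc_elem$ss_table))

print(missing_in_dossier)
print(unused_in_rules)
```

If the same assumptions hold for the real objects, the comparison of unique(data_proc_elem$ss_table) with names(dossier) would flag mismatched names before harmonization is attempted.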

You can proceed with harmonization using ‘harmo_process’.


# Process harmonization
harmonized_dossier <- harmo_process(dossier, dataschema, data_proc_elem)
show_harmo_error(harmonized_dossier)

Assess harmonized data

You can then assess and summarize the harmonized data to identify potential issues, produce summary statistics, and generate visual reports.

Warning ⚠ This tutorial creates a folder ‘tmp’ in the working directory, in which the visual report is generated.

# Assess the harmonization (this report can be downloaded as an Excel file)
harmonized_dossier_evaluate <- harmonized_dossier_evaluate(harmonized_dossier)
# View(harmonized_dossier_evaluate)

# Produce summary statistics (this summary can be downloaded as an Excel file)
harmonized_dossier_summary <- harmonized_dossier_summarise(harmonized_dossier)
# View(harmonized_dossier_summary)

# Generate the visual report (a web-based interactive application)
bookdown_path <- paste0('tmp/', basename(tempdir()))
# print(bookdown_path)
harmonized_dossier_visualize(
  harmonized_dossier, 
  to = bookdown_path)

open_visual_report(bookdown_path) # open the report in a browser
# unlink(bookdown_path, recursive = TRUE) # remove the report folder when done
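
Cleaning up the report folder can be sketched with base R alone. The example below works under a throwaway root created with tempfile() so that it does not touch the working directory; the folder layout and the file name index.html are invented stand-ins for a generated report.

```r
# Throwaway root so the example leaves the working directory untouched
demo_root   <- tempfile("harmo_demo_")
report_path <- file.path(demo_root, "tmp", "report")

# Create the folder and its parents
dir.create(report_path, recursive = TRUE, showWarnings = FALSE)

# Stand-in for generated report files (invented file name)
writeLines("placeholder", file.path(report_path, "index.html"))
report_created <- file.exists(file.path(report_path, "index.html"))

# A folder that still contains files has to be removed recursively
unlink(report_path, recursive = TRUE)
report_removed <- !dir.exists(report_path)

# Remove the throwaway root as well
unlink(demo_root, recursive = TRUE)
```

unlink() with recursive = TRUE is used because the report folder is not empty once the report has been generated.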