Objective of the vignette

This vignette provides examples of applying Rmonize functions to prepare inputs, process data, and validate and consolidate harmonized data, using illustrative demo objects included with the package. The vignette focuses on demonstrating the usage of functions on inputs that already have valid structure and content. See the Glossary and Reference pages for more details about the terms used in this document.

Get started

Install the package

# To install Rmonize:
install.packages('Rmonize')

library(Rmonize)
# If you need help with the package, please use:
Rmonize_help()

# Downloadable templates are available here
Rmonize_templates()

# Demo files are available here, along with an online demonstration process 
Rmonize_DEMO

Demo objects

Demo objects are available through the built-in Rmonize_DEMO object. They include example input datasets, input data dictionaries, DataSchema, Data Processing Elements (DPE), and harmonized datasets that provide illustrative examples of the structure and content of the main objects used by Rmonize functions.

# To see contents
names(Rmonize_DEMO)
print(Rmonize_DEMO$dataset_TOKYO)                        # An input dataset
print(Rmonize_DEMO$data_dict_TOKYO)                      # An input data dictionary
print(Rmonize_DEMO$`data_processing_elements - final`)   # Data Processing Elements (DPE)
print(Rmonize_DEMO$`dataschema - final`)                 # A DataSchema

Pipeline

Prepare inputs

DataSchema and Data Processing Elements (DPE)

The DataSchema and DPE are generally prepared from Excel templates and imported into R. Separate documentation is provided for preparing Data Processing Elements. The DataSchema will be an R list with named elements Variables and Categories. The DPE will be a data frame. You can check the structure of each object and assign the correct attributes explicitly to ensure compatibility with Rmonize functions.

# as_dataschema() and as_data_proc_elem() check the structure of each object
# and assign the attributes needed for compatibility with Rmonize functions.

dataschema <- as_dataschema(Rmonize_DEMO$`dataschema - final`)
data_proc_elem <- as_data_proc_elem(Rmonize_DEMO$`data_processing_elements - final`)
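The expected shape of a DataSchema (a list with named elements Variables and Categories) can be illustrated with a small base R sketch. The toy object, its column names (name, label, valueType, variable), and the helper has_dataschema_shape() are assumptions made for illustration only; they are not part of Rmonize, and the check is much weaker than what as_dataschema() performs.

```r
# Hypothetical minimal DataSchema. Column names are illustrative assumptions.
toy_dataschema <- list(
  Variables = data.frame(
    name = c("adm_id", "sex"),
    label = c("Participant identifier", "Sex of participant"),
    valueType = c("text", "integer"),
    stringsAsFactors = FALSE),
  Categories = data.frame(
    variable = c("sex", "sex"),
    name = c("1", "2"),
    label = c("Male", "Female"),
    stringsAsFactors = FALSE))

# Hypothetical helper (not an Rmonize function): checks only that the object
# is a list holding two data frames named Variables and Categories.
has_dataschema_shape <- function(x) {
  is.list(x) &&
    all(c("Variables", "Categories") %in% names(x)) &&
    is.data.frame(x$Variables) &&
    is.data.frame(x$Categories)
}

has_dataschema_shape(toy_dataschema)
has_dataschema_shape(list(Variables = toy_dataschema$Variables))
```

The first call returns TRUE and the second returns FALSE because the Categories element is absent.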

Note: In the DEMO DPE, the elements for three different input datasets are combined in one file. A DPE can also be prepared for each input dataset separately, and the individual DPEs combined as needed for processing. The DataSchema and DPE objects can be assigned explicitly, as shown above.

Combine input datasets and data dictionaries in a dossier

If input data dictionaries (with metadata about variables and categories in input datasets) are available, they can be associated with their corresponding datasets using the function data_dict_apply(). If not provided, minimal data dictionaries will automatically be created to meet technical requirements of Rmonize functions as needed.

# Associate metadata from input data dictionaries to the input datasets.

dataset_MELBOURNE <- data_dict_apply(
  dataset = Rmonize_DEMO$dataset_MELBOURNE,
  data_dict = Rmonize_DEMO$data_dict_MELBOURNE)

dataset_PARIS <- data_dict_apply(
  dataset = Rmonize_DEMO$dataset_PARIS,
  data_dict = Rmonize_DEMO$data_dict_PARIS)

dataset_TOKYO <- data_dict_apply(
  dataset = Rmonize_DEMO$dataset_TOKYO,
  data_dict = Rmonize_DEMO$data_dict_TOKYO)

For use in harmo_process(), one or more input datasets and any associated data dictionaries must be grouped into a named list (referred to as a “dossier” in Rmonize). This can be done explicitly with dossier_create().


# Group the datasets into a dossier object.
# NB: The names of the datasets in the dossier must match the column 
# input_dataset in the Data Processing Elements

dossier <- dossier_create(dataset_list = list(
  dataset_MELBOURNE,
  dataset_PARIS,
  dataset_TOKYO))
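Because the dataset names in the dossier must match the input_dataset column of the DPE, it can help to compare the two sets of names before processing. The following is a minimal base R sketch using toy objects; toy_dossier and toy_dpe are hypothetical stand-ins (with a deliberate typo in one name), not the demo objects.

```r
# Hypothetical stand-ins for a dossier (named list) and a DPE (data frame).
toy_dossier <- list(
  dataset_MELBOURNE = data.frame(id = 1:2),
  dataset_PARIS = data.frame(id = 1:3),
  dataset_TOKYO = data.frame(id = 1:2))

toy_dpe <- data.frame(
  input_dataset = c("dataset_MELBOURNE", "dataset_PARIS", "dataset_TOKIO"),
  stringsAsFactors = FALSE)

dpe_names <- unique(toy_dpe$input_dataset)

# Datasets in the dossier that the DPE never refers to.
missing_from_dpe <- setdiff(names(toy_dossier), dpe_names)

# Names used in the DPE that have no matching dataset in the dossier.
missing_from_dossier <- setdiff(dpe_names, names(toy_dossier))

missing_from_dpe
missing_from_dossier
```

Both results are empty character vectors when the names agree; here each contains the mismatched spelling from its side.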

Process data

When the input dossier, DataSchema, and DPE are prepared, you can initiate processing using harmo_process(), which uses information from the inputs jointly to generate harmonized datasets (one for each input dataset).


harmonized_dossier <- harmo_process(
    dossier, 
    dataschema, 
    data_proc_elem)

This produces a harmonized dossier: a list of harmonized datasets with their metadata and the associated information from the DataSchema and DPE. Any errors or warnings raised during processing are printed in the console. To print a summary of the errors and warnings associated with individual algorithms, use show_harmo_error().


show_harmo_error(harmonized_dossier)

Note: If there is a processing error, a harmonized dossier will be created, but the affected harmonized dataset(s) will be empty.
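One way to spot affected datasets is to look for empty elements in the list. The sketch below uses base R on a toy list and assumes that "empty" means a data frame with no rows or no columns; toy_harmonized is a hypothetical stand-in, not output from harmo_process().

```r
# Hypothetical stand-in for a harmonized dossier with one empty dataset.
toy_harmonized <- list(
  dataset_MELBOURNE = data.frame(adm_id = c("a", "b")),
  dataset_PARIS = data.frame(),
  dataset_TOKYO = data.frame(adm_id = "c"))

# Flag datasets with zero rows or zero columns (assumed meaning of "empty").
is_empty <- vapply(
  toy_harmonized,
  function(d) nrow(d) == 0L || ncol(d) == 0L,
  logical(1))

empty_datasets <- names(toy_harmonized)[is_empty]
empty_datasets
```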

Validate and consolidate harmonized data

You can assess and summarize harmonized datasets to identify potential issues, produce summary statistics, and generate visual reports. To perform evaluations and summaries of the entire harmonized dossier, you can use harmonized_dossier_evaluate() and harmonized_dossier_summarize(), which produce summary tables (that can be exported to Excel).

# Evaluate and summarize a harmonized dossier containing multiple harmonized datasets.

harmonized_dossier_evaluation <- harmonized_dossier_evaluate(harmonized_dossier)
harmonized_dossier_summary <- harmonized_dossier_summarize(harmonized_dossier)

A visual report with summary figures for each variable can also be produced using harmonized_dossier_visualize().

Warning ⚠ This tutorial creates a folder ‘tmp’ where the visual report is generated.

# Specify the folder where the visual report will be generated. A folder name
# is required, and the folder must not already exist.

bookdown_path <- file.path('tmp', basename(tempdir()))

harmonized_dossier_visualize(harmonized_dossier, bookdown_path)

# Open the visual report in a browser.
bookdown_open(bookdown_path)
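Since the report folder must not already exist, a quick check before generating the report can avoid a failed call. This is a minimal base R sketch; is_new_folder() is a hypothetical helper, not an Rmonize function.

```r
# Hypothetical helper: TRUE only if nothing exists yet at the given path.
is_new_folder <- function(path) {
  !dir.exists(path) && !file.exists(path)
}

# A path inside the session's temporary directory that has not been created.
candidate_path <- file.path(tempdir(), "visual_report_example")

is_new_folder(candidate_path)
is_new_folder(tempdir())
```

The second call returns FALSE because the session's temporary directory already exists.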

To combine the individual harmonized datasets in a harmonized dossier into one pooled harmonized dataset, use pooled_harmonized_dataset_create().


# Generate one pooled harmonized dataset from a harmonized dossier
pooled_harmonized_dataset <- 
  pooled_harmonized_dataset_create(harmonized_dossier)
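Conceptually, pooling stacks the harmonized datasets, which share the same variables, into one table while keeping track of where each row came from. The base R sketch below illustrates that idea on toy data; it is not the implementation used by pooled_harmonized_dataset_create(), and the column name harmonized_dataset and the toy variables are assumptions.

```r
# Hypothetical harmonized datasets sharing the same variables.
toy_harmonized_dossier <- list(
  dataset_PARIS = data.frame(adm_id = c("p1", "p2"), age = c(34, 51),
                             stringsAsFactors = FALSE),
  dataset_TOKYO = data.frame(adm_id = "t1", age = 47,
                             stringsAsFactors = FALSE))

# Add a column recording the source dataset, then stack the rows.
with_source <- Map(
  function(d, nm) data.frame(harmonized_dataset = nm, d,
                             stringsAsFactors = FALSE),
  toy_harmonized_dossier, names(toy_harmonized_dossier))

pooled <- do.call(rbind, with_source)
rownames(pooled) <- NULL
pooled
```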

To get the harmonized data dictionary for an individual harmonized dataset (which contains more details about algorithms and R scripts used in processing for that dataset), use data_dict_extract().


# Extract the harmonized data dictionary for one harmonized dataset.

harmonized_TOKYO_dd <- data_dict_extract(harmonized_dossier$dataset_TOKYO)

Use harmonized data

Once you are satisfied with the outputs, you can export them in any R-compatible format. For example, harmonized datasets can be exported as labelled datasets that keep variable attributes as metadata (e.g. SAS files), and tabular reports can be exported as Excel files.


library(fabR)
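# Examples of exporting objects as Excel files. Each call uses its own file
# name so that one export does not overwrite another.

# write_excel_allsheets(harmonized_dossier, "harmonized_dossier.xlsx")
# write_excel_allsheets(harmonized_dossier_summary, "harmonized_dossier_summary.xlsx")
# write_excel_allsheets(harmonized_TOKYO_dd, "harmonized_TOKYO_dd.xlsx")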
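As one possible R-native option, an object saved with saveRDS() keeps its attributes when read back with readRDS(). The sketch below shows this on a toy dataset; the attribute name "label" and the toy variable are assumptions for illustration, and this does not replace the labelled-file or Excel exports mentioned above.

```r
# Hypothetical dataset with a variable-level attribute used as metadata.
toy_dataset <- data.frame(sex = c(1L, 2L, 1L))
attr(toy_dataset$sex, "label") <- "Sex of participant"

# Save to a temporary .rds file and read it back.
rds_file <- tempfile(fileext = ".rds")
saveRDS(toy_dataset, rds_file)
restored <- readRDS(rds_file)

# The attribute survives the round trip.
attr(restored$sex, "label")
```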