vignettes/d-Example-with-DEMO-files.Rmd
# To install the R package:
# install.packages('harmonizR')
library(harmonizR)
# If you need help with the package, use:
harmonizR_help()
madshapR_help()
Demo files are available in the built-in DEMO_files_harmo object. They provide illustrative examples of the main inputs and are used to run the commands in this vignette.
names(DEMO_files_harmo)
# To see examples:
# View(DEMO_files_harmo$dd_TOKYO_format_maelstrom_tagged) # A data dictionary
# View(DEMO_files_harmo$dataset_TOKYO) # A dataset
# View(DEMO_files_harmo$`data_processing_elements - final`) # The data processing elements
# View(DEMO_files_harmo$`dataschema - final`) # The DataSchema
A general process with harmonizR involves the following steps:
The process is demonstrated below with all inputs in valid formats and no errors, i.e., assuming that the content and structure of all inputs are compatible with harmonizR. Error-checking and other manipulations of inputs will be covered in other vignettes.
If study-specific data dictionaries are available, they can be associated with their corresponding datasets using the function ‘data_dict_apply’. If none are provided, minimal data dictionaries are created automatically to meet the technical requirements of the functions that follow. The study-specific datasets and their associated data dictionaries are then grouped into a dossier.
# Associate metadata from the data dictionary with the dataset
dataset_MELBOURNE_1 <- data_dict_apply(
  dataset = DEMO_files_harmo$dataset_MELBOURNE_1,
  data_dict = DEMO_files_harmo$dd_MELBOURNE_1_format_maelstrom)
dataset_MELBOURNE_2 <- data_dict_apply(
  dataset = DEMO_files_harmo$dataset_MELBOURNE_2,
  data_dict = DEMO_files_harmo$dd_MELBOURNE_2_format_maelstrom)
dataset_PARIS <- data_dict_apply(
  dataset = DEMO_files_harmo$dataset_PARIS,
  data_dict = DEMO_files_harmo$dd_PARIS_format_maelstrom)
dataset_TOKYO <- data_dict_apply(
  dataset = DEMO_files_harmo$dataset_TOKYO,
  data_dict = DEMO_files_harmo$dd_TOKYO_format_maelstrom_tagged)
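To make concrete what associating a data dictionary with a dataset can mean, here is a minimal base-R sketch that copies variable labels from a toy dictionary onto the matching columns as attributes. It is an illustration only and does not reproduce the internals of ‘data_dict_apply’; the toy objects and the helper apply_labels are hypothetical.

```r
# Toy dataset and toy data dictionary (hypothetical, for illustration only)
toy_dataset <- data.frame(
  id  = c("p1", "p2", "p3"),
  age = c(34, 51, 29),
  stringsAsFactors = FALSE
)
toy_dict <- data.frame(
  name  = c("id", "age"),
  label = c("Participant identifier", "Age at baseline (years)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: attach each label to the column with the same name
apply_labels <- function(dataset, dict) {
  for (i in seq_len(nrow(dict))) {
    col <- dict$name[i]
    if (col %in% names(dataset)) {
      attr(dataset[[col]], "label") <- dict$label[i]
    }
  }
  dataset
}

toy_labelled <- apply_labels(toy_dataset, toy_dict)

# Inspect the label now carried by the 'age' column
attr(toy_labelled$age, "label")
```

Storing metadata as column attributes keeps the description attached to the data as it moves between functions, which is the general idea behind pairing each dataset with its dictionary before harmonization.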
Next, prepare the main inputs: the dossier, the DataSchema, and the data processing elements.
# Group the datasets in a dossier object
# NB: the names of the datasets in the dossier must match the values in the
# ss_table column of the data processing elements
dossier <- dossier_create(
  dataset_list = list(
    dataset_MELBOURNE_1,
    dataset_MELBOURNE_2,
    dataset_PARIS,
    dataset_TOKYO))
dataschema <- DEMO_files_harmo$`dataschema - final`
data_proc_elem <- DEMO_files_harmo$`data_processing_elements - final`
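The note in the code above says that dataset names must match the ss_table column of the data processing elements. A quick way to verify such a match before harmonizing is sketched below in base R, using toy stand-ins. The objects toy_dossier and toy_dpe and the target column are hypothetical; only the ss_table column name comes from this vignette.

```r
# Toy dossier: a named list of datasets (names are hypothetical)
toy_dossier <- list(
  dataset_A = data.frame(id = 1:2),
  dataset_B = data.frame(id = 3:4)
)

# Toy data processing elements; 'target' is a hypothetical column,
# 'ss_table' names the dataset each rule applies to
toy_dpe <- data.frame(
  target   = c("adm_id", "adm_id"),
  ss_table = c("dataset_A", "dataset_B"),
  stringsAsFactors = FALSE
)

# Names referenced by the rules but absent from the dossier, and vice versa
missing_in_dossier <- setdiff(unique(toy_dpe$ss_table), names(toy_dossier))
unused_in_dpe      <- setdiff(names(toy_dossier), unique(toy_dpe$ss_table))

if (length(missing_in_dossier) > 0) {
  warning("Not found in dossier: ", paste(missing_in_dossier, collapse = ", "))
}
if (length(unused_in_dpe) > 0) {
  message("No processing rule for: ", paste(unused_in_dpe, collapse = ", "))
}
```

With the real objects, the same comparison would presumably be made between names(dossier) and the ss_table column of data_proc_elem, assuming the dossier behaves as a named list.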
You can proceed with harmonization using ‘harmo_process’.
# Process harmonization
harmonized_dossier <- harmo_process(dossier, dataschema, data_proc_elem)
show_harmo_error(harmonized_dossier)
You can then assess and summarize the harmonized data to identify potential issues, produce summary statistics, and generate visual reports.
Warning ⚠ This tutorial creates a folder ‘tmp’ in your working directory, where the visual report is generated.
# Assess the harmonization (this report can be downloaded as an Excel file)
harmonized_dossier_evaluate <- harmonized_dossier_evaluate(harmonized_dossier)
# View(harmonized_dossier_evaluate)
# Produce summary statistics (this summary can be downloaded as an Excel file)
harmonized_dossier_summary <- harmonized_dossier_summarise(harmonized_dossier)
# View(harmonized_dossier_summary)
# Generate the visual report (a web-based interactive application)
bookdown_path <- paste0('tmp/',basename(tempdir()))
# print(bookdown_path)
harmonized_dossier_visualize(
  harmonized_dossier,
  to = bookdown_path)
open_visual_report(bookdown_path) # Open the report in a browser
# To delete the report folder afterwards:
# unlink(bookdown_path, recursive = TRUE)
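Because the report is written to a folder rather than a single file, cleaning up afterwards means removing a directory and its contents. The base-R sketch below shows that pattern on a throwaway folder. The path report_dir and the placeholder file are hypothetical and stand in for the real report output.

```r
# Build the path from its parts; file.path() inserts the separator
report_dir <- file.path(tempdir(), "toy_report")  # hypothetical location

# Create the folder and a placeholder file standing in for report output
dir.create(report_dir, recursive = TRUE, showWarnings = FALSE)
placeholder_created <- file.create(file.path(report_dir, "index.html"))

# A non-empty directory has to be removed recursively with unlink()
unlink(report_dir, recursive = TRUE)

# Confirm that the folder is gone
dir.exists(report_dir)
```

Writing under tempdir() in this sketch avoids leaving files in the working directory; the vignette itself writes to ‘tmp’, so the same unlink() call would apply to that path.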