Generates a visual report for a dataset as an HTML bookdown document. The report provides figures and descriptive statistics for each variable to facilitate the assessment of input data; both are generated according to each variable's data type. The report can be used to help assess data structure, coherence across elements, and taxonomy or data dictionary formats. Variables can be grouped using the group_by parameter, which is a (categorical) column in the dataset. The user may need to use as.factor() in this context. To speed up the process (and allow the object to be reused in a workflow), the user can supply a .summary_var, which is the output of the function dataset_summarize() for the column(s) col and group_by. The summary must have been generated with the same parameters.

harmonized_dossier_visualize(
  harmonized_dossier = NULL,
  to,
  taxonomy = NULL,
  valueType_guess = FALSE,
  pooled_harmonized_dataset = NULL,
  .summary_pool = NULL,
  .keep_files = TRUE
)

Arguments

harmonized_dossier

A list of tibble(s), each of which is a harmonized dataset.

to

A character string identifying the folder path where the bookdown report will be saved.

taxonomy

A tibble identifying the scheme used for variables classification.

valueType_guess

Whether the output should include a more accurate valueType that could be applied to the dataset. FALSE by default.

pooled_harmonized_dataset

A tibble, identifying the pooled harmonized dataset.

.summary_pool

A list which is the summary of the variables.

.keep_files

Whether to keep the R Markdown files. TRUE by default (used for internal processes and programming).

Value

A bookdown folder containing files in the specified output folder. To open the report in a browser, open 'docs/index.html' or use open_visual_report().
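As a minimal sketch of opening the report by hand, the path to the index file can be built from the output folder. This assumes the 'docs/index.html' layout described above and uses only base R; `index_file` is a name introduced here for illustration.

```r
# Folder passed as `to` when the report was generated (here, a temporary folder).
to <- tempdir()

# Assumed location of the report entry point, following the layout described above.
index_file <- file.path(to, "docs", "index.html")

# Open the report only if it has actually been generated.
if (file.exists(index_file)) {
  utils::browseURL(index_file)
} else {
  message("No report found at: ", index_file)
}
```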

Details

A harmonized dossier must be a named list containing at least one data frame or data frame extension (e.g. a tibble), each of which is a harmonized dataset. It is generally the product of applying harmonization processing to a dossier object. The name of each tibble will be used as the reference name of the dataset. A harmonized dossier has four attributes: harmonizR::class, which is "harmonized_dossier"; harmonizR::Dataschema (provided by user); harmonizR::data processing elements; and harmonizR::harmonized_col_id (provided by user), which refers to the column in each dataset that identifies each unique observation/dataset combination. This id column name is the same across the dataset(s), the DataSchema and the data processing elements (created by using 'id_creation') and is used to initiate the process of harmonization.
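The structural part of this definition (a named list holding at least one data frame) can be checked with a few lines of base R. The sketch below is illustrative only: `is_dossier_shaped()` is a hypothetical helper, not a package function, the toy data are invented, and the four attributes listed above are not checked.

```r
# Hypothetical helper (not part of any package): tests only the structural
# requirements, i.e. a named list of one or more data frames.
is_dossier_shaped <- function(x) {
  is.list(x) &&
    !is.data.frame(x) &&
    length(x) >= 1 &&
    !is.null(names(x)) &&
    all(nzchar(names(x))) &&
    all(vapply(x, is.data.frame, logical(1)))
}

# Toy data; dataset names and columns are invented for illustration.
toy_dossier <- list(
  study_A = data.frame(id = 1:3, age = c(34L, 51L, 47L)),
  study_B = data.frame(id = 1:2, age = c(29L, 63L))
)

shaped_ok <- is_dossier_shaped(toy_dossier)

# The list names serve as the reference names of the datasets.
dataset_names <- names(toy_dossier)
```

An unnamed list, or a single data frame passed on its own, fails this check because each dataset needs a reference name.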

A taxonomy is a classification scheme that can be defined for variable attributes. If defined, a taxonomy must be a data frame-like object. It must be compatible with (and is generally extracted from) an Opal environment. To work with certain functions, a valid taxonomy must contain at least the columns 'taxonomy', 'vocabulary', and 'terms'. In addition, the taxonomy may follow the Maelstrom Research taxonomy, and its content can be evaluated accordingly, for example against naming convention restrictions, tagging elements, or scales, which are specific to Maelstrom Research. In this particular case, the tibble must also contain 'vocabulary_short', 'taxonomy_scale', 'vocabulary_scale' and 'term_scale' to work with some specific functions.
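The column requirements above can be verified before a taxonomy is passed to the function. This is a rough sketch in base R; the taxonomy content is invented for illustration and the variable names are assumptions introduced here.

```r
# Columns named in the description above.
required_cols  <- c("taxonomy", "vocabulary", "terms")
maelstrom_cols <- c("vocabulary_short", "taxonomy_scale",
                    "vocabulary_scale", "term_scale")

# Toy taxonomy; the values are invented for illustration.
toy_taxonomy <- data.frame(
  taxonomy   = c("Example_area", "Example_area"),
  vocabulary = c("Demographics", "Demographics"),
  terms      = c("Age", "Sex"),
  stringsAsFactors = FALSE
)

# Columns that are required but absent (empty if the minimum is met).
missing_required <- setdiff(required_cols, names(toy_taxonomy))

# Extra columns that would be needed for a Maelstrom Research taxonomy.
missing_maelstrom <- setdiff(maelstrom_cols, names(toy_taxonomy))
```

Here `missing_required` is empty, so the toy taxonomy meets the minimum, while `missing_maelstrom` lists the four additional columns it would need for Maelstrom-specific functions.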

The valueType is a property of a variable and is required in certain functions to determine the handling of the variables. The valueType refers to the OBiBa-internal type of a variable. It is specified in a data dictionary in a column valueType and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', and 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types is available using madshapR::valueType_list.
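To make the idea of a correspondence between R data types and valueTypes concrete, the sketch below maps a few common R column classes to the valueTypes named above. The mapping is an assumption made for illustration, and `guess_value_type()` is a hypothetical helper; madshapR::valueType_list remains the authoritative reference.

```r
# Hypothetical helper: an assumed, simplified mapping from R classes to
# valueTypes. It is not the package's own logic.
guess_value_type <- function(x) {
  if (inherits(x, "POSIXct")) return("datetime")
  if (inherits(x, "Date"))    return("date")
  if (is.logical(x))          return("boolean")
  if (is.integer(x))          return("integer")
  if (is.numeric(x))          return("decimal")
  "text"  # fallback for character and any other class
}

# Toy dataset; columns are invented for illustration.
toy_dataset <- data.frame(
  id     = 1:2,
  height = c(1.72, 1.65),
  smoker = c(TRUE, FALSE),
  visit  = as.Date(c("2020-01-15", "2021-06-30")),
  stringsAsFactors = FALSE
)

# One assumed valueType per column, returned as a named character vector.
toy_types <- vapply(toy_dataset, guess_value_type, character(1))
```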

Examples

# \donttest{

pooled_harmonized_dataset <- DEMO_files_harmo$pooled_harmonized_dataset
summary_var_harmo <- DEMO_files_harmo$summary_var_harmo

to <- tempdir()
harmonized_dossier_visualize(
  pooled_harmonized_dataset = pooled_harmonized_dataset,
  .summary_pool = summary_var_harmo,
  to = to)
#> Error in dataset_visualize(dataset = pooled_harmonized_dataset, group_by = group_by,     taxonomy = taxonomy, to = to, valueType_guess = valueType_guess,     .keep_files = .keep_files, .summary_var = .summary_pool): unused arguments (to = to, .keep_files = .keep_files)

# To open the file in browser, you can also open 'to/docs/index.html'.
open_visual_report(to)
#> Warning: The `to` argument of `open_visual_report()` is deprecated as of madshapR 1.0.2.
#>  Please use the `bookdown_path` argument of `bookdown_open()` instead.

# }