R/03-harmonized_data_evaluate.R
harmonized_dossier_evaluate.Rd
Assesses the content and structure of a harmonized dossier object (list of harmonized datasets) and reports possible issues in the datasets and data dictionaries to facilitate assessment of input data. The report can be used to help assess data structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats. This report is compatible with Excel and can be exported as an Excel spreadsheet.
harmonized_dossier_evaluate(harmonized_dossier, taxonomy = NULL)
A list of tibble(s), each of them being a harmonized dataset.
A tibble identifying the scheme used for variable classification.
A list of report(s), one for each harmonized dataset, each of them made up of tibble(s) (overview and summary).
A harmonized dossier must be a named list containing at least one data frame
or data frame extension (e.g. a tibble), each of them being a harmonized
dataset. It is generally the product of applying harmonization processing to
a dossier object. The name of each tibble will be used as the reference name
of the dataset. A harmonized dossier has four attributes:
harmonizR::class, which is "harmonized_dossier";
harmonizR::Dataschema (provided by user);
harmonizR::data processing elements;
and harmonizR::harmonized_col_id (provided by user), which refers to the
column in each dataset that identifies each unique observation/dataset
combination. This id column name is the same across the dataset(s), the
DataSchema and the data processing elements (created by using 'id_creation')
and is used to initiate the process of harmonization.
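The structure described above can be sketched in base R. This is only an illustration under stated assumptions: the dataset names, the 'id' and 'age' columns, and the idea that the attributes are stored as ordinary R attributes are all hypothetical, and this is not the package's own constructor (a real harmonized dossier is normally produced by the harmonization step).

```r
# Hypothetical toy datasets; names and columns are invented for illustration.
dataset_A <- data.frame(id = c("A_1", "A_2"), age = c(34, 51))
dataset_B <- data.frame(id = c("B_1", "B_2"), age = c(47, 29))

# A named list of data frames: each name serves as the dataset reference name.
harmonized_dossier <- list(dataset_A = dataset_A, dataset_B = dataset_B)

# Attribute names are taken from the text above; storing them with attr()
# is an assumption, and the values are placeholders.
attr(harmonized_dossier, "harmonizR::class") <- "harmonized_dossier"
attr(harmonized_dossier, "harmonizR::harmonized_col_id") <- "id"

# Structural checks mirroring the description.
dossier_names <- names(harmonized_dossier)
is_named_list <- is.list(harmonized_dossier) &&
  !is.null(dossier_names) && all(nzchar(dossier_names))
all_data_frames <- all(vapply(harmonized_dossier, is.data.frame, logical(1)))

# The id column must exist under the same name in every dataset.
id_col <- attr(harmonized_dossier, "harmonizR::harmonized_col_id", exact = TRUE)
id_in_all <- all(vapply(
  harmonized_dossier,
  function(d) id_col %in% names(d),
  logical(1)
))
```

Checks of this kind only cover the list shape and the shared id column; they say nothing about the DataSchema or the data processing elements.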
A taxonomy is a classification scheme that can be defined for variable attributes. If defined, a taxonomy must be a data frame-like object. It must be compatible with (and is generally extracted from) an Opal environment. To work with certain functions, a valid taxonomy must contain at least the columns 'taxonomy', 'vocabulary', and 'terms'. In addition, the taxonomy may follow the Maelstrom Research taxonomy, in which case its content can be evaluated accordingly, for example for naming convention restrictions, tagging elements, or scales, which are specific to Maelstrom Research. In this particular case, the tibble must also contain 'vocabulary_short', 'taxonomy_scale', 'vocabulary_scale' and 'term_scale' to work with some specific functions.
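As a rough sketch of the column requirements listed above, the following base R snippet builds a toy taxonomy and checks which set of columns it satisfies. The row values ("Area", "Demographics", "Age", "Sex") are invented for illustration, and the check is a hypothetical helper, not a function of the package.

```r
# Hypothetical minimal taxonomy; the values are placeholders.
taxonomy <- data.frame(
  taxonomy = c("Area", "Area"),
  vocabulary = c("Demographics", "Demographics"),
  terms = c("Age", "Sex"),
  stringsAsFactors = FALSE
)

# Column names as listed in the description above.
required_cols <- c("taxonomy", "vocabulary", "terms")
maelstrom_cols <- c("vocabulary_short", "taxonomy_scale",
                    "vocabulary_scale", "term_scale")

# Which required columns are absent, if any.
missing_required <- setdiff(required_cols, names(taxonomy))
has_required <- length(missing_required) == 0

# A Maelstrom-style taxonomy needs the extra columns as well.
is_maelstrom_like <- has_required && all(maelstrom_cols %in% names(taxonomy))
```

Here the toy taxonomy has the three required columns but none of the Maelstrom-specific ones, so only the basic requirement is met.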
{
harmonized_dossier_evaluate(DEMO_files_harmo$harmonized_dossier)
}
#> - DOSSIER ASSESSMENT: ----------------------------------------------------
#> - DATA DICTIONARY ASSESSMENT: data_dict --------------
#> Assess the standard adequacy of naming
#> Assess the uniqueness of variable names
#> Assess the presence of possible duplicated columns
#> Assess the presence of empty rows in the data dictionary
#> Assess the presence of empty columns in the data dictionary
#> Assess the completion of `label(:xx)` column in 'Variables'
#> Assess the `valueType` column in 'Variables'
#> Generate report
#>
#> The data dictionary contains no error/warning.
#>
#> - WARNING MESSAGES (if any): --------------------------------------------
#>
#> - DATASET ASSESSMENT: dataset_MELBOURNE_1 --------------------------
#> Assess the standard adequacy of naming
#> Assess the presence of variable names both in dataset and data dictionary
#> Assess the presence of possible duplicated variable in the dataset
#> Assess the presence of duplicated participants in the dataset
#> Error in df_append(out, united, after = after): `after` must be a whole number, not an integer `NA`.
#> ℹ This is an internal error that was detected in the tidyr package.
#> Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#> (<https://tidyverse.org/help/>) and the full backtrace.