Assesses the content and structure of a harmonized dossier object (a list of harmonized datasets) and reports possible issues in the datasets and data dictionaries to facilitate assessment of input data. The report can help assess data structure, the presence of fields, coherence across elements, and the format of the taxonomy or data dictionaries. The report is compatible with Excel and can be exported as an Excel spreadsheet.

harmonized_dossier_evaluate(harmonized_dossier, taxonomy = NULL)

Arguments

harmonized_dossier

A named list of tibble(s), each of them being a harmonized dataset.

taxonomy

A tibble identifying the scheme used for variable classification. Optional; NULL by default.

Value

A list of reports, one for each harmonized dataset, each of them consisting of tibble(s) ('Overview and summary').

Details

A harmonized dossier must be a named list containing at least one data frame or data frame extension (e.g. a tibble), each of them being a harmonized dataset. It is generally the product of applying harmonization processing to a dossier object. The name of each tibble is used as the reference name of the dataset. A harmonized dossier has four attributes: harmonizR::class, which is "harmonized_dossier"; harmonizR::Dataschema (provided by the user); harmonizR::data processing elements; and harmonizR::harmonized_col_id (provided by the user), which refers to the column in each dataset that identifies each unique observation/dataset combination. This id column name is the same across the dataset(s), the DataSchema and the data processing elements (created by using 'id_creation'), and it is used to initiate the harmonization process.
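As a minimal base-R sketch of the expected shape, the code below builds a named list of two data frames that share an id column, then checks the structural requirements described above. The dataset names, column names and the id column name `harmonized_id` are hypothetical, and the helpers `is_named_list_of_dfs()` and `id_col_ok()` are illustrative only, not part of harmonizR. The harmonizR attributes (class, DataSchema, data processing elements, harmonized_col_id) are not set here, so this object alone is likely not a complete harmonized dossier for `harmonized_dossier_evaluate()`.

```r
# Hypothetical id column name shared by every dataset (chosen for illustration)
id_col <- "harmonized_id"

# Two small harmonized datasets; names and values are illustrative only
dataset_A <- data.frame(
  harmonized_id = c("dataset_A_1", "dataset_A_2"),
  sdc_age = c(34L, 51L)
)
dataset_B <- data.frame(
  harmonized_id = c("dataset_B_1", "dataset_B_2"),
  sdc_age = c(47L, 29L)
)

# A harmonized dossier is a *named* list: each name becomes the dataset's reference name
my_dossier <- list(dataset_A = dataset_A, dataset_B = dataset_B)

# Hypothetical helper: non-empty, fully named list whose elements are all data frames
is_named_list_of_dfs <- function(x) {
  is.list(x) && !is.data.frame(x) && length(x) > 0 &&
    !is.null(names(x)) && all(nzchar(names(x))) &&
    all(vapply(x, is.data.frame, logical(1)))
}

# Hypothetical helper: id column present in every dataset and unique across the dossier
id_col_ok <- function(x, col) {
  present <- all(vapply(x, function(df) col %in% names(df), logical(1)))
  if (!present) return(FALSE)
  all_ids <- unlist(lapply(x, function(df) as.character(df[[col]])), use.names = FALSE)
  anyDuplicated(all_ids) == 0L
}

is_named_list_of_dfs(my_dossier)
id_col_ok(my_dossier, id_col)
```

Checking uniqueness across the whole dossier, rather than within each dataset, reflects the description of the id as identifying each observation/dataset combination.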

A taxonomy is a classification scheme that can be defined for variable attributes. If defined, a taxonomy must be a data-frame-like object. It must be compatible with (and is generally extracted from) an Opal environment. To work with certain functions, a valid taxonomy must contain at least the columns 'taxonomy', 'vocabulary', and 'terms'. In addition, the taxonomy may follow the Maelstrom Research taxonomy, and its content can then be evaluated accordingly (e.g. naming convention restrictions, tagging elements, or scales, which are specific to Maelstrom Research). In this particular case, the tibble must also contain 'vocabulary_short', 'taxonomy_scale', 'vocabulary_scale' and 'term_scale' to work with some specific functions.
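The column requirements above can be checked before passing a taxonomy to the function. This is a minimal base-R sketch: the taxonomy rows are made-up placeholders (not an actual Opal or Maelstrom taxonomy), and `missing_taxonomy_cols()` is a hypothetical helper, not a harmonizR function. It only checks column names, not the content rules that a Maelstrom Research evaluation may apply.

```r
# Placeholder taxonomy containing only the three generally required columns
my_taxonomy <- data.frame(
  taxonomy   = c("example_taxonomy", "example_taxonomy"),
  vocabulary = c("example_vocabulary", "example_vocabulary"),
  terms      = c("term_1", "term_2")
)

# Columns named in the text: always required vs. Maelstrom-specific
required_cols  <- c("taxonomy", "vocabulary", "terms")
maelstrom_cols <- c("vocabulary_short", "taxonomy_scale", "vocabulary_scale", "term_scale")

# Hypothetical helper: return the names of the required columns that are missing
missing_taxonomy_cols <- function(tax, maelstrom = FALSE) {
  stopifnot(is.data.frame(tax))  # a taxonomy must be data-frame-like
  needed <- if (maelstrom) c(required_cols, maelstrom_cols) else required_cols
  setdiff(needed, names(tax))
}

missing_taxonomy_cols(my_taxonomy)                    # no missing columns
missing_taxonomy_cols(my_taxonomy, maelstrom = TRUE)  # the four Maelstrom-specific columns
```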

Examples

{

harmonized_dossier_evaluate(DEMO_files_harmo$harmonized_dossier)

}
#> - DOSSIER ASSESSMENT: ----------------------------------------------------
#> - DATA DICTIONARY ASSESSMENT: data_dict --------------
#>     Assess the standard adequacy of naming
#>     Assess the uniqueness of variable names
#>     Assess the presence of possible duplicated columns
#>     Assess the presence of empty rows in the data dictionary
#>     Assess the presence of empty columns in the data dictionary
#>     Assess the completion of `label(:xx)` column in 'Variables'
#>     Assess the `valueType` column in 'Variables'
#>     Generate report
#> 
#>     The data dictionary contains no error/warning.
#> 
#>   - WARNING MESSAGES (if any): --------------------------------------------
#> 
#> - DATASET ASSESSMENT: dataset_MELBOURNE_1 --------------------------
#>     Assess the standard adequacy of naming
#>     Assess the presence of variable names both in dataset and data dictionary
#>     Assess the presence of possible duplicated variable in the dataset
#>     Assess the presence of duplicated participants in the dataset
#> Error in df_append(out, united, after = after): `after` must be a whole number, not an integer `NA`.
#>  This is an internal error that was detected in the tidyr package.
#>   Please report it at <https://github.com/tidyverse/tidyr/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.