Generates a data frame report of any categorical variable name present in the 'Categories' element but not present in 'Variables'. The data frame also reports any non-unique combinations of 'variable' and 'name' in the 'Categories' element. This report can be used to help assess data structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats.

check_data_dict_categories(data_dict)

Arguments

data_dict

A list of data frame(s) representing metadata to be evaluated.

Value

A data frame listing the categorical variables that have issues within a data dictionary.

Details

A data dictionary lists the variables in a dataset along with metadata about those variables, and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all entries unique and non-missing, and the data frame 'Categories' must contain at least the variable and name columns, with unique combinations of variable and name.
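
The structural requirements above can be illustrated with a minimal base R sketch. The toy data dictionary below (variables sex and smoker, plus a stray height category) is hypothetical and is not taken from the package. The checks are written by hand for illustration; they are not the internal logic of check_data_dict_categories().

```r
# Hypothetical toy data dictionary; all names and values are illustrative only.
data_dict <- list(
  Variables = data.frame(
    name = c("sex", "smoker"),
    stringsAsFactors = FALSE
  ),
  Categories = data.frame(
    variable = c("sex", "sex", "smoker", "smoker", "height"),
    name     = c("1",   "2",   "0",      "0",      "1"),
    stringsAsFactors = FALSE
  )
)

vars <- data_dict$Variables
cats <- data_dict$Categories

# 'Variables': the name column must be non-missing and unique.
variables_ok <- !anyNA(vars$name) && anyDuplicated(vars$name) == 0

# 'Categories': variable names that have no match in 'Variables'.
orphan_variables <- setdiff(cats$variable, vars$name)

# 'Categories': rows repeating an earlier (variable, name) combination.
duplicated_pairs <- cats[duplicated(cats[, c("variable", "name")]), ]

print(variables_ok)
print(orphan_variables)
print(duplicated_pairs)
```

In this toy input, height appears in 'Categories' without a matching entry in 'Variables', and the combination of smoker and "0" appears twice. These are the two kinds of issue the report is described as covering.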

Examples

{

library(madshapR)

# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example - errors`
check_data_dict_categories(data_dict)

}
#> # A tibble: 5 × 4
#>   name_var  col_name value          condition                                   
#>   <chr>     <chr>    <chr>          <chr>                                       
#> 1 prg_ever  variable Row number: 7  [ERROR] - Category 'variable' name has no c…
#> 2 prg_ever  name     Row number: 12 [ERROR] - Category 'name' is empty.         
#> 3 weight_sm variable Row number: 6  [ERROR] - Category 'variable' name has no c…
#> 4 (empty)   variable Row number: 10 [ERROR] - Category 'variable' name is empty.
#> 5 prg_ever  name     -7             [ERROR] - Duplicated category 'name'.