This function adds shortened labels for both variables and categories to a data dictionary. The shortened labels are built from the variable and category names and labels, subject to the specified maximum lengths. The function first validates the input using as_data_dict_shape and extracts the first variable and category labels using first_label_get. It then calculates the lengths of names and labels so that the results do not exceed the specified maximums. Both variables and categories are handled, and any missing values are replaced with "Empty".

data_dict_trim_labels(
  data_dict,
  max_length_var_name = 31,
  max_length_var_label = 255,
  max_length_cat_name = 15,
  max_length_cat_label_short = 15,
  max_length_cat_label_long = 63,
  .keep_columns = TRUE
)

Arguments

data_dict

A data dictionary, typically a list containing 'Variables' and 'Categories' data frames.

max_length_var_name

An integer specifying the maximum length for variable names (default is 31).

max_length_var_label

An integer specifying the maximum length for variable labels (default is 255).

max_length_cat_name

An integer specifying the maximum length for category names (default is 15).

max_length_cat_label_short

An integer specifying the maximum total length for short category labels (default is 15).

max_length_cat_label_long

An integer specifying the maximum total length for category labels (long) (default is 63).

.keep_columns

A logical value specifying whether the output preserves the other columns of the data dictionary (default is TRUE).

Value

A modified data dictionary with additional columns for shortened labels:

  • madshapR::label_var_short: Shortened variable labels.

  • madshapR::label_cat_long: Shortened category labels (if categories are present).
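The general idea of capping a label at a maximum length and filling in missing values can be sketched in base R. This is only an illustration under stated assumptions: trim_label is a hypothetical helper, not a madshapR function, and the package may build its short labels differently (for example by combining names and labels).

```r
# Hypothetical helper (not part of madshapR): replace missing labels
# with "Empty", then cut every label to at most `max_length` characters.
trim_label <- function(x, max_length) {
  x <- as.character(x)
  x[is.na(x)] <- "Empty"
  substr(x, 1, max_length)
}

# Made-up labels for illustration only.
labels <- c("Age of the participant at baseline", NA, "Sex")
short  <- trim_label(labels, max_length = 15)

# Every resulting label respects the chosen limit.
all(nchar(short) <= 15)
```

Labels already shorter than the limit are returned unchanged, so the helper can be applied to a whole column at once.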

Details

A data dictionary contains the list of variables in a dataset and metadata about those variables, and it can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combinations of variable and name.
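The structure described above can be sketched with base R alone. The variable names, categories, and the label column below are made up for illustration; only the 'Variables' and 'Categories' elements and the name and variable columns come from the description. The two checks mirror the stated requirements and are not the package's own validation.

```r
# Hypothetical minimal data dictionary: a list of two data frames.
data_dict <- list(
  Variables = data.frame(
    name  = c("age", "sex"),
    label = c("Age of the participant at baseline", NA),  # assumed column
    stringsAsFactors = FALSE
  ),
  Categories = data.frame(
    variable = c("sex", "sex"),
    name     = c("1", "2"),
    label    = c("Female", "Male"),                       # assumed column
    stringsAsFactors = FALSE
  )
)

# 'Variables': name must be non-missing and unique.
vars_ok <- !anyNA(data_dict$Variables$name) &&
  !anyDuplicated(data_dict$Variables$name)

# 'Categories': each (variable, name) pair must be unique.
cats_ok <- !anyDuplicated(data_dict$Categories[c("variable", "name")])
```

Both vars_ok and cats_ok are single logical values, so they can be passed directly to stopifnot() before handing the object to other functions.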

Examples

{

 # use madshapR_examples provided by the package
 data_dict <- madshapR_examples$`data_dictionary_example - errors`
 data_dict_with_short_labels <- data_dict_trim_labels(data_dict)
 
 attributes(data_dict_with_short_labels)

}
#> $names
#> [1] "Variables"  "Categories"
#>