Transform column(s) of a data dictionary from wide format to long format

Transforms column(s) of a data dictionary from wide format to long format. If a taxonomy is provided, the matching columns in the data dictionary are converted to a standardized format with fewer columns. The operation is equivalent to applying tidyr::pivot_longer() to these columns, following the structure of the taxonomy provided. Variable names in the data dictionary must be unique.
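The wide-to-long idea can be sketched with tidyr::pivot_longer() directly. This is only an illustration of the general pivot, not the function's internal code or its exact output layout: the data frame `wide` and its column names `datacollection::type` and `datacollection::level` are assumptions made up for the example.

```r
library(tidyr)

# Hypothetical wide layout: one column per taxonomy vocabulary (assumed names).
wide <- data.frame(
  name = c("part_id", "height"),
  `datacollection::type`  = c("automatic", "declared"),
  `datacollection::level` = c("high", "moderate"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Pivot the taxonomy columns: the vocabulary moves from the column name
# into a 'vocabulary' column, and the cell value becomes the 'term'.
long <- pivot_longer(
  wide,
  cols         = starts_with("datacollection::"),
  names_to     = "vocabulary",
  names_prefix = "datacollection::",
  values_to    = "term"
)

print(long)
```

Each variable now occupies one row per vocabulary, so adding a vocabulary adds rows rather than columns.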

data_dict_pivot_longer(data_dict, taxonomy = NULL)

Arguments

data_dict: A list of data frame(s) representing metadata to be transformed.
taxonomy: An optional data frame identifying a variable classification schema.

Value

A list of data frame(s) identifying a data dictionary.

Details

A data dictionary lists the variables in a dataset together with metadata about them, and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the 'Variables' data frame must contain at least the name column, whose entries must all be unique and non-missing, and the 'Categories' data frame must contain at least the variable and name columns, with each combination of variable and name appearing only once.
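The structural requirements above can be checked by hand. The following is a minimal sketch in base R, assuming a small made-up data dictionary; the object names `vars_ok` and `cats_ok` are hypothetical and the checks are not the package's own validation code.

```r
# Hypothetical minimal data dictionary (values invented for illustration).
data_dict <- list(
  Variables = data.frame(
    name      = c("part_id", "gndr", "height"),
    valueType = c("text", "integer", "integer"),
    stringsAsFactors = FALSE
  ),
  Categories = data.frame(
    variable = c("gndr", "gndr"),
    name     = c("1", "2"),
    stringsAsFactors = FALSE
  )
)

# 'Variables' needs a 'name' column with unique, non-missing entries.
vars_ok <- "name" %in% names(data_dict$Variables) &&
  !anyNA(data_dict$Variables$name) &&
  anyDuplicated(data_dict$Variables$name) == 0

# 'Categories' needs 'variable' and 'name' columns whose combinations are unique.
cats_ok <- all(c("variable", "name") %in% names(data_dict$Categories)) &&
  anyDuplicated(data_dict$Categories[c("variable", "name")]) == 0

print(c(Variables = vars_ok, Categories = cats_ok))
```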

A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns taxonomy, vocabulary, and terms. Additional details about Opal taxonomies are available online.
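As a rough sketch, a taxonomy object with the three required columns might look like the data frame below. The one-row-per-term layout is an assumption, and the values are borrowed from the example output on this page rather than from a real Opal server.

```r
# Hypothetical taxonomy data frame (layout assumed: one row per term).
taxonomy <- data.frame(
  taxonomy   = rep("datacollection", 5),
  vocabulary = c("type", "type", "type", "level", "level"),
  terms      = c("automatic", "declared", "measured", "high", "moderate"),
  stringsAsFactors = FALSE
)

# Confirm that the columns required of a taxonomy object are present.
required_cols <- c("taxonomy", "vocabulary", "terms")
missing_cols  <- setdiff(required_cols, names(taxonomy))
has_required  <- length(missing_cols) == 0

print(has_required)
```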

Examples

{

library(madshapR)
library(dplyr)

# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example`
taxonomy  <- madshapR_examples$`taxonomy_example`
data_dict_longer <- data_dict_pivot_longer(data_dict, taxonomy)

glimpse(data_dict_longer)

}
#> List of 2
#>  $ Variables : tibble [9 × 10] (S3: tbl_df/tbl/data.frame)
#>   ..$ index                 : num [1:9] 1 2 3 4 5 6 7 8 9
#>   ..$ name                  : chr [1:9] "part_id" "gndr" "height" "weight_ms" ...
#>   ..$ label:en              : chr [1:9] "id of the participant" "gndr" "height" "weight_ms" ...
#>   ..$ description:en        : chr [1:9] "id of the participant" "gender of the participant" "height of the participant" "weight of the participant - measured" ...
#>   ..$ valueType             : chr [1:9] "text" "integer" "integer" "integer" ...
#>   ..$ unit                  : chr [1:9] NA NA "cm" "kg" ...
#>   ..$ datacollection::1     : chr [1:9] "type" "type" "type" "level" ...
#>   ..$ datacollection::1.term: chr [1:9] "automatic" "declared" "declared" "moderate" ...
#>   ..$ datacollection::2     : chr [1:9] "level" "level" "level" "type" ...
#>   ..$ datacollection::2.term: chr [1:9] "high" "high" "moderate" "measured" ...
#>  $ Categories: tibble [11 × 4] (S3: tbl_df/tbl/data.frame)
#>   ..$ variable: chr [1:11] "gndr" "gndr" "gndr" "weight_ms" ...
#>   ..$ name    : chr [1:11] "1" "2" "-77" "-88" ...
#>   ..$ label:en: chr [1:11] "Male" "Female" "Don’t want to answer" "Don’t want to answer" ...
#>   ..$ missing : logi [1:11] FALSE FALSE TRUE TRUE TRUE FALSE ...
