R/02-dictionaries_functions.R
data_dict_group_by.Rd
Groups the data dictionary element(s) by the groups defined by the query.
This function groups both the 'Variables' and 'Categories' elements (if
the group exists under the same definition in both). This function is
analogous to running dplyr::group_by(). Each element is named using the
group values. data_dict_ungroup() reverses the effect.
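To illustrate the analogy with dplyr::group_by(), the sketch below groups every element of a toy data dictionary by a shared column. It is only an approximation of the behaviour described above, not the package's implementation: toy_dict, its values, and group_col are hypothetical names introduced for this example, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Toy data dictionary with hypothetical values (not taken from madshapR_DEMO)
toy_dict <- list(
  Variables = data.frame(
    table = c("ds_A", "ds_A", "ds_B"),
    name  = c("id", "sex", "id")),
  Categories = data.frame(
    table    = c("ds_A", "ds_A"),
    variable = c("sex", "sex"),
    name     = c("1", "2")))

# Hypothetical name for the grouping column used in this sketch
group_col <- "table"

# Group each element that contains the grouping column; leave others untouched
grouped <- lapply(toy_dict, function(df) {
  if (group_col %in% names(df)) group_by(df, across(all_of(group_col))) else df
})

group_vars(grouped$Variables)   # name of the grouping column of 'Variables'
n_groups(grouped$Variables)     # number of distinct groups in 'Variables'

# Ungrouping each element gives back plain (ungrouped) tables
ungrouped <- lapply(grouped, ungroup)
```

In this sketch an element is grouped only when it contains the grouping column, which loosely mirrors the condition that the group must exist in both 'Variables' and 'Categories'.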
data_dict_group_by(data_dict, col)
data_dict: A list of data frame(s) representing metadata to be transformed.
col: Variable to group by.
A list of data frame(s) identifying a workable data dictionary structure.
A data dictionary contains the list of variables in a dataset and metadata
about the variables and can be associated with a dataset. A data dictionary
object is a list of data frame(s) named 'Variables' (required) and
'Categories' (if any). To be usable in any function, the data frame
'Variables' must contain at least the name column, with all unique and
non-missing entries, and the data frame 'Categories' must contain at least
the variable and name columns, with unique combinations of variable and
name.
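The minimal structure described above can be checked with a few lines of base R. The helper below, is_minimal_data_dict(), is a hypothetical function written for this illustration and is not part of madshapR; it tests only the constraints stated in this section and makes no claim about any further checks the package may perform.

```r
# Hypothetical helper (not part of madshapR): TRUE when x meets the minimal
# data dictionary structure described in this section
is_minimal_data_dict <- function(x) {
  if (!is.list(x)) return(FALSE)

  # 'Variables' is required and needs a 'name' column with unique,
  # non-missing entries
  vars <- x[["Variables"]]
  if (!is.data.frame(vars) || !"name" %in% names(vars)) return(FALSE)
  if (anyNA(vars$name) || anyDuplicated(vars$name) > 0) return(FALSE)

  # 'Categories' is optional
  cats <- x[["Categories"]]
  if (is.null(cats)) return(TRUE)

  # When present, it needs 'variable' and 'name' columns whose
  # combinations are unique
  if (!is.data.frame(cats)) return(FALSE)
  if (!all(c("variable", "name") %in% names(cats))) return(FALSE)
  anyDuplicated(cats[, c("variable", "name")]) == 0
}

# Toy inputs with hypothetical values
good <- list(
  Variables  = data.frame(name = c("id", "sex")),
  Categories = data.frame(variable = c("sex", "sex"), name = c("1", "2")))
bad <- list(Variables = data.frame(name = c("id", "id")))  # duplicated name

is_minimal_data_dict(good)  # TRUE
is_minimal_data_dict(bad)   # FALSE
```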
{
# Use madshapR_DEMO, provided by the package
library(madshapR)

# Create a named list of data dictionaries. Nesting adds the column 'table',
# which refers to the associated dataset. The object created is not a
# data dictionary per se, but can be used as a structure which can be
# shaped into a data dictionary.
data_dict_list <- list(
  dataset_TOKYO = madshapR_DEMO$data_dict_TOKYO,
  dataset_MELBOURNE = madshapR_DEMO$data_dict_MELBOURNE)
data_dict_nest <- data_dict_list_nest(data_dict_list, name_group = 'table')

# Group the nested structure by the 'table' column
data_dict_group_by(data_dict_nest, col = "table")
}
#> $Variables
#> # A tibble: 15 × 7
#> # Groups: table [2]
#> table index name `label:en` `description:en` valueType unit
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 dataset_MELBOURNE 1 id id id of the parti… text NA
#> 2 dataset_MELBOURNE 2 Gender Gender Gender integer NA
#> 3 dataset_MELBOURNE 3 BMI BMI Body Mass Index decimal kg/m…
#> 4 dataset_MELBOURNE 4 age age Age of Particip… integer years
#> 5 dataset_MELBOURNE 5 smo_stat… smo_status Whether the par… integer NA
#> 6 dataset_MELBOURNE 6 prg_curr prg_curr Are you current… integer NA
#> 7 dataset_TOKYO 1 part_id id of the… id of the parti… text NA
#> 8 dataset_TOKYO 2 gndr gndr gender of the p… text NA
#> 9 dataset_TOKYO 3 height height height of the p… integer cm
#> 10 dataset_TOKYO 4 weight_ms weight_ms weight of the p… integer kg
#> 11 dataset_TOKYO 5 weight_dc weight_dc weight of the p… decimal kg
#> 12 dataset_TOKYO 6 dob dob date of birth o… date years
#> 13 dataset_TOKYO 7 prg_ever prg_ever whether the par… integer NA
#> 14 dataset_TOKYO 8 empty empty empty column integer NA
#> 15 dataset_TOKYO 9 opentext opentext open text text NA
#>
#> $Categories
#> # A tibble: 23 × 5
#> # Groups: table [2]
#> table variable name `label:en` missing
#> <chr> <chr> <chr> <chr> <chr>
#> 1 dataset_MELBOURNE age -888 don't want to answer TRUE
#> 2 dataset_MELBOURNE Gender 1 Male FALSE
#> 3 dataset_MELBOURNE Gender 2 Female FALSE
#> 4 dataset_MELBOURNE prg_curr 0 not currently pregnant FALSE
#> 5 dataset_MELBOURNE prg_curr 1 currently pregnant FALSE
#> 6 dataset_MELBOURNE prg_curr 8 Don’t want to answer TRUE
#> 7 dataset_MELBOURNE prg_curr 9 Don’t know TRUE
#> 8 dataset_MELBOURNE prg_curr -77 not applicable TRUE
#> 9 dataset_MELBOURNE smo_status 1 never smoked FALSE
#> 10 dataset_MELBOURNE smo_status 2 current smoker FALSE
#> # ℹ 13 more rows
#>