Groups the data dictionary element(s) by the column given in col. Both the 'Variables' and 'Categories' elements are grouped, provided the grouping column exists under the same definition in both. The function is analogous to dplyr::group_by(). Each element is named using the group values. data_dict_ungroup() reverses the effect.

data_dict_group_by(data_dict, col)

Arguments

data_dict

A list of data frame(s) representing metadata to be transformed.

col

Variable (column) to group by.

Value

A list of data frame(s) identifying a workable data dictionary structure.

Details

A data dictionary contains the list of variables in a dataset, together with metadata about those variables, and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with a unique combination of variable and name in each row.
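
The structural requirements above can be checked with a few lines of base R. The following is a minimal sketch, not madshapR's own validation: `is_workable_data_dict()` and `toy_dict` are hypothetical names introduced here for illustration only.

```r
# Hypothetical helper (not part of madshapR): checks the minimal structure
# described above, using base R only.
is_workable_data_dict <- function(data_dict) {
  vars <- data_dict[["Variables"]]
  # 'Variables' is required and must contain a 'name' column
  if (!is.data.frame(vars) || !"name" %in% names(vars)) return(FALSE)
  # entries of 'name' must be non-missing and unique
  if (anyNA(vars$name) || anyDuplicated(vars$name) > 0) return(FALSE)

  cats <- data_dict[["Categories"]]
  # 'Categories' is optional
  if (is.null(cats)) return(TRUE)
  if (!is.data.frame(cats) ||
      !all(c("variable", "name") %in% names(cats))) return(FALSE)
  # each combination of 'variable' and 'name' must be unique
  anyDuplicated(cats[, c("variable", "name")]) == 0
}

# Toy data dictionary (made-up values, for illustration only)
toy_dict <- list(
  Variables = data.frame(
    name = c("id", "gender"),
    valueType = c("text", "integer")),
  Categories = data.frame(
    variable = c("gender", "gender"),
    name = c("1", "2"),
    label = c("Male", "Female"))
)

is_workable_data_dict(toy_dict)
```

The helper only covers the minimal conditions stated in this section; the package may apply further checks that are not reproduced here.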

Examples

{

library(madshapR)

# Use madshapR_DEMO, provided by the package.
# Create a list of data dictionaries where the column 'table' is added to
# refer to the associated dataset. The object created is not a
# data dictionary per se, but can be used as a structure which can be
# shaped into a data dictionary.

# Name the list elements directly; these names become the values of 'table'.
data_dict_list <- list(
  dataset_TOKYO = madshapR_DEMO$data_dict_TOKYO,
  dataset_MELBOURNE = madshapR_DEMO$data_dict_MELBOURNE)

data_dict_nest <- data_dict_list_nest(data_dict_list, name_group = 'table')

data_dict_group_by(data_dict_nest, col = "table")

}
#> $Variables
#> # A tibble: 15 × 7
#> # Groups:   table [2]
#>    table             index name      `label:en` `description:en` valueType unit 
#>    <chr>             <chr> <chr>     <chr>      <chr>            <chr>     <chr>
#>  1 dataset_MELBOURNE 1     id        id         id of the parti… text      NA   
#>  2 dataset_MELBOURNE 2     Gender    Gender     Gender           integer   NA   
#>  3 dataset_MELBOURNE 3     BMI       BMI        Body Mass Index  decimal   kg/m…
#>  4 dataset_MELBOURNE 4     age       age        Age of Particip… integer   years
#>  5 dataset_MELBOURNE 5     smo_stat… smo_status Whether the par… integer   NA   
#>  6 dataset_MELBOURNE 6     prg_curr  prg_curr   Are you current… integer   NA   
#>  7 dataset_TOKYO     1     part_id   id of the… id of the parti… text      NA   
#>  8 dataset_TOKYO     2     gndr      gndr       gender of the p… text      NA   
#>  9 dataset_TOKYO     3     height    height     height of the p… integer   cm   
#> 10 dataset_TOKYO     4     weight_ms weight_ms  weight of the p… integer   kg   
#> 11 dataset_TOKYO     5     weight_dc weight_dc  weight of the p… decimal   kg   
#> 12 dataset_TOKYO     6     dob       dob        date of birth o… date      years
#> 13 dataset_TOKYO     7     prg_ever  prg_ever   whether the par… integer   NA   
#> 14 dataset_TOKYO     8     empty     empty      empty column     integer   NA   
#> 15 dataset_TOKYO     9     opentext  opentext   open text        text      NA   
#> 
#> $Categories
#> # A tibble: 23 × 5
#> # Groups:   table [2]
#>    table             variable   name  `label:en`             missing
#>    <chr>             <chr>      <chr> <chr>                  <chr>  
#>  1 dataset_MELBOURNE age        -888  don't want to answer   TRUE   
#>  2 dataset_MELBOURNE Gender     1     Male                   FALSE  
#>  3 dataset_MELBOURNE Gender     2     Female                 FALSE  
#>  4 dataset_MELBOURNE prg_curr   0     not currently pregnant FALSE  
#>  5 dataset_MELBOURNE prg_curr   1     currently pregnant     FALSE  
#>  6 dataset_MELBOURNE prg_curr   8     Don’t want to answer   TRUE   
#>  7 dataset_MELBOURNE prg_curr   9     Don’t know             TRUE   
#>  8 dataset_MELBOURNE prg_curr   -77   not applicable         TRUE   
#>  9 dataset_MELBOURNE smo_status 1     never smoked           FALSE  
#> 10 dataset_MELBOURNE smo_status 2     current smoker         FALSE  
#> # ℹ 13 more rows
#>
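
The analogy with dplyr::group_by() can be illustrated without the package data. The sketch below assumes dplyr is installed; `group_elements()` and `toy_nest` are hypothetical names used for illustration, and the code approximates the behaviour described on this page rather than reproducing the package's implementation.

```r
library(dplyr)

# Hypothetical helper (not part of madshapR): groups every element of a
# data dictionary-like list that contains the requested column, and leaves
# the other elements untouched.
group_elements <- function(data_dict, col) {
  lapply(data_dict, function(element) {
    if (col %in% names(element)) {
      group_by(element, across(all_of(col)))
    } else {
      element
    }
  })
}

# Toy nested structure (made-up values) with a shared 'table' column
toy_nest <- list(
  Variables = tibble(
    table = c("dataset_A", "dataset_A", "dataset_B"),
    name = c("id", "age", "part_id")),
  Categories = tibble(
    table = "dataset_A",
    variable = "age",
    name = "-888")
)

grouped <- group_elements(toy_nest, "table")
group_vars(grouped$Variables)

# Removing the grouping again, in the spirit of data_dict_ungroup()
ungrouped <- lapply(grouped, ungroup)
```

Grouping through `across(all_of(col))` lets the column be passed as a character string, which matches the `col = "table"` call in the example above.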