Subsets the 'Variables' element, the 'Categories' element, or both, of a data dictionary. Rows are retained if their values satisfy the condition. This is a wrapper function analogous to dplyr::filter().

data_dict_filter(
  data_dict,
  filter_var = NULL,
  filter_cat = NULL,
  filter_all = NULL
)

Arguments

data_dict

A list of data frame(s) representing metadata to be filtered.

filter_var

Expression used to filter the rows of the 'Variables' element of the data dictionary. In the examples below it is passed as a character string.

filter_cat

Expression used to filter the rows of the 'Categories' element of the data dictionary. In the examples below it is passed as a character string.

filter_all

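Expression used to filter the rows of both the 'Variables' and 'Categories' elements of the data dictionary. The column it refers to must exist in both elements and have the same meaning in each.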
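
As a rough illustration of what a string condition such as "valueType == 'text'" does to the rows of one element, the following base R sketch parses the string and keeps the matching rows. The helper filter_rows() and the small data frame are hypothetical and are not part of madshapR; this is only an assumption about the general idea, not the package's implementation.

```r
# Hypothetical stand-in for a 'Variables' element (not the madshapR_DEMO data)
variables <- data.frame(
  name = c("part_id", "height", "opentext"),
  valueType = c("text", "integer", "text"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows for which a string condition is TRUE.
# The condition is parsed and evaluated with the columns of 'df' in scope.
filter_rows <- function(df, condition) {
  keep <- eval(parse(text = condition), envir = df, enclos = parent.frame())
  # Rows where the condition is NA are dropped, as dplyr::filter() does
  df[!is.na(keep) & keep, , drop = FALSE]
}

text_vars <- filter_rows(variables, "valueType == 'text'")
text_vars$name
```

Under this reading, filter_all would amount to applying the same condition to both elements, which is why the column it names has to exist in each of them.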

Value

A list of data frame(s) identifying a workable data dictionary structure.

Details

A data dictionary contains the list of variables in a dataset and metadata about the variables, and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with a unique combination of variable and name in each row.
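
The structure described above can be sketched in base R as follows. The values are hypothetical and chosen only for illustration, and the two checks simply restate the rules in this section; they are not the package's own validation code.

```r
# Hypothetical minimal data dictionary (illustrative values only)
data_dict <- list(
  Variables = data.frame(
    name = c("gndr", "height"),
    valueType = c("text", "integer"),
    stringsAsFactors = FALSE
  ),
  Categories = data.frame(
    variable = c("gndr", "gndr"),
    name = c("Male", "Female"),
    stringsAsFactors = FALSE
  )
)

# 'Variables': the name column must be non-missing and unique
vars_ok <- !anyNA(data_dict$Variables$name) &&
  anyDuplicated(data_dict$Variables$name) == 0

# 'Categories': each (variable, name) pair must be unique
cats_ok <- anyDuplicated(data_dict$Categories[c("variable", "name")]) == 0
```

Both vars_ok and cats_ok are TRUE for this example; repeating a row in either data frame would make the corresponding check FALSE.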

Examples

{

# use madshapR_DEMO provided by the package

# Create a list of data dictionaries where the column 'table' is added to 
# refer to the associated dataset. The object created is not a 
# data dictionary per se, but can be used as a structure which can be 
# shaped into a data dictionary.
library(dplyr)
library(madshapR)

# Name each element after its dataset; these names fill the 'table' column
data_dict_list <- list(
  dataset_TOKYO = madshapR_DEMO$data_dict_TOKYO,
  dataset_MELBOURNE = madshapR_DEMO$data_dict_MELBOURNE)

data_dict_nest <- data_dict_list_nest(data_dict_list, name_group = 'table')

###### Example 1: search and filter through a column in the 'Variables' element
data_dict_filter(data_dict_nest, filter_var = "valueType == 'text'")

###### Example 2: search and filter through a column in the 'Categories' element
data_dict_filter(data_dict_nest, filter_cat = "missing == TRUE")

###### Example 3: search and filter through a column present in both the
# 'Variables' and 'Categories' elements. The column must exist in both and
# have the same meaning. Only this last result is printed below, because the
# examples are wrapped in { }.
data_dict_filter(data_dict_nest, filter_all = "table == 'dataset_TOKYO'")

}
#> $Variables
#> # A tibble: 9 × 7
#>   table         index name      `label:en`      `description:en` valueType unit 
#>   <chr>         <chr> <chr>     <chr>           <chr>            <chr>     <chr>
#> 1 dataset_TOKYO 1     part_id   id of the part… id of the parti… text      NA   
#> 2 dataset_TOKYO 2     gndr      gndr            gender of the p… text      NA   
#> 3 dataset_TOKYO 3     height    height          height of the p… integer   cm   
#> 4 dataset_TOKYO 4     weight_ms weight_ms       weight of the p… integer   kg   
#> 5 dataset_TOKYO 5     weight_dc weight_dc       weight of the p… decimal   kg   
#> 6 dataset_TOKYO 6     dob       dob             date of birth o… date      years
#> 7 dataset_TOKYO 7     prg_ever  prg_ever        whether the par… integer   NA   
#> 8 dataset_TOKYO 8     empty     empty           empty column     integer   NA   
#> 9 dataset_TOKYO 9     opentext  opentext        open text        text      NA   
#> 
#> $Categories
#> # A tibble: 11 × 5
#>    table         variable  name   `label:en`            missing
#>    <chr>         <chr>     <chr>  <chr>                 <chr>  
#>  1 dataset_TOKYO gndr      Male   Male                  FALSE  
#>  2 dataset_TOKYO gndr      Female Female                FALSE  
#>  3 dataset_TOKYO gndr      -77    Don’t want to answer  TRUE   
#>  4 dataset_TOKYO weight_ms -88    Don’t want to answer  TRUE   
#>  5 dataset_TOKYO weight_ms -99    Don’t know            TRUE   
#>  6 dataset_TOKYO prg_ever  0      never pregnant        FALSE  
#>  7 dataset_TOKYO prg_ever  1      pregnant once or more FALSE  
#>  8 dataset_TOKYO prg_ever  2      currently pregnant    FALSE  
#>  9 dataset_TOKYO prg_ever  8      Don’t want to answer  TRUE   
#> 10 dataset_TOKYO prg_ever  9      Don’t know            TRUE   
#> 11 dataset_TOKYO prg_ever  -7     not applicable        TRUE   
#>