Subsets either or both of the 'Variables' and 'Categories' elements of a data
dictionary. Rows are kept if their values satisfy the condition.
This is a wrapper function analogous to dplyr::filter().
data_dict_filter(
data_dict,
filter_var = NULL,
filter_cat = NULL,
filter_all = NULL
)
data_dict
A list of data frame(s) representing metadata to be filtered.
filter_var
Filtering expression(s), given as a character string, applied to the 'Variables' element of the data dictionary.
filter_cat
Filtering expression(s), given as a character string, applied to the 'Categories' element of the data dictionary.
filter_all
Filtering expression(s), given as a character string, applied to both the 'Variables' and 'Categories' elements of the data dictionary.
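The three filter arguments can be pictured with a small base R sketch. It is only an illustration of the idea, not the package's implementation: `filter_rows()` is a hypothetical helper, the toy data dictionary is made up, and the sketch does not try to reproduce how madshapR keeps the two elements consistent with each other.

```r
# Hypothetical helper (not part of madshapR): keep the rows of `df` for which
# the condition, given as a string, evaluates to TRUE (NA counts as FALSE).
filter_rows <- function(df, condition) {
  keep <- eval(parse(text = condition), envir = df, enclos = parent.frame())
  df[!is.na(keep) & keep, , drop = FALSE]
}

# Toy nested data dictionary with a 'table' column present in both elements.
data_dict <- list(
  Variables = data.frame(
    table     = c("dataset_TOKYO", "dataset_TOKYO", "dataset_MELBOURNE"),
    name      = c("gndr", "height", "gndr"),
    valueType = c("text", "integer", "text"),
    stringsAsFactors = FALSE
  ),
  Categories = data.frame(
    table    = c("dataset_TOKYO", "dataset_MELBOURNE"),
    variable = c("gndr", "gndr"),
    name     = c("Male", "Male"),
    stringsAsFactors = FALSE
  )
)

# In the spirit of filter_var: the condition only touches 'Variables'.
text_vars <- filter_rows(data_dict$Variables, "valueType == 'text'")

# In the spirit of filter_all: the same condition is applied to every element,
# so the column it refers to must exist in both.
tokyo <- lapply(data_dict, filter_rows, condition = "table == 'dataset_TOKYO'")
```

Passing the condition as a string mirrors the calls in the examples section, where the expression is quoted rather than written bare as in dplyr::filter().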
A list of data frame(s) identifying a workable data dictionary structure.
A data dictionary contains the list of variables in a dataset and metadata
about the variables and can be associated with a dataset. A data dictionary
object is a list of data frame(s) named 'Variables' (required) and
'Categories' (if any). To be usable in any function, the data frame
'Variables' must contain at least the 'name' column, with all unique and
non-missing entries, and the data frame 'Categories' must contain at least
the 'variable' and 'name' columns, with unique combinations of 'variable'
and 'name'.
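The structural requirements above can be checked with a few lines of base R. This is a minimal sketch under assumptions: `has_valid_structure()` is a hypothetical helper written for illustration (it is not a madshapR function), and the small data dictionary is invented.

```r
# Invented minimal data dictionary: a named list of data frames.
data_dict <- list(
  Variables = data.frame(
    name      = c("gndr", "height"),
    valueType = c("text", "integer"),
    stringsAsFactors = FALSE
  ),
  Categories = data.frame(
    variable = c("gndr", "gndr"),
    name     = c("Male", "Female"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper (not part of madshapR): TRUE when the constraints
# described above hold, FALSE otherwise.
has_valid_structure <- function(dd) {
  vars <- dd[["Variables"]]
  # 'Variables' is required and needs a 'name' column
  if (!is.data.frame(vars) || !"name" %in% names(vars)) return(FALSE)
  # 'name' entries must be non-missing and unique
  if (anyNA(vars$name) || anyDuplicated(vars$name) > 0) return(FALSE)

  cats <- dd[["Categories"]]
  if (is.null(cats)) return(TRUE)  # 'Categories' is optional
  if (!is.data.frame(cats)) return(FALSE)
  if (!all(c("variable", "name") %in% names(cats))) return(FALSE)
  # each (variable, name) pair may appear only once
  anyDuplicated(cats[, c("variable", "name")]) == 0
}

has_valid_structure(data_dict)
```

A nested structure that stacks several data dictionaries, like the one built in the examples, would generally fail the uniqueness check on 'name', which is why it is described there as a structure that can be shaped into a data dictionary rather than a data dictionary itself.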
{
# use madshapR_DEMO provided by the package
# Create a list of data dictionaries where the column 'table' is added to
# refer to the associated dataset. The object created is not a
# data dictionary per se, but can be used as a structure which can be
# shaped into a data dictionary.
library(madshapR)
library(dplyr)
# Name the list elements directly; using `<-` inside list() would also create
# stray objects in the calling environment.
data_dict_list <- list(
  dataset_TOKYO = madshapR_DEMO$data_dict_TOKYO,
  dataset_MELBOURNE = madshapR_DEMO$data_dict_MELBOURNE)
data_dict_nest <- data_dict_list_nest(data_dict_list, name_group = 'table')
###### Example 1 search and filter through a column in 'Variables' element
data_dict_filter(data_dict_nest,filter_var = "valueType == 'text'")
###### Example 2 search and filter through a column in 'Categories' element
data_dict_filter(data_dict_nest,filter_cat = "missing == TRUE")
###### Example 3 search and filter through a column shared by the 'Variables'
# and 'Categories' elements. The column must exist in both elements and have
# the same meaning
data_dict_filter(data_dict_nest,filter_all = "table == 'dataset_TOKYO'")
}
#> $Variables
#> # A tibble: 9 × 7
#>   table         index name      `label:en`      `description:en` valueType unit
#>   <chr>         <chr> <chr>     <chr>           <chr>            <chr>     <chr>
#> 1 dataset_TOKYO 1     part_id   id of the part… id of the parti… text      NA
#> 2 dataset_TOKYO 2     gndr      gndr            gender of the p… text      NA
#> 3 dataset_TOKYO 3     height    height          height of the p… integer   cm
#> 4 dataset_TOKYO 4     weight_ms weight_ms       weight of the p… integer   kg
#> 5 dataset_TOKYO 5     weight_dc weight_dc       weight of the p… decimal   kg
#> 6 dataset_TOKYO 6     dob       dob             date of birth o… date      years
#> 7 dataset_TOKYO 7     prg_ever  prg_ever        whether the par… integer   NA
#> 8 dataset_TOKYO 8     empty     empty           empty column     integer   NA
#> 9 dataset_TOKYO 9     opentext  opentext        open text        text      NA
#>
#> $Categories
#> # A tibble: 11 × 5
#>    table         variable  name   `label:en`            missing
#>    <chr>         <chr>     <chr>  <chr>                 <chr>
#>  1 dataset_TOKYO gndr      Male   Male                  FALSE
#>  2 dataset_TOKYO gndr      Female Female                FALSE
#>  3 dataset_TOKYO gndr      -77    Don’t want to answer  TRUE
#>  4 dataset_TOKYO weight_ms -88    Don’t want to answer  TRUE
#>  5 dataset_TOKYO weight_ms -99    Don’t know            TRUE
#>  6 dataset_TOKYO prg_ever  0      never pregnant        FALSE
#>  7 dataset_TOKYO prg_ever  1      pregnant once or more FALSE
#>  8 dataset_TOKYO prg_ever  2      currently pregnant    FALSE
#>  9 dataset_TOKYO prg_ever  8      Don’t want to answer  TRUE
#> 10 dataset_TOKYO prg_ever  9      Don’t know            TRUE
#> 11 dataset_TOKYO prg_ever  -7     not applicable        TRUE
#>