R/03-dataset_functions.R
as_dossier.Rd
Checks whether an object is a valid dossier (a list of datasets) and returns it
with the appropriate madshapR::class attribute. This function mainly helps
validate inputs within other functions of the package, but it can also be
called directly to check whether a dossier is valid.
as_dossier(object)
A potential dossier object to be coerced.
A list of data frame(s) with madshapR::class 'dossier'.
A dossier is a named list containing one or more data frames, each of which is a dataset. The name of each data frame is used as the reference name of the dataset.
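The structural rule above can be sketched in base R. This is only an illustration: looks_like_dossier() is a hypothetical helper, not a madshapR function, and it checks nothing beyond "named list of one or more data frames". as_dossier() may apply further checks that are not reproduced here.

```r
# Hypothetical helper (not part of madshapR): tests only the structural
# rule described above, i.e. a named list of one or more data frames.
looks_like_dossier <- function(x) {
  is.list(x) && !is.data.frame(x) &&              # a list, but not a single data frame
    length(x) >= 1 &&                             # at least one element
    !is.null(names(x)) && all(nzchar(names(x))) && # every element is named
    all(vapply(x, is.data.frame, logical(1)))     # every element is a data frame
}

candidate <- list(dataset_1 = iris, dataset_2 = mtcars)

looks_like_dossier(candidate)           # TRUE
looks_like_dossier(iris)                # FALSE: a single data frame, not a list of them
looks_like_dossier(list(iris, mtcars))  # FALSE: the elements have no names

# The list names are what serve as dataset reference names.
names(candidate)
```

A list that passes this kind of check is the sort of object one would then hand to as_dossier(), which returns it with the madshapR::class attribute set to 'dossier'.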
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. Identifier values must be non-missing and are used by the functions that require them. If no identifier variable is specified, indexing is handled automatically by the function.
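The requirement that identifier values be non-missing can be checked before a data frame is passed on as a dataset. The sketch below uses base R only. id_is_usable() and the participants data frame are hypothetical names introduced for illustration, and the way madshapR itself declares an identifier variable is not shown here.

```r
# Hypothetical helper (not part of madshapR): TRUE when `id_col` exists in
# the data frame and contains no missing values.
id_is_usable <- function(dataset, id_col) {
  is.data.frame(dataset) &&
    id_col %in% names(dataset) &&
    !anyNA(dataset[[id_col]])
}

# Illustrative data, made up for this sketch.
participants <- data.frame(
  part_id = c("ID001", "ID002", "ID003"),
  height  = c(191, 176, 154)
)

id_is_usable(participants, "part_id")   # TRUE
id_is_usable(participants, "no_such")   # FALSE: the column does not exist

participants$part_id[2] <- NA
id_is_usable(participants, "part_id")   # FALSE: an identifier value is missing
```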
For a better assessment, please use dataset_evaluate().
{
# use madshapR_DEMO provided by the package
library(dplyr)
library(stringr)
###### Example 1: a dataset list is a dossier by definition.
dossier <-
  as_dossier(madshapR_DEMO[str_detect(names(madshapR_DEMO), "dataset_TOKYO")])
glimpse(dossier)
###### Example 2: any list of data frames can be a dossier by
# definition.
glimpse(as_dossier(list(dataset_1 = iris, dataset_2 = mtcars)))
}
#> List of 2
#> $ dataset_TOKYO : tibble [50 × 9] (S3: tbl_df/tbl/data.frame)
#> ..$ part_id : chr [1:50] "ID001" "ID002" "ID003" "ID004" ...
#> ..$ gndr : chr [1:50] "Male" "Female" "Female" "Female" ...
#> ..$ height : num [1:50] 191 176 154 167 185 171 185 171 169 179 ...
#> ..$ weight_ms: num [1:50] 63 NA NA -88 NA 57 NA NA 52 NA ...
#> ..$ weight_dc: num [1:50] NA 65 121 NA 45 NA 58 59 NA 62 ...
#> ..$ dob : chr [1:50] "3/22/1990" "8/15/2001" "12/17/1996" "6/13/1990" ...
#> ..$ prg_ever : num [1:50] -7 0 2 1 8 -7 9 2 -7 -7 ...
#> ..$ empty : logi [1:50] NA NA NA NA NA NA ...
#> ..$ opentext : chr [1:50] "All children, except one, grow up. They soon know that they will" "grow up, and the way Wendy knew was this. One day when she was two" "years old she was playing in a garden, and she plucked another" "flower and ran with it to her mother. I suppose she must have looked" ...
#> ..- attr(*, "madshapR::class")= chr "dataset"
#> $ dataset_TOKYO - errors with data: tibble [50 × 9] (S3: tbl_df/tbl/data.frame)
#> ..$ part_id : chr [1:50] "ID001" "ID002" "ID003" "ID004" ...
#> ..$ gndr : chr [1:50] "Male" "Female" "Female" "Female" ...
#> ..$ height : num [1:50] 191 191 191 191 191 191 191 NA 191 191 ...
#> ..$ weight_ms: num [1:50] -7 0 8 1 8 -8 9 NA -7 -7 ...
#> ..$ weight_dc: num [1:50] NA 65.3 45 NA 45 NA 58 NA NA 65.3 ...
#> ..$ dob : chr [1:50] "3/22/1990" "8/15/2001" "12/17/1996" "6/13/1990" ...
#> ..$ prg_ever : num [1:50] -7 0 8 1 8 -8 9 NA -7 -7 ...
#> ..$ empty : logi [1:50] NA NA NA NA NA NA ...
#> ..$ opentext : chr [1:50] "All children, except one, grow up. They soon know that they will" "grow up, and the way Wendy knew was this. One day when she was two" "rather delightful, for Mrs. Darling put her hand to her heart and" "flower and ran with it to her mother. I suppose she must have looked" ...
#> ..- attr(*, "madshapR::class")= chr "dataset"
#> - attr(*, "madshapR::class")= chr "dossier"
#> List of 2
#> $ dataset_1:'data.frame': 150 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#> ..- attr(*, "madshapR::class")= chr "dataset"
#> $ dataset_2:'data.frame': 32 obs. of 11 variables:
#> ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> ..$ disp: num [1:32] 160 160 108 258 360 ...
#> ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
#> ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
#> ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
#> ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
#> ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
#> ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
#> ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
#> ..- attr(*, "madshapR::class")= chr "dataset"
#> - attr(*, "madshapR::class")= chr "dossier"