Bug fixes and improvements

Deprecated functions

To avoid confusion with help(function), the function madshapR_help() has been renamed madshapR_website().

Dependency changes

  • Set a minimum required version of dplyr to avoid bugs.

Bug fixes and improvements

Some of the tests were run with another package (Rmonize), which has madshapR as a dependency.

Enhance reports

  • In visual reports, avoid confusing changes in the color scheme.

  • Histograms for date variables display valid ranges.

  • In reports, % NA is now shown as a proportion.

  • The dossier_visualize() report shows variable labels in the same language.

  • In visual reports, the bar plot only appears when there are multiple missing value types; otherwise only the pie chart is shown.

  • In reports, all of the percentages are now included under “Other values (non categorical)”, which gives a single value.

  • https://github.com/maelstrom-research/madshapR/issues/51

Removed the overwrite parameter in dataset_visualize().

Fixed a minor issue in dataset_summary() (consistency of column names and content).

Correct Data dictionary functions

Improved the performance of check_data_dict_valueType(), which was too slow.

valueType_adjust() now works with empty columns (all NAs).

  • Allow date-formatted values to be converted to text in dataset_zap_data_dict() when the date format is unclear.

New functions

  • col_id(), a shortcut for retrieving the madshapR::col_id attribute of a dataset.

  • as_category(), is_category(), drop_category(): functions for coercing a vector into a categorical object, typically a column in a dataset that needs to be coerced into a categorical variable (the data dictionary is updated accordingly).

Deprecated functions

  • The example rda object (in data) DEMO_files has been renamed to madshapR_DEMO and updated, for consistency across our other packages.

Creation of NEWS feed!

Added NEWS.md. The development version is labeled “(development version)”.

Bug fixes and improvements

  • Some improvements have been made to the package documentation.

  • Internal calls to libraries (using ::) have been replaced by proper imports in the function declarations.

  • The get functions in fabR were changed in its latest release. The functions that depend on them (check_xxx()) have been updated accordingly.

  • DEMO files no longer include harmonization files, which are now in the package harmonizR.

Dependency changes

New Imports: haven, lifecycle

No longer in Imports: xfun

New functions

These functions are imported from fabR

This separation into 3 functions will allow future developments, such as rendering as a ppt or pdf.

Deprecated functions

Due to the development of another package (see fabR), the function open_visual_report() has been deprecated in favor of bookdown_open(), imported from the fabR package.

This package is a collection of wrapper functions used in data pipelines.

This is still a work in progress, so please let us know if a function you used before is no longer working.

Helper functions

Functions to generate, shape and format data.

These functions allow you to create, extract and transform data/metadata from a dataset. A dossier is a list of datasets.

  • Evaluate and apply attributes:

as_dataset(), as_dossier(), is_dataset(), is_dossier()
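
To make the "list of datasets" idea concrete, here is a minimal base R sketch. It does not call madshapR: the toy data and the helper looks_like_dossier() are hypothetical, and the package's own as_dossier() and is_dossier() may apply further checks and attributes that are not shown here.

```r
# Two toy datasets; names and contents are made up for illustration.
dataset_a <- data.frame(id = 1:3, height = c(170, 165, 180))
dataset_b <- data.frame(id = 1:2, smoker = c("yes", "no"))

# A dossier-like object: a named list with one dataset per element.
dossier <- list(dataset_a = dataset_a, dataset_b = dataset_b)

# Hypothetical structural check (an assumption, not madshapR's is_dossier()).
looks_like_dossier <- function(x) {
  is.list(x) && !is.data.frame(x) && length(x) > 0 &&
    all(vapply(x, is.data.frame, logical(1)))
}

looks_like_dossier(dossier)    # TRUE
looks_like_dossier(dataset_a)  # FALSE: a single data frame is not a list of datasets
```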

Functions to work with data types

These functions allow users to work with, extract or assign a data type (valueType) to values and/or datasets.

as_valueType(), is_valueType(), valueType_adjust(), valueType_guess(), valueType_self_adjust(), valueType_of()

Unit tests and QA for datasets and data dictionaries

These helper functions evaluate the content of a dataset and/or data dictionary to extract irregularities or potential errors. This information is stored in a tibble that can be used to assess inputs.

check_data_dict_categories(), check_data_dict_missing_categories(), check_data_dict_taxonomy(), check_data_dict_variables(), check_data_dict_valueType(), check_dataset_categories(), check_dataset_valueType(), check_dataset_variables(), check_name_standards()

Summarize information in dataset and data dictionaries

These helper functions evaluate the content of a dataset and/or data dictionary to extract summary statistics and elements such as missing values, NA, category names, etc. This information is stored in a tibble that can be used to summarize inputs.

dataset_preprocess(), summary_variables(), summary_variables_categorical(), summary_variables_date(), summary_variables_numeric(), summary_variables_text()

Write and read excel and csv

  • read_csv_any_formats() The csv file is read twice to detect the number of lines to use in attributing the column type (guess_max parameter of read_csv). This avoids common errors when reading csv files.

  • read_excel_allsheets() The Excel file is read and the values are placed in a list of tibbles, with each sheet in a separate element in the list. If the Excel file has only one sheet, the output is a single tibble.

  • write_excel_allsheets() Write all Excel sheets using xlsx::write.xlsx() recursively.
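
The two-pass CSV reading described for read_csv_any_formats() can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the helper name read_csv_two_pass() is hypothetical, and only readr::read_csv() with its guess_max argument is taken from the description above.

```r
library(readr)

# Hypothetical helper illustrating the "read twice" idea.
read_csv_two_pass <- function(path) {
  # First pass: count the lines in the file.
  n_lines <- length(readLines(path, warn = FALSE))

  # Second pass: parse the file, letting readr look at every row
  # before deciding on the column types.
  suppressMessages(read_csv(path, guess_max = max(n_lines, 1)))
}

# Demo file: the "note" column is empty for 1500 rows and holds text
# only in the last row, so guessing from the first rows alone would
# see nothing but missing values.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,note", paste0(1:1500, ","), "1501,late text"), tmp)

data <- read_csv_two_pass(tmp)
str(data$note)
```

Counting lines first costs an extra read of the file, but it means the type guess is based on all rows rather than on a fixed number of leading rows.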

Plot and summary functions used in a visual report

plot_bar(), plot_box(), plot_date(), plot_density(), plot_histogram(), plot_main_word(), plot_pie_valid_value(), summary_category(), summary_numerical(), summary_text()