validate - Data Validation Infrastructure
Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. Supports checks implied by an SDMX DSD file as well. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, Chapter 6 and the JSS paper (2021) <doi:10.18637/jss.v097.i10>.
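The core idea of declaring rules and confronting data with them can be sketched outside R. The Python below is a hypothetical illustration, not the package's API. The rule names, the `safe`, `confront` and `summarize` helpers, and the toy data are all invented. It defines one per-field, one in-record and one cross-record rule. Confronting the data yields, per rule, a count of passes, fails and outcomes that could not be evaluated because of missing values.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
# A rule maps a record to True (pass), False (fail) or None (not evaluable).
Rule = Callable[[Record], Optional[bool]]


def safe(pred: Callable[[Record], bool]) -> Rule:
    """Wrap a predicate so that a missing value (None) gives None, not an error."""
    def rule(rec: Record) -> Optional[bool]:
        try:
            return bool(pred(rec))
        except TypeError:  # arithmetic or comparison involving None
            return None
    return rule


def confront(data: List[Record], rules: Dict[str, Rule]) -> Dict[str, List[Optional[bool]]]:
    """Evaluate every rule on every record; one list of outcomes per rule."""
    return {name: [rule(rec) for rec in data] for name, rule in rules.items()}


def summarize(results: Dict[str, List[Optional[bool]]]) -> Dict[str, Dict[str, int]]:
    """Count passes, fails and non-evaluable outcomes per rule."""
    return {
        name: {"passes": out.count(True), "fails": out.count(False), "missing": out.count(None)}
        for name, out in results.items()
    }


data = [
    {"staff": 3, "turnover": 100, "other": 20, "total": 120},
    {"staff": -1, "turnover": 80, "other": 30, "total": 100},
    {"staff": 5, "turnover": None, "other": 10, "total": 90},
]
avg_turnover = mean(r["turnover"] for r in data if r["turnover"] is not None)

rules = {
    "staff_nonneg": safe(lambda r: r["staff"] >= 0),                          # per-field
    "balance": safe(lambda r: r["turnover"] + r["other"] == r["total"]),      # in-record
    "turnover_plausible": safe(lambda r: r["turnover"] <= 2 * avg_turnover),  # cross-record
}

print(summarize(confront(data, rules)))
```

Keeping the outcomes as plain data is what makes them available for further analysis or visualization.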
Last updated 2 months ago
data-cleaning, validation
12.36 score 406 stars 9 packages 444 scripts 2.6k downloads

tinytest - Lightweight and Feature Complete Unit Testing Framework
Provides a lightweight (zero-dependency) and easy-to-use unit testing framework. Main features: install tests with the package. Test results are treated as data that can be stored and manipulated. Test files are R scripts interspersed with test commands that can be programmed over. Fully automated build-install-test sequence for packages. Skip tests when not run locally (e.g., on CRAN). Flexible and configurable output printing. Compare computed output with output stored with the package. Run tests in parallel. Extensible by other packages. Report side effects.
Last updated 18 days ago
12.35 score 223 stars 7 packages 624 scripts 20k downloads

gower - Gower's Distance
Compute Gower's distance (or similarity) coefficient between records. Compute the top-n matches between records. Core algorithms are executed in parallel on systems supporting OpenMP.
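Gower's coefficient handles mixed data by giving each variable a distance in [0, 1]. A numeric variable contributes |x - y| divided by its range, and a categorical variable contributes 0 on a match and 1 otherwise. These contributions are then averaged. The following is a minimal Python sketch, not the package's (OpenMP-parallel) implementation. It uses equal weights and no missing-value handling, and it computes ranges from the records supplied. Those simplifications, the helper names and the toy data are assumptions.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def ranges(data: List[Record], numeric: Sequence[str]) -> Dict[str, float]:
    """Range (max - min) of each numeric variable over the given records."""
    return {v: max(r[v] for r in data) - min(r[v] for r in data) for v in numeric}


def gower(x: Record, y: Record, numeric: Sequence[str],
          categorical: Sequence[str], rng: Dict[str, float]) -> float:
    """Unweighted Gower distance: mean of per-variable distances in [0, 1]."""
    terms = [abs(x[v] - y[v]) / rng[v] if rng[v] > 0 else 0.0 for v in numeric]
    terms += [0.0 if x[v] == y[v] else 1.0 for v in categorical]
    return sum(terms) / len(terms)


def top_n(x: Record, pool: List[Record], n: int, numeric: Sequence[str],
          categorical: Sequence[str], rng: Dict[str, float]) -> List[int]:
    """Indices of the n records in pool that are closest to x."""
    scored = sorted((gower(x, y, numeric, categorical, rng), i) for i, y in enumerate(pool))
    return [i for _, i in scored[:n]]


people = [
    {"age": 20, "height_cm": 160, "sex": "F"},
    {"age": 40, "height_cm": 180, "sex": "M"},
    {"age": 30, "height_cm": 170, "sex": "F"},
]
num, cat = ["age", "height_cm"], ["sex"]
rng = ranges(people, num)
print(gower(people[0], people[2], num, cat, rng))  # (0.5 + 0.5 + 0) / 3
print(top_n(people[0], people, 2, num, cat, rng))  # [0, 2]
```

The top-n search here is a brute-force sort over all candidates. That is adequate for illustration, but it is the part a real implementation would optimize.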
Last updated 5 months ago
11.11 score 29 stars 371 packages 64 scripts 126k downloads

simputation - Simple Imputation
Easy-to-use interfaces to a number of imputation methods that fit in the not-a-pipe operator of the 'magrittr' package.
Last updated 4 months ago
data-science, imputation, officialstatistics
8.54 score 91 stars 332 scripts 1.4k downloads

lumberjack - Track Changes in Data
A framework that allows for easy logging of changes in data. Main features: start tracking changes by adding a single line of code to an existing script. Track changes in multiple datasets, using multiple loggers. Add custom-built loggers or use loggers offered by other packages. <doi:10.18637/jss.v098.i01>.
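What a change logger records can be sketched as follows. This Python sketch is not lumberjack's interface. The `CellLogger` class, the `track` helper and the two processing steps are hypothetical. Each step is applied without mutating its input, and every cell whose value differs before and after the step is logged. The resulting log is itself data that can be inspected.

```python
from typing import Any, Callable, Dict, List, Tuple

Table = Dict[str, Dict[str, Any]]          # record key -> {variable: value}
Change = Tuple[str, str, str, Any, Any]    # (step, key, variable, old, new)


class CellLogger:
    """Hypothetical logger that records every cell changed by a processing step."""

    def __init__(self) -> None:
        self.log: List[Change] = []

    def add(self, step: str, before: Table, after: Table) -> None:
        # Cells of removed records are logged as changed to None;
        # added records or variables are not logged in this minimal sketch.
        for key, old_rec in before.items():
            new_rec = after.get(key, {})
            for var, old in old_rec.items():
                new = new_rec.get(var)
                if new != old:
                    self.log.append((step, key, var, old, new))


def track(logger: CellLogger, step: str, func: Callable[[Table], Table], data: Table) -> Table:
    """Apply a non-mutating step to the data and log the cells it changed."""
    result = func(data)
    logger.add(step, data, result)
    return result


def fix_negative_staff(t: Table) -> Table:
    return {k: {**r, "staff": None if r["staff"] < 0 else r["staff"]} for k, r in t.items()}


def to_units(t: Table) -> Table:
    return {k: {**r, "turnover": r["turnover"] * 1000} for k, r in t.items()}


logger = CellLogger()
data = {"a": {"staff": 3, "turnover": 100}, "b": {"staff": -1, "turnover": 80}}
data = track(logger, "fix_negative_staff", fix_negative_staff, data)
data = track(logger, "to_units", to_units, data)
for entry in logger.log:
    print(entry)
```

Using more than one logger would simply mean calling several `add` methods per step, each summarizing the before/after pair differently.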
Last updated 5 months ago
daff, datascience, logging, reproducible-research
7.09 score 63 stars 1 package 65 scripts 396 downloads

dcmodify - Modify Data Using Externally Defined Modification Rules
Data cleaning scripts typically contain many 'if this, change that' statements. Such statements are often condensed expert knowledge. With this package, such 'data modifying rules' are taken out of the code and instead become parameters to the workflow. This allows one to maintain, document, and reason about data modification rules as separate entities.
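The 'if this, change that' pattern becomes data once each rule is a (condition, change) pair that the workflow merely loops over. This is a hypothetical Python sketch, not dcmodify's rule syntax. In practice such rules would be kept in an external file; here they are an inline list for brevity. The two example rules and the `modify` helper are invented.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# A modifying rule: (description, condition, change), kept apart from the workflow code.
ModRule = Tuple[str, Callable[[Record], bool], Callable[[Record], Record]]

RULES: List[ModRule] = [
    ("negative staff becomes missing",
     lambda r: r["staff"] is not None and r["staff"] < 0,
     lambda r: {**r, "staff": None}),
    ("turnover below 1000 is assumed to be in thousands",
     lambda r: r["turnover"] is not None and r["turnover"] < 1000,
     lambda r: {**r, "turnover": r["turnover"] * 1000}),
]


def modify(data: List[Record], rules: List[ModRule]) -> List[Record]:
    """Apply each rule, in order, to every record whose condition holds."""
    out = []
    for rec in data:
        for _description, condition, change in rules:
            if condition(rec):
                rec = change(rec)
        out.append(rec)
    return out


data = [{"staff": -2, "turnover": 500}, {"staff": 4, "turnover": 25000}]
print(modify(data, RULES))
```

Because the workflow only sees `RULES` as a parameter, rules can be added, documented or reordered without touching `modify`.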
Last updated 5 months ago
6.22 score 10 stars 55 scripts 314 downloads

lintools - Manipulation of Linear Systems of (in)Equalities
Variable elimination (Gaussian elimination, Fourier-Motzkin elimination), Moore-Penrose pseudoinverse, reduction to reduced row echelon form, value substitution, projecting a vector on the convex polytope described by a system of (in)equations, simplify systems by removing spurious columns and rows and collapse implied equalities, test if a matrix is totally unimodular, compute variable ranges implied by linear (in)equalities.
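Fourier-Motzkin elimination, one of the operations listed above, removes a variable from a system of inequalities. It pairs every inequality in which that variable has a positive coefficient with every one in which it has a negative coefficient. The following is a minimal Python sketch, not lintools' R interface; the function names are hypothetical. It uses exact fractions and does not remove redundant rows, which a practical implementation needs because the system can grow quadratically per elimination. Eliminating y from the small example leaves the range implied for x.

```python
from fractions import Fraction
from typing import List, Tuple

# Inequality sum_k a[k] * x[k] <= b, stored as (a, b) with exact rational coefficients.
Ineq = Tuple[List[Fraction], Fraction]


def eliminate(system: List[Ineq], j: int) -> List[Ineq]:
    """Fourier-Motzkin elimination of variable j from a system of a . x <= b."""
    pos = [(a, b) for a, b in system if a[j] > 0]
    neg = [(a, b) for a, b in system if a[j] < 0]
    out = [(a, b) for a, b in system if a[j] == 0]
    for ap, bp in pos:
        for an, bn in neg:
            # Positive scaling keeps each inequality's direction and makes the
            # j-th coefficients +1 and -1, so adding the two cancels variable j.
            sp, sn = 1 / ap[j], 1 / -an[j]
            a = [sp * p + sn * n for p, n in zip(ap, an)]
            out.append((a, sp * bp + sn * bn))
    return out


def ineq(coefs: List[int], b: int) -> Ineq:
    return [Fraction(c) for c in coefs], Fraction(b)


# Variables (x, y):  x - y <= 0,  y <= 3,  -x <= 0
system = [ineq([1, -1], 0), ineq([0, 1], 3), ineq([-1, 0], 0)]
for a, b in eliminate(system, j=1):
    print([str(c) for c in a], "<=", b)  # together these imply 0 <= x <= 3
```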
Last updated 5 months ago
5.19 score 4 stars 2 packages 13 scripts 466 downloads

accumulate - Split-Apply-Combine with Dynamic Groups
Estimate group aggregates, where one can set user-defined conditions that each group of records must satisfy to be suitable for aggregation. If a group of records is not suitable, it is expanded using a collapsing scheme defined by the user.
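The collapsing idea can be sketched with hierarchical group codes, where a group that fails the condition is replaced by its parent group. This Python sketch rests on several assumptions. The condition is a minimum group size `min_n`, the collapsing scheme simply drops trailing characters of the code, and the aggregate is a mean. In the package, both the condition and the collapsing scheme are user-defined. The function name and data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def accumulate(records: List[Dict], targets: List[str], min_n: int
               ) -> Dict[str, Tuple[Optional[str], Optional[float]]]:
    """For each target group code, average 'value' over the narrowest group that
    has at least min_n records, collapsing by dropping trailing code characters."""
    out: Dict[str, Tuple[Optional[str], Optional[float]]] = {}
    for code in targets:
        for level in range(len(code), -1, -1):
            prefix = code[:level]  # the empty prefix "" is the whole dataset
            values = [r["value"] for r in records if r["code"].startswith(prefix)]
            if len(values) >= min_n:
                out[code] = (prefix, mean(values))
                break
        else:
            out[code] = (None, None)  # even the whole dataset is too small
    return out


records = [
    {"code": "111", "value": 10}, {"code": "111", "value": 20},
    {"code": "112", "value": 30},
    {"code": "121", "value": 40}, {"code": "121", "value": 50}, {"code": "121", "value": 60},
]
print(accumulate(records, ["111", "112", "121"], min_n=2))
```

Group "112" has only one record, so its estimate falls back to the parent group "11". The other two groups are large enough to be aggregated as they are.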
Last updated 5 months ago
5.13 score 9 stars 2 scripts 313 downloads

extremevalues - Univariate Outlier Detection
Detect outliers in one-dimensional data.
Last updated 9 months ago
4.60 score 2 packages 29 scripts 23k downloads

synthesizer - Synthesize Data Based on Empirical Quantile Functions and Rank Order Matching
Data is synthesized using a combination of inverse transform sampling from the empirical quantile functions for each variable, and then copying the rank order structure from the original dataset. The package also includes a number of functions to measure the utility of synthesized datasets.
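The two steps in this description can be sketched per variable. The first is inverse transform sampling from the empirical quantile function. The second reorders the draws so that they follow the rank order of the original column. This Python sketch uses a step-function empirical quantile and breaks ties by position. Both choices are assumptions, and the package may interpolate or handle ties differently. The function names and toy data are invented.

```python
import math
import random
from typing import List, Sequence


def empirical_quantile(sorted_x: Sequence[float], u: float) -> float:
    """Inverse of the empirical CDF (the smallest x with F(x) >= u), for 0 <= u < 1."""
    k = max(1, math.ceil(u * len(sorted_x)))
    return sorted_x[k - 1]


def ranks(x: Sequence[float]) -> List[int]:
    """0-based rank of each element; ties are broken by position."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def synthesize(columns: List[List[float]], rng: random.Random) -> List[List[float]]:
    """Per column: draw by inverse transform sampling from the empirical quantile
    function, then reorder the draws so they follow the original column's ranks."""
    n = len(columns[0])
    synthetic = []
    for col in columns:
        sx = sorted(col)
        draws = sorted(empirical_quantile(sx, rng.random()) for _ in range(n))
        r = ranks(col)
        synthetic.append([draws[r[i]] for i in range(n)])
    return synthetic


age = [23, 35, 31, 52, 47, 28]
income = [1800, 3200, 2900, 4100, 3900, 2500]
syn_age, syn_income = synthesize([age, income], random.Random(42))
print(syn_age, syn_income)
```

Because every column is reordered to its original ranks, the rank order structure between age and income carries over to the synthetic data, apart from ties among the draws.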
Last updated 1 month ago
4.18 score 7 scripts 277 downloads

deducorrect - Deductive Correction, Deductive Imputation, and Deterministic Correction
A collection of methods for automated data cleaning where all actions are logged. NOTE: active development has moved to the 'deductive' package.
Last updated 5 months ago
4.11 score 8 stars 32 scripts 447 downloads

hashr - Hash R Objects to Integers Fast
Apply an adaptation of the SuperFastHash algorithm to any R object. Hash whole R objects or, for vectors or lists, hash R objects to obtain a set of hash values that is stored in a structure equivalent to the input. See <http://www.azillionmonkeys.com/qed/hash.html> for a description of the hash algorithm.
Last updated 5 months ago
3.88 score 8 stars 19 scripts 245 downloads

deductive - Data Correction and Imputation Using Deductive Methods
Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data.
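The simplest deductive step arises when a rule pins down a gap exactly. If a record has one missing value in a linear equality, that value follows from the others. The following is a minimal Python sketch with a hypothetical function name and an invented balance rule. It handles only a single linear equality, which is far narrower than what the package covers.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def impute_from_equality(rec: Record, coef: Dict[str, float], rhs: float) -> Record:
    """If exactly one variable in sum(coef[v] * rec[v]) == rhs is missing,
    derive its value from the rule; otherwise return an unchanged copy."""
    missing = [v for v in coef if rec.get(v) is None]
    if len(missing) != 1:
        return dict(rec)  # nothing to deduce, or not uniquely determined
    m = missing[0]
    known = sum(c * rec[v] for v, c in coef.items() if v != m)
    return {**rec, m: (rhs - known) / coef[m]}


# Rule: turnover - costs - profit == 0
rule = {"turnover": 1, "costs": -1, "profit": -1}
print(impute_from_equality({"turnover": 100, "costs": None, "profit": 30}, rule, 0))
print(impute_from_equality({"turnover": 100, "costs": None, "profit": None}, rule, 0))
```

With two missing values the rule no longer determines either one, so the second record is returned unchanged.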
Last updated 5 months ago
data-cleaning
3.81 score 13 stars 9 scripts 365 downloads

rspa - Adapt Numerical Records to Fit (in)Equality Restrictions
Minimally adjust the values of numerical records in a data.frame, such that each record satisfies a predefined set of equality and/or inequality constraints. The constraints can be defined using the 'validate' package. The core algorithms have recently been moved to the 'lintools' package; refer to 'lintools' for a more basic interface and access to a version of the algorithm that works with sparse matrices.
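For a single equality restriction a . x = b, the minimal (least-squares) adjustment has a closed form: project x orthogonally onto the hyperplane. The following Python sketch shows that one step, with a hypothetical function name and an invented record. Several restrictions, or any inequalities, call for an iterative scheme, which this sketch does not implement.

```python
from fractions import Fraction
from typing import List


def project_onto_equality(x: List[Fraction], a: List[Fraction], b: Fraction) -> List[Fraction]:
    """Smallest (Euclidean) change to x such that a . x == b:
    x* = x - ((a . x - b) / (a . a)) * a."""
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    step = residual / sum(ai * ai for ai in a)
    return [xi - step * ai for ai, xi in zip(a, x)]


# Record (turnover, costs, profit) and the rule turnover - costs - profit == 0
x = [Fraction(100), Fraction(60), Fraction(30)]
a = [Fraction(1), Fraction(-1), Fraction(-1)]
adjusted = project_onto_equality(x, a, Fraction(0))
print([float(v) for v in adjusted])
```

The violation of 10 is spread evenly over the three variables, since all have coefficients of equal magnitude. Weighting the adjustment per variable would change that split.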
Last updated 5 months ago
3.45 score 3 stars 19 scripts 304 downloads