Title: | A Tidy Interface for Simulating Multivariate Data |
---|---|
Description: | Provides pipe-friendly (%>%) wrapper functions for MASS::mvrnorm() to create simulated multivariate data sets with groups of variables with different degrees of variance, covariance, and effect size. |
Authors: | Eric Scott [aut, cre] |
Maintainer: | Eric Scott <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.2.9000 |
Built: | 2024-11-20 04:21:16 UTC |
Source: | https://github.com/Aariq/holodeck |
Internally, this package uses the definition operator, ':=', to make assignments that require computing on the LHS.
x | An object to test. |
lhs, rhs | Expressions for the LHS and RHS of the definition. |
Pipe friendly wrapper to 'diag(x) <- value'
set_diag(x, value)
x | a matrix |
value | either a single value or a vector of length equal to the diagonal of 'x'. |
a matrix
library(dplyr)
matrix(0, 3, 3) %>% set_diag(1)
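The description above says that set_diag() wraps 'diag(x) <- value'. As a rough illustration of why a wrapper helps in a pipeline, here is a minimal base-R sketch. The name set_diag_sketch is hypothetical, and the body is an assumption about the general pattern, not the package's source.

```r
# Hypothetical stand-in for set_diag(); not the package's actual code.
set_diag_sketch <- function(x, value) {
  diag(x) <- value  # the replacement form modifies the local copy of x
  x                 # return the matrix so it can be passed down a pipe
}

# A 3 x 3 zero matrix with its diagonal set to 1.
m <- set_diag_sketch(matrix(0, 3, 3), 1)
m
```

Because 'diag(x) <- value' is an assignment rather than an expression that returns the modified matrix, it cannot sit in the middle of a pipe; returning 'x' explicitly is what makes the wrapper pipe friendly.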
This is a simple wrapper that creates a tibble with 'n_obs' rows and a single grouping column, named by 'name' ("group" by default). It will warn if there are fewer than three replicates per group.
sim_cat(.data = NULL, n_obs = NULL, n_groups, name = "group")
.data | An optional dataframe. If a dataframe is supplied, simulated categorical data will be added to the dataframe. Either '.data' or 'n_obs' must be supplied. |
n_obs | Total number of observations/rows to simulate if '.data' is not supplied. |
n_groups | How many groups or treatments to simulate. |
name | The column name for the grouping variable. Defaults to "group". |
To-do:
- Make this optionally create multiple categorical variables as being nested or crossed or random
a tibble
Other multivariate normal functions: sim_covar(), sim_discr()
df <- sim_cat(n_obs = 30, n_groups = 3)
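To make the replicate warning concrete, the following base-R sketch builds a one-column grouping frame and warns when the groups are too small. Everything here is an assumption for illustration: the name sim_cat_sketch, the use of letters as group labels, the warning text, and the way observations are divided among groups are not taken from the package.

```r
# Hypothetical illustration of a sim_cat()-like helper; not the package's code.
sim_cat_sketch <- function(n_obs, n_groups, name = "group") {
  # Assumed rule: warn when the average group size falls below three.
  if (n_obs / n_groups < 3) {
    warning("Fewer than three replicates per group")
  }
  # Assumed labelling: letters a, b, c, ... recycled over the rows, then sorted
  # so each group forms a contiguous block (works for up to 26 groups).
  labels <- sort(rep(letters[seq_len(n_groups)], length.out = n_obs))
  out <- data.frame(labels, stringsAsFactors = FALSE)
  names(out) <- name
  out
}

df_sketch <- sim_cat_sketch(n_obs = 30, n_groups = 3)
table(df_sketch$group)
```

With 30 observations and 3 groups, each group receives 10 rows, so no warning is raised.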
Adds a group of variables (columns) with a given variance and covariance to a data frame or tibble
sim_covar(.data = NULL, n_obs = NULL, n_vars, var, cov, name = NA, seed = NA)
.data | An optional dataframe. If a dataframe is supplied, the simulated variables will be added to the dataframe. Either '.data' or 'n_obs' must be supplied. |
n_obs | Total number of observations/rows to simulate if '.data' is not supplied. |
n_vars | Number of variables to simulate. |
var | Variance used to construct variance-covariance matrix. |
cov | Covariance used to construct variance-covariance matrix. |
name | An optional name to be appended to the column names in the output. |
seed | An optional seed for random number generation. If 'NA' (default) a random seed will be used. |
a tibble
Other multivariate normal functions: sim_cat(), sim_discr()
library(dplyr)
sim_cat(n_obs = 30, n_groups = 3) %>%
  sim_covar(n_vars = 5, var = 1, cov = 0.5, name = "correlated")
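The 'var' and 'cov' arguments are described as inputs to a variance-covariance matrix, and the package description names MASS::mvrnorm() as the underlying generator. The sketch below shows one plausible way those pieces fit together. It assumes a matrix with a constant variance on the diagonal, a constant covariance everywhere else, and zero means; the variable names and the seed are arbitrary, and this is not the package's source.

```r
library(MASS)  # provides mvrnorm()

n_obs      <- 30
n_vars     <- 5
variance   <- 1    # plays the role of the 'var' argument
covariance <- 0.5  # plays the role of the 'cov' argument

# Assumed construction: 'covariance' off the diagonal, 'variance' on it.
sigma <- matrix(covariance, nrow = n_vars, ncol = n_vars)
diag(sigma) <- variance

set.seed(42)  # arbitrary seed, for reproducibility only
# Assumed zero means for every variable.
sim <- mvrnorm(n = n_obs, mu = rep(0, n_vars), Sigma = sigma)

dim(sim)            # rows are observations, columns are variables
round(cov(sim), 2)  # sample covariance, which only approximates sigma
```

With only 30 observations the sample covariance can differ noticeably from the requested values; larger 'n_obs' brings it closer.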
To-do: make this work with 'dplyr::group_by()' instead of 'group ='
sim_discr(.data, n_vars, var, cov, group_means, name = NA, seed = NA)
.data | A dataframe containing a grouping variable column. |
n_vars | Number of variables to simulate. |
var | Variance used to construct variance-covariance matrix. |
cov | Covariance used to construct variance-covariance matrix. |
group_means | A vector of means, one per group, so its length must equal the number of groups. |
name | An optional name to be appended to the column names in the output. |
seed | An optional seed for random number generation. If 'NA' (default) a random seed will be used. |
a tibble
Other multivariate normal functions: sim_cat(), sim_covar()
library(dplyr)
sim_cat(n_obs = 30, n_groups = 3) %>%
  group_by(group) %>%
  sim_discr(n_vars = 5, var = 1, cov = 0.5, group_means = c(-1, 0, 1), name = "descr")
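To show how 'group_means' could separate groups, the sketch below draws each group from its own multivariate normal distribution with MASS::mvrnorm(). It assumes that all variables within a group share that group's mean and that every group uses the same variance-covariance matrix. The group labels, variable names, and seed are made up for illustration, and this is not the package's source.

```r
library(MASS)  # provides mvrnorm()

# Hypothetical grouping: three groups of ten observations each.
groups      <- rep(c("a", "b", "c"), each = 10)
group_means <- c(a = -1, b = 0, c = 1)  # one mean per group
n_vars      <- 5

# Shared variance-covariance matrix: variance 1, covariance 0.5.
sigma <- matrix(0.5, nrow = n_vars, ncol = n_vars)
diag(sigma) <- 1

set.seed(1)  # arbitrary seed, for reproducibility only
sim_by_group <- lapply(names(group_means), function(g) {
  n_g <- sum(groups == g)
  # Assumption: every variable in group g has mean group_means[[g]].
  mvrnorm(n = n_g, mu = rep(group_means[[g]], n_vars), Sigma = sigma)
})
sim <- do.call(rbind, sim_by_group)  # rows stay in group order a, b, c

# Per-group sample mean of the first simulated variable.
tapply(sim[, 1], groups, mean)
```

The per-group sample means should fall near -1, 0, and 1, though with ten observations per group they will not match exactly.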
Takes a data frame and randomly replaces a user-supplied proportion of values with 'NA'.
sim_missing(.data, prop, seed = NA)
.data | A dataframe. |
prop | Proportion of values to be set to 'NA'. |
seed | An optional seed for random number generation. If 'NA' (default) a random seed will be used. |
a dataframe with NAs
library(dplyr)
df <- sim_cat(n_obs = 10, n_groups = 2) %>%
  sim_covar(n_vars = 10, var = 1, cov = 0.5) %>%
  sim_missing(0.05)
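The idea of replacing a proportion of values with 'NA' can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: it operates on an all-numeric data frame, rounds 'prop' times the number of cells to get a count, and samples cell positions without replacement. The data frame and variable names are hypothetical.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical all-numeric data: 10 rows x 3 columns = 30 cells.
dat  <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
prop <- 0.1

m <- as.matrix(dat)
n_missing <- round(prop * length(m))       # number of cells to blank out
m[sample(length(m), n_missing)] <- NA      # distinct cell positions, chosen at random
dat_missing <- as.data.frame(m)

sum(is.na(dat_missing))  # 3 of the 30 cells are now NA
```

Sampling positions without replacement guarantees that exactly the requested number of cells is blanked, rather than a number that merely averages to it.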