Simulate from continuous outcome model given covariates

Simulate from a continuous outcome model with conditional mean $g^{-1}(\text{par}^\top X)$, where $X$ is the design matrix specified by the formula and $g$ is the link function specified by the family argument (so $g^{-1}$ is its inverse link).
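The following base-R sketch shows how such a conditional mean could be computed for a formula-based mean model. It assumes that par is matched to the columns of model.matrix() by name (or by position when unnamed) and that columns without a supplied coefficient get zero, as described for the par argument. The helper linear_mean() is hypothetical and is not part of the package.

```r
# Hypothetical helper, NOT the package's implementation: conditional mean
# g^{-1}(par' X) for a formula-based mean model. Assumes par is matched to
# model.matrix() columns by name (or by position if unnamed), and any column
# without a coefficient gets zero.
linear_mean <- function(data, mean, par = NULL, family = gaussian()) {
  X <- model.matrix(mean, data)                     # design matrix from the formula
  beta <- setNames(numeric(ncol(X)), colnames(X))   # default: all coefficients zero
  if (!is.null(par)) {
    par <- unlist(par)                              # accept a named list as well
    if (is.null(names(par))) {
      stopifnot(length(par) <= ncol(X))
      beta[seq_along(par)] <- par                   # unnamed: match by position
    } else {
      stopifnot(all(names(par) %in% colnames(X)))
      beta[names(par)] <- par                       # named: match by column name
    }
  }
  eta <- drop(X %*% beta)                           # linear predictor (link scale)
  family$linkinv(eta)                               # inverse link: response scale
}

d <- data.frame(a = c(0, 1, 1), x = c(0, 1, -1))
linear_mean(d, ~ 1 + a + x, par = c(1, 0.5, 2))  # values: 1, 3.5, -0.5
linear_mean(d, ~ 1 + a, par = c(a = 0.5))        # values: 0, 0.5, 0.5
```

With the default gaussian() family the inverse link is the identity, so the mean equals the linear predictor; with, for example, poisson() it would be exp() of the linear predictor.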

Usage

outcome_continuous(
  data,
  mean = NULL,
  par = NULL,
  sd = 1,
  het = 0,
  outcome.name = "y",
  remove = c("id", "num"),
  family = gaussian(),
  ...
)

Arguments

data: (data.table) Covariate data, usually the output of the covariate model of a Trial object.
mean: (formula, function) Either a formula specifying the design matrix from 'data', or a function that maps data to the conditional mean on the link scale (see examples). If NULL, all main effects of the covariates are used, except the columns listed in the remove argument.
par: (numeric) Regression coefficients (default zero). Can be given as a named vector or list whose names correspond to the column names of model.matrix; coefficients that are not supplied default to zero.
sd: (numeric) Standard deviation of the Gaussian measurement error.
het: (numeric) Introduce variance heterogeneity by adding a residual term $\text{het} \cdot \mu_x \cdot e$, where $\mu_x$ is the conditional mean given covariates and $e$ is an independent standard normally distributed variable. This term is added on top of the measurement error controlled by the sd argument.
outcome.name: (character) Name of the outcome variable (default "y").
remove: (character) Variables removed from the input data when mean is NULL, i.e., when no formula or function is specified.
family: Exponential family object (default gaussian(), i.e. the identity link).
...: Additional arguments passed to the mean function (see examples).
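As an illustration of how sd and het combine, the base-R sketch below adds both noise terms to a vector of conditional means. It assumes $\mu_x$ is the mean on the response scale and that the two noise draws are independent standard normals, so the conditional variance is $\text{sd}^2 + \text{het}^2 \mu_x^2$. The function add_noise() is hypothetical and is not part of the package.

```r
# Hypothetical sketch, NOT the package's code: add Gaussian measurement error
# (sd) and a mean-proportional heteroscedastic term (het * mu * e) to a vector
# of conditional means mu (assumed to be on the response scale).
add_noise <- function(mu, sd = 1, het = 0) {
  n <- length(mu)
  mu + sd * rnorm(n) + het * mu * rnorm(n)  # two independent N(0, 1) draws
}

set.seed(1)
mu <- rep(c(1, 10), each = 1e5)
y <- add_noise(mu, sd = 1, het = 0.5)
# empirical SD per mean level; should be close to sqrt(1 + (0.5 * mu)^2),
# i.e. about 1.12 for mu = 1 and about 5.10 for mu = 10
tapply(y, mu, sd)
```

With het = 0 (the default) the noise is homoscedastic with variance sd^2 regardless of the mean.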

Value

data.table

Examples

trial <- Trial$new(
  covariates = \(n) data.frame(a = rbinom(n, 1, 0.5), x = rnorm(n)),
  outcome = outcome_continuous
)
est <- function(data) glm(y ~ a + x, data = data)
trial$simulate(1e4, mean = ~ 1 + a + x, par = c(1, 0.5, 2)) |> est()
#> 
#> Call:  glm(formula = y ~ a + x, data = data)
#> 
#> Coefficients:
#> (Intercept)            a            x  
#>      0.9991       0.5112       1.9935  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9997 Residual
#> Null Deviance:	    50740 
#> Residual Deviance: 9974 	AIC: 28360

# default behavior is to set all regression coefficients to 0
trial$simulate(1e4, mean = ~ 1 + a + x) |> est()
#> 
#> Call:  glm(formula = y ~ a + x, data = data)
#> 
#> Coefficients:
#> (Intercept)            a            x  
#>    0.020756    -0.035260     0.004824  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9997 Residual
#> Null Deviance:	    9879 
#> Residual Deviance: 9875 	AIC: 28260

# the intercept defaults to 0 and the regression coefficient for a takes the provided value
trial$simulate(1e4, mean = ~ 1 + a, par = c(a = 0.5)) |> est()
#> 
#> Call:  glm(formula = y ~ a + x, data = data)
#> 
#> Coefficients:
#> (Intercept)            a            x  
#>    0.018315     0.481916     0.002943  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9997 Residual
#> Null Deviance:	    10560 
#> Residual Deviance: 9977 	AIC: 28360
# trial$simulate(1e4, mean = ~ 1 + a, par = c("(Intercept)" = 0.5)) |> est()

# define a mean model as a function of the whole covariate data (including the
# id and num columns)
trial$simulate(1e4, mean = \(x) with(x, -1 + a * 2 + x * -3)) |>
  est()
#> 
#> Call:  glm(formula = y ~ a + x, data = data)
#> 
#> Coefficients:
#> (Intercept)            a            x  
#>     -0.9779       1.9655      -2.9914  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9997 Residual
#> Null Deviance:	    110600 
#> Residual Deviance: 9981 	AIC: 28370

# the par argument is not passed on to a mean function; supply parameters as
# additional arguments instead (forwarded via ...)
trial$simulate(1e4,
  mean = \(x, reg.par) with(x, reg.par[1] + reg.par[2] * a),
  reg.par = c(1, 5)
) |> est()
#> 
#> Call:  glm(formula = y ~ a + x, data = data)
#> 
#> Coefficients:
#> (Intercept)            a            x  
#>      1.0016       4.9941       0.0163  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9997 Residual
#> Null Deviance:	    72210 
#> Residual Deviance: 9855 	AIC: 28240
