Simulate from a continuous outcome model with mean \(g(\text{par}^\top X)\), where \(X\) is the design matrix specified by the formula and \(g\) is the link function specified by the family argument.

Usage

outcome_continuous(
  data,
  mean = NULL,
  par = NULL,
  sd = 1,
  het = 0,
  outcome.name = "y",
  remove = c("id", "num"),
  family = gaussian(),
  ...
)

Arguments

data

(data.table) Covariate data, usually the output of the covariate model of a Trial object.

mean

(formula, function) Either a formula specifying the design from 'data' or a function that maps data to the conditional mean value on the link scale (see examples). If NULL all main-effects of the covariates will be used, except columns that are defined via the remove argument.

par

(numeric) Regression coefficients (default zero). Can be given as a named vector or list whose names correspond to the column names of model.matrix; coefficients that are not named default to zero (see examples).

sd

(numeric) Standard deviation of the Gaussian measurement error.

het

Introduce variance heterogeneity by adding a residual term \(het \cdot \mu_x \cdot e\), where \(\mu_x\) is the mean given covariates and \(e\) is an independent standard normally distributed variable. This term is in addition to the measurement error introduced by the sd argument.

outcome.name

Name of outcome variable ("y")

remove

Variables that are removed from the input data when the mean argument is not specified (default "id" and "num").

family

Exponential family (default gaussian() with identity link).

...

Additional arguments passed to the mean function (see examples).
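
As a rough illustration of how the mean, par and family arguments fit together, the base R sketch below builds the design matrix with model.matrix, sets coefficients that are not named to zero, and maps the linear predictor through the family's inverse link. It is an assumption about the general mechanism, not the package's actual implementation; the helper expand_par and the toy data are hypothetical.

```r
# Toy covariate data (hypothetical values, for illustration only)
d <- data.frame(a = c(0, 1, 1), x = c(-1, 0, 2))

# Design matrix implied by the formula; columns: (Intercept), a, x
X <- model.matrix(~ 1 + a + x, data = d)

# Hypothetical helper: coefficients not named in `par` default to zero
expand_par <- function(par, X) {
  beta <- setNames(numeric(ncol(X)), colnames(X))
  beta[names(par)] <- par
  beta
}

beta <- expand_par(c(a = 0.5, x = 2), X)

# Linear predictor, i.e. the conditional mean on the link scale
lp <- drop(X %*% beta)

# Assumed mapping to the response scale via the family's inverse link
mu_identity <- gaussian()$linkinv(lp)
mu_log <- gaussian(link = "log")$linkinv(lp)
```

With the identity link the mean equals the linear predictor, whereas with a log link it is the exponential of the linear predictor.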

Value

data.table

Examples

trial <- Trial$new(
  covariates = \(n) data.frame(a = rbinom(n, 1, 0.5), x = rnorm(n)),
  outcome = outcome_continuous
)
est <- function(data) glm(y ~ a + x, data = data)
trial$simulate(1e4, mean = ~ 1 + a + x, par = c(1, 0.5, 2)) |> est()
#> 
#> Call:  glm(formula = y ~ a + x, data = data)
#> 
#> Coefficients:
#> (Intercept)            a            x  
#>      0.9991       0.5112       1.9935  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9997 Residual
#> Null Deviance:	    50740 
#> Residual Deviance: 9974 	AIC: 28360

# default behavior is to set all regression coefficients to 0
trial$simulate(1e4, mean = ~ 1 + a + x) |> est()
#> 
#> Call:  glm(formula = y ~ a + x, data = data)
#> 
#> Coefficients:
#> (Intercept)            a            x  
#>    0.020756    -0.035260     0.004824  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9997 Residual
#> Null Deviance:	    9879 
#> Residual Deviance: 9875 	AIC: 28260

# intercept defaults to 0 and regression coef for a takes the provided value
trial$simulate(1e4, mean = ~ 1 + a, par = c(a = 0.5)) |> est()
#> 
#> Call:  glm(formula = y ~ a + x, data = data)
#> 
#> Coefficients:
#> (Intercept)            a            x  
#>    0.018315     0.481916     0.002943  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9997 Residual
#> Null Deviance:	    10560 
#> Residual Deviance: 9977 	AIC: 28360
# trial$simulate(1e4, mean = ~ 1 + a, par = c("(Intercept)" = 0.5)) |> est()

# define a mean function that works directly on the whole covariate data,
# including the id and num columns
trial$simulate(1e4, mean = \(x) with(x, -1 + a * 2 + x * -3)) |>
  est()
#> 
#> Call:  glm(formula = y ~ a + x, data = data)
#> 
#> Coefficients:
#> (Intercept)            a            x  
#>     -0.9779       1.9655      -2.9914  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9997 Residual
#> Null Deviance:	    110600 
#> Residual Deviance: 9981 	AIC: 28370

# the par argument is not passed on to the mean function; extra parameters
# (here reg.par) are passed via ... instead
trial$simulate(1e4,
  mean = \(x,  reg.par) with(x, reg.par[1] + reg.par[2] * a),
  reg.par = c(1, 5)
) |> est()
#> 
#> Call:  glm(formula = y ~ a + x, data = data)
#> 
#> Coefficients:
#> (Intercept)            a            x  
#>      1.0016       4.9941       0.0163  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9997 Residual
#> Null Deviance:	    72210 
#> Residual Deviance: 9855 	AIC: 28240
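
The error structure described for the sd and het arguments can also be sketched in base R without the package. The snippet assumes an identity link and that both noise terms are added to the conditional mean, in which case the conditional variance would be sd^2 + het^2 * mu_x^2. The variable names (sd_err, emp_var, theo_var) are illustrative and not part of the package.

```r
set.seed(1)
n <- 1e4
a <- rbinom(n, 1, 0.5)

# Conditional mean given the covariate (identity link assumed)
mu <- 1 + 0.5 * a

sd_err <- 1  # standard deviation of the Gaussian measurement error
het <- 0.5   # scale of the heteroscedastic residual term

# Outcome = mean + measurement error + het * mean * independent N(0, 1) noise
y <- mu + rnorm(n, sd = sd_err) + het * mu * rnorm(n)

# Empirical variance within each level of a
emp_var <- tapply(y, a, var)

# Variance implied by the assumed model, evaluated at mu = 1 and mu = 1.5
theo_var <- sd_err^2 + het^2 * c(1, 1.5)^2
```

Comparing emp_var with theo_var shows how a non-zero het makes the residual variance grow with the magnitude of the conditional mean.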