
Simulate a binary outcome with success probability $$\pi = g^{-1}(\text{par}^\top X)$$ where \(X\) is the design matrix specified by the formula, and \(g\) is the link function specified by the family argument.
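
The model above can be sketched in base R without the package. This is a minimal illustration under stated assumptions, not the implementation of outcome_binary: the helper name simulate_binary_sketch is hypothetical, and it assumes that the probability is obtained by applying the inverse link of the family object to the linear predictor.

```r
# Hypothetical helper, for illustration only (not part of the package).
simulate_binary_sketch <- function(data, formula, par,
                                   family = binomial(logit)) {
  X <- model.matrix(formula, data = data)  # design matrix from the formula
  eta <- drop(X %*% par)                   # linear predictor par^T X
  pr <- family$linkinv(eta)                # probability via the inverse link
  data$y <- rbinom(nrow(data), size = 1, prob = pr)  # Bernoulli draws
  data
}

set.seed(1)
d <- data.frame(a = rbinom(1e4, 1, 0.5))
sim <- simulate_binary_sketch(d, ~ 1 + a, par = c(1, 0.5))
# a logistic regression fit should roughly recover par
fit <- glm(y ~ a, data = sim, family = binomial(logit))
coef(fit)
```

With a different link, for example binomial(probit), only the family argument changes in this sketch; the design matrix and the Bernoulli draw stay the same.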

Usage

outcome_binary(
  data,
  mean = NULL,
  par = NULL,
  outcome.name = "y",
  remove = c("id", "num"),
  family = binomial(logit),
  ...
)

Arguments

data

(data.table) Covariate data, usually the output of the covariate model of a Trial object.

mean

(formula, function) Either a formula specifying the design matrix built from 'data', or a function that maps the data to the conditional mean value (see examples, where the function returns the probability directly via lava::expit). If NULL, all main effects of the covariates are used, except for columns listed in the remove argument.

par

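(numeric) Regression coefficients (default zero). Can be given as a named list (or a named vector, as in the examples) whose names correspond to the column names of model.matrix; coefficients that are not named keep the default value of zero.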

outcome.name

(character) Name of the outcome variable (default "y").

remove

(character) Names of the variables that are removed from the input data when no formula is specified (default c("id", "num")).

family

(family) Exponential family object that specifies the link function (default binomial(logit)).

...

Additional arguments passed to the mean function (see examples).
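
The way a partially named par lines up with the columns of model.matrix can be sketched as follows. The helper expand_par_sketch is hypothetical and only mirrors the behavior described above and shown in the examples (coefficients that are not given default to zero; unnamed coefficients are assumed to be matched by position); it is not the package's own code.

```r
# Hypothetical helper, for illustration only (not part of the package).
expand_par_sketch <- function(par, X) {
  # every column of the design matrix starts with coefficient 0
  full <- setNames(numeric(ncol(X)), colnames(X))
  if (is.null(names(par))) {
    full[seq_along(par)] <- unlist(par)  # unnamed: matched by position
  } else {
    stopifnot(all(names(par) %in% colnames(X)))
    full[names(par)] <- unlist(par)      # named: matched by column name
  }
  full
}

X <- model.matrix(~ 1 + a, data = data.frame(a = c(0, 1)))
expand_par_sketch(c(a = 0.5), X)            # intercept stays 0
expand_par_sketch(c("(Intercept)" = 1), X)  # coefficient of a stays 0
```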

Value

data.table

Examples

trial <- Trial$new(
  covariates = \(n) data.frame(a = rbinom(n, 1, 0.5)),
  outcome = outcome_binary
)
est <- function(data) glm(y ~ a, data = data, family = binomial(logit))
trial$simulate(1e4, mean = ~ 1 + a, par = c(1, 0.5)) |> est()
#> 
#> Call:  glm(formula = y ~ a, family = binomial(logit), data = data)
#> 
#> Coefficients:
#> (Intercept)            a  
#>       1.002        0.549  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9998 Residual
#> Null Deviance:	    10580 
#> Residual Deviance: 10450 	AIC: 10460

# default behavior is to set all regression coefficients to 0
trial$simulate(1e4, mean = ~ 1 + a) |> est()
#> 
#> Call:  glm(formula = y ~ a, family = binomial(logit), data = data)
#> 
#> Coefficients:
#> (Intercept)            a  
#>     0.02405     -0.00329  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9998 Residual
#> Null Deviance:	    13860 
#> Residual Deviance: 13860 	AIC: 13870

# intercept defaults to 0 and regression coef for a takes the provided value
trial$simulate(1e4, mean = ~ 1 + a, par = c(a = 0.5)) |> est()
#> 
#> Call:  glm(formula = y ~ a, family = binomial(logit), data = data)
#> 
#> Coefficients:
#> (Intercept)            a  
#>    -0.01192      0.52361  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9998 Residual
#> Null Deviance:	    13720 
#> Residual Deviance: 13550 	AIC: 13550

# likewise, only the intercept can be provided by name (not run)
# trial$simulate(1e4, mean = ~ 1 + a, par = c("(Intercept)" = 1))

# define the mean model as a function that works directly on the whole
# covariate data, incl. the id and num columns
trial$simulate(1e4, mean = \(x) with(x, lava::expit(1 + 0.5 * a))) |>
  est()
#> 
#> Call:  glm(formula = y ~ a, family = binomial(logit), data = data)
#> 
#> Coefficients:
#> (Intercept)            a  
#>      0.9921       0.4917  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9998 Residual
#> Null Deviance:	    10740 
#> Residual Deviance: 10640 	AIC: 10640

# the par argument of outcome_binary is not passed on to the mean function;
# extra arguments for the mean function (here reg.par) are passed via ...
trial$simulate(1e4,
  mean = \(x, reg.par) with(x, lava::expit(reg.par[1] + reg.par[2] * a)),
  reg.par = c(1, 0.8)
) |> est()
#> 
#> Call:  glm(formula = y ~ a, family = binomial(logit), data = data)
#> 
#> Coefficients:
#> (Intercept)            a  
#>      1.0232       0.7589  
#> 
#> Degrees of Freedom: 9999 Total (i.e. Null);  9998 Residual
#> Null Deviance:	    10100 
#> Residual Deviance: 9876 	AIC: 9880
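
The function-based examples recover the same coefficients as the formula-based ones because, under the default logit link, both describe the same probabilities. The base R check below is a sketch of that equivalence only; it does not call the package, and plogis (base R's inverse logit) is used here as a stand-in for lava::expit.

```r
a <- c(0, 1)
X <- model.matrix(~ 1 + a)  # columns: (Intercept), a

# formula + par: inverse link applied to the linear predictor
p_formula <- drop(binomial(logit)$linkinv(X %*% c(1, 0.5)))

# mean function: probability computed directly from the covariate
p_function <- plogis(1 + 0.5 * a)

all.equal(unname(p_formula), p_function)
```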