
The procedure uses plug-in estimation and influence functions to perform robust inference of any specified estimand in the setting of a randomised clinical trial, even when the effect of covariates is heterogeneous across randomisation groups. See "Powering RCTs for marginal effects with GLMs using prognostic score adjustment" by Højbjerre-Frandsen et al. (2025) for more details on the methodology.

Usage

rctglm(
  formula,
  exposure_indicator,
  exposure_prob,
  data,
  family = gaussian,
  estimand_fun = "ate",
  estimand_fun_deriv0 = NULL,
  estimand_fun_deriv1 = NULL,
  cv_variance = FALSE,
  cv_variance_folds = 10,
  verbose = options::opt("verbose"),
  ...
)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’ in the glm documentation.

exposure_indicator

(name of) the binary variable in data that identifies randomisation groups. The variable is required to be binary to make the "orientation" of the estimand_fun clear.

exposure_prob

a numeric with the probability of being in "group 1" (rather than group 0) in groups defined by exposure_indicator.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.

family

a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)

estimand_fun

a function with arguments psi1 and psi0 specifying the estimand. Alternatively, specify "ate" or "rate_ratio" as a character to use one of the default estimand functions. See more details in the "Estimands" section of rctglm.

estimand_fun_deriv0

a function specifying the derivative of estimand_fun with respect to psi0. By default, the algorithm uses symbolic differentiation to find the derivative automatically from estimand_fun.

estimand_fun_deriv1

a function specifying the derivative of estimand_fun with respect to psi1. By default, the algorithm uses symbolic differentiation to find the derivative automatically from estimand_fun.

cv_variance

a logical determining whether to estimate the variance using cross-validation (see details of rctglm).

cv_variance_folds

a numeric with the number of folds to use for cross validation if cv_variance is TRUE.

verbose

numeric verbosity level. Higher values mean more information is printed to the console. A value of 0 means nothing is printed to the console during execution. (Defaults to 2, overwritable using option 'postcard.verbose' or environment variable 'R_POSTCARD_VERBOSE'.)

...

Additional arguments passed to stats::glm()

Value

rctglm returns an object of class inheriting from "rctglm".

An object of class rctglm is a list containing the following components:

  • estimand: A data.frame with plug-in estimate of estimand, standard error (SE) estimate and variance estimate of estimand

  • estimand_funs: A list with

    • f: The estimand_fun used to obtain an estimate of the estimand from counterfactual means

    • d0: The derivative with respect to psi0

    • d1: The derivative with respect to psi1

  • means_counterfactual: A data.frame with counterfactual means psi0 and psi1

  • fitted.values_counterfactual: A data.frame with counterfactual mean values, obtained by transforming the linear predictors for each group by the inverse of the link function.

  • glm: A glm object returned from running stats::glm within the procedure

  • call: The matched call

Details

The procedure assumes the setup of a randomised clinical trial with observations grouped by a binary exposure_indicator variable, allocated randomly with probability exposure_prob. A GLM is fit and then used to predict the response of every observation twice: once as if exposure_indicator were 0 and once as if it were 1. Taking the means of these predictions produces the counterfactual means psi0 and psi1, and an estimand r(psi0, psi1) is calculated using any specified estimand_fun.
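The plug-in steps described above can be sketched in base R. This is an illustration only, not the package's internal code: the simulated data, the seed and the object names (dat, mu0, mu1, estimate) are assumptions made for the example.

```r
# Illustrative sketch in base R; not the package's internal code.
set.seed(1)
n <- 200
exposure_prob <- 0.5  # assumed allocation probability for this toy data

# Hypothetical trial data: one covariate X1 and a binary exposure A
dat <- data.frame(
  X1 = rnorm(n),
  A = rbinom(n, 1, exposure_prob)
)
dat$Y <- 1 + 1.5 * dat$X1 + 2 * dat$A + rnorm(n)

# Step 1: fit a GLM to the observed data
fit <- glm(Y ~ X1 + A, data = dat, family = gaussian())

# Step 2: predict every observation's response under A = 0 and under A = 1
mu0 <- predict(fit, newdata = transform(dat, A = 0), type = "response")
mu1 <- predict(fit, newdata = transform(dat, A = 1), type = "response")

# Step 3: average the predictions to get the counterfactual means
psi0 <- mean(mu0)
psi1 <- mean(mu1)

# Step 4: plug the counterfactual means into the estimand function
estimand_fun <- function(psi1, psi0) psi1 - psi0
estimate <- estimand_fun(psi1 = psi1, psi0 = psi0)
```

With a Gaussian family, identity link and no exposure-by-covariate interaction, this plug-in estimate coincides with the fitted coefficient of A. For non-linear links or other estimand functions the two generally differ.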

The variance of the estimand is found by taking the variance of the influence function of the estimand. If cv_variance is TRUE, then the counterfactual predictions for each observation (which are used to calculate the value of the influence function) are obtained as out-of-sample (OOS) predictions using cross validation with the number of folds specified by cv_variance_folds. The cross validation splits are performed using stratified sampling with exposure_indicator as the strata argument in rsample::vfold_cv.
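As a rough illustration of the influence-function variance with in-sample predictions (the cv_variance = FALSE case), the sketch below uses a commonly stated form of the influence function for a counterfactual mean and combines the two via the derivatives of the estimand function. The exact form used inside the package, and the division by n to obtain the variance of the estimator, are assumptions of this sketch. The data and object names are hypothetical.

```r
# Illustrative sketch in base R; the influence function form is an assumption.
set.seed(2)
n <- 200
exposure_prob <- 0.5

# Hypothetical binary-outcome trial data
dat <- data.frame(X1 = rnorm(n), A = rbinom(n, 1, exposure_prob))
dat$Y <- rbinom(n, 1, plogis(-0.5 + dat$X1 + dat$A))

# Fit the GLM and compute counterfactual predictions and means
fit <- glm(Y ~ X1 + A, data = dat, family = binomial())
mu0 <- predict(fit, newdata = transform(dat, A = 0), type = "response")
mu1 <- predict(fit, newdata = transform(dat, A = 1), type = "response")
psi0 <- mean(mu0)
psi1 <- mean(mu1)

# Assumed influence function values for each counterfactual mean:
# inverse-probability-weighted residual plus centred prediction
if0 <- (1 - dat$A) / (1 - exposure_prob) * (dat$Y - mu0) + mu0 - psi0
if1 <- dat$A / exposure_prob * (dat$Y - mu1) + mu1 - psi1

# Rate ratio estimand and its partial derivatives
d0 <- function(psi1, psi0) -psi1 / psi0^2
d1 <- function(psi1, psi0) 1 / psi0

# Influence function of the estimand via the chain rule
if_estimand <- d0(psi1, psi0) * if0 + d1(psi1, psi0) * if1

# Variance and standard error of the estimator (scaling by n is assumed)
var_hat <- var(if_estimand) / n
se_hat <- sqrt(var_hat)
```

Replacing mu0 and mu1 with out-of-sample predictions from models fitted on the other folds would give the cross-validated variant described above.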

Read more in vignette("model-fit").

Estimands

As described in Details, psi0 and psi1 are the counterfactual means found by prediction using a fitted GLM in the binary groups defined by exposure_indicator.

Default estimand functions can be specified via "ate" (which uses the function function(psi1, psi0) psi1-psi0) and "rate_ratio" (which uses the function function(psi1, psi0) psi1/psi0). See more information on specifying the estimand_fun in vignette("model-fit").

As a default, the Deriv package is used to perform symbolic differentiation to find the derivatives of the estimand_fun.
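To illustrate what the derivatives of an estimand function look like, the sketch below writes out the partial derivatives of an odds ratio by hand and checks them against symbolic derivatives. Note that it uses base R's stats::D as a stand-in for the Deriv package, so it does not reproduce the package's own differentiation step. The function names odds_ratio_d0 and odds_ratio_d1 are hypothetical.

```r
# Illustrative sketch; stats::D is used here instead of the Deriv package.
odds_ratio <- function(psi1, psi0) (psi1 * (1 - psi0)) / (psi0 * (1 - psi1))

# Hand-derived partial derivatives of the odds ratio
odds_ratio_d0 <- function(psi1, psi0) -psi1 / (psi0^2 * (1 - psi1))
odds_ratio_d1 <- function(psi1, psi0) (1 - psi0) / (psi0 * (1 - psi1)^2)

# Symbolic partial derivatives of the function body using base R
or_expr <- body(odds_ratio)
d0_expr <- D(or_expr, "psi0")
d1_expr <- D(or_expr, "psi1")

# Evaluate both versions at an arbitrary point to compare them
vals <- list(psi1 = 0.6, psi0 = 0.4)
d0_symbolic <- eval(d0_expr, vals)
d1_symbolic <- eval(d1_expr, vals)
d0_manual <- odds_ratio_d0(psi1 = 0.6, psi0 = 0.4)
d1_manual <- odds_ratio_d1(psi1 = 0.6, psi0 = 0.4)
```

Hand-written derivatives like these could be supplied through estimand_fun_deriv0 and estimand_fun_deriv1 instead of relying on symbolic differentiation. That they should take the arguments psi1 and psi0, mirroring estimand_fun, is an assumption here.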

See also

See how to extract information using methods in rctglm_methods.

Use rctglm_with_prognosticscore() to include prognostic covariate adjustment.

See vignettes

Examples

# Generate some data to showcase example
n <- 100
exp_prob <- .5

dat_gaus <- glm_data(
  Y ~ 1+1.5*X1+2*A,
  X1 = rnorm(n),
  A = rbinom(n, 1, exp_prob),
  family = gaussian()
)

# Fit the model
ate <- rctglm(formula = Y ~ .,
              exposure_indicator = A,
              exposure_prob = exp_prob,
              data = dat_gaus,
              family = gaussian)
#> 
#> ── Symbolic differentiation of estimand function ──
#> 
#>  Symbolically deriving partial derivative of the function 'psi1 - psi0' with respect to 'psi0' as: '-1'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv0`
#>  Symbolically deriving partial derivative of the function 'psi1 - psi0' with respect to 'psi1' as: '1'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv1`

# Pull information on estimand
estimand(ate)
#>   Estimate Std. Error
#> 1 1.990445   0.197611

## Another example with different family and specification of estimand_fun
dat_binom <- glm_data(
  Y ~ 1+1.5*X1+2*A,
  X1 = rnorm(n),
  A = rbinom(n, 1, exp_prob),
  family = binomial()
)

rr <- rctglm(formula = Y ~ .,
             exposure_indicator = A,
             exposure_prob = exp_prob,
             data = dat_binom,
             family = binomial(),
             estimand_fun = "rate_ratio")
#> 
#> ── Symbolic differentiation of estimand function ──
#> 
#>  Symbolically deriving partial derivative of the function 'psi1/psi0' with respect to 'psi0' as: '-(psi1/psi0^2)'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv0`
#>  Symbolically deriving partial derivative of the function 'psi1/psi0' with respect to 'psi1' as: '1/psi0'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv1`

odds_ratio <- function(psi1, psi0) (psi1*(1-psi0))/(psi0*(1-psi1))
or <- rctglm(formula = Y ~ .,
             exposure_indicator = A,
             exposure_prob = exp_prob,
             data = dat_binom,
             family = binomial,
             estimand_fun = odds_ratio)
#> 
#> ── Symbolic differentiation of estimand function ──
#> 
#>  Symbolically deriving partial derivative of the function '(psi1 * (1 - psi0))/(psi0 * (1 - psi1))' with respect to 'psi0' as: '{     .e1 <- 1 - psi1     .e2 <- psi0 * .e1     -(psi1 * ((1 - psi0) * .e1/.e2^2 + 1/.e2)) }'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv0`
#>  Symbolically deriving partial derivative of the function '(psi1 * (1 - psi0))/(psi0 * (1 - psi1))' with respect to 'psi1' as: '{     .e2 <- psi0 * (1 - psi1)     (1 - psi0) * (1/.e2 + psi0 * psi1/.e2^2) }'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv1`