Fit GLM and find any estimand (marginal effect) using plug-in estimation with variance estimation using influence functions
Source: R/rctglm.R
The procedure uses plug-in estimation and influence functions to perform robust inference on any specified estimand in the setting of a randomised clinical trial, even when the effect of covariates is heterogeneous across randomisation groups. See "Powering RCTs for marginal effects with GLMs using prognostic score adjustment" by Højbjerre-Frandsen et al. (2025) for more details on the methodology.
Usage
rctglm(
  formula,
  exposure_indicator,
  exposure_prob,
  data,
  family = gaussian,
  estimand_fun = "ate",
  estimand_fun_deriv0 = NULL,
  estimand_fun_deriv1 = NULL,
  cv_variance = FALSE,
  cv_variance_folds = 10,
  verbose = options::opt("verbose"),
  ...
)
Arguments
- formula
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’ in the glm documentation.
- exposure_indicator
(name of) the binary variable in data that identifies randomisation groups. The variable is required to be binary to make the "orientation" of the estimand_fun clear.
- exposure_prob
a numeric with the probability of being in "group 1" (rather than group 0) in groups defined by exposure_indicator.
- data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.
- family
a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)
- estimand_fun
a function with arguments psi1 and psi0 specifying the estimand. Alternatively, specify "ate" or "rate_ratio" as a character to use one of the default estimand functions. See more details in the "Estimands" section of rctglm.
- estimand_fun_deriv0
a function specifying the derivative of estimand_fun with respect to psi0. By default, the algorithm uses symbolic differentiation to automatically find the derivative from estimand_fun.
- estimand_fun_deriv1
a function specifying the derivative of estimand_fun with respect to psi1. By default, the algorithm uses symbolic differentiation to automatically find the derivative from estimand_fun.
- cv_variance
a logical determining whether to estimate the variance using cross-validation (see details of rctglm).
- cv_variance_folds
a numeric with the number of folds to use for cross-validation if cv_variance is TRUE.
- verbose
numeric verbosity level. Higher values mean more information is printed to the console. A value of 0 means nothing is printed to the console during execution. (Defaults to 2, overwritable using option 'postcard.verbose' or environment variable 'R_POSTCARD_VERBOSE'.)
- ...
Additional arguments passed to stats::glm()
Value
rctglm returns an object of class inheriting from "rctglm".
An object of class rctglm is a list containing the following components:
- estimand: A data.frame with plug-in estimate of estimand, standard error (SE) estimate and variance estimate of estimand
- estimand_funs: A list with
  - f: The estimand_fun used to obtain an estimate of the estimand from counterfactual means
  - d0: The derivative with respect to psi0
  - d1: The derivative with respect to psi1
- means_counterfactual: A data.frame with counterfactual means psi0 and psi1
- fitted.values_counterfactual: A data.frame with counterfactual mean values, obtained by transforming the linear predictors for each group by the inverse of the link function.
- glm: A glm object returned from running stats::glm within the procedure
- call: The matched call
Details
The procedure assumes the setup of a randomised clinical trial with observations grouped by a binary exposure_indicator variable, allocated randomly with probability exposure_prob. A GLM is fit and then used to predict the response of all observations in the event that the exposure_indicator is 0 and 1, respectively. Taking means of these predictions produces the counterfactual means psi0 and psi1, and an estimand r(psi0, psi1) is calculated using any specified estimand_fun.
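The steps above can be sketched in base R. This is a minimal illustration on simulated data, not the package's internal code; the data, the variable names and the choice of the difference in means as the estimand are assumptions made for the example.

```r
# Simulated trial data (illustrative names and coefficients only)
set.seed(1)
n <- 200
dat <- data.frame(X1 = rnorm(n), A = rbinom(n, 1, 0.5))
dat$Y <- 1 + 1.5 * dat$X1 + 2 * dat$A + rnorm(n)

# Fit a GLM on the observed data
fit <- glm(Y ~ X1 + A, family = gaussian, data = dat)

# Predict the response of every observation as if A were 0, then as if A were 1
mu0 <- predict(fit, newdata = transform(dat, A = 0), type = "response")
mu1 <- predict(fit, newdata = transform(dat, A = 1), type = "response")

# Counterfactual means and the plug-in estimate of the estimand
psi0 <- mean(mu0)
psi1 <- mean(mu1)
estimand_fun <- function(psi1, psi0) psi1 - psi0
estimate <- estimand_fun(psi1, psi0)
```

For this particular model (identity link, no interaction with A) the plug-in estimate coincides with the fitted coefficient of A; with a non-linear link or a different estimand the two generally differ.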
The variance of the estimand is found by taking the variance of the influence function of the estimand.
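As a rough sketch of what this can look like, the code below uses a commonly used form of the influence function for a counterfactual mean in a randomised trial and combines the two through the derivatives of the estimand function. The exact expression used by the package is described in the referenced paper; the form shown here, and all names in the code, are assumptions made for illustration.

```r
# Simulated trial data (illustrative names and coefficients only)
set.seed(1)
n <- 200
exposure_prob <- 0.5
dat <- data.frame(X1 = rnorm(n), A = rbinom(n, 1, exposure_prob))
dat$Y <- 1 + 1.5 * dat$X1 + 2 * dat$A + rnorm(n)

# Counterfactual predictions and means from a fitted GLM
fit <- glm(Y ~ X1 + A, family = gaussian, data = dat)
mu0 <- predict(fit, newdata = transform(dat, A = 0), type = "response")
mu1 <- predict(fit, newdata = transform(dat, A = 1), type = "response")
psi0 <- mean(mu0)
psi1 <- mean(mu1)

# Assumed influence function values for each counterfactual mean
if0 <- (1 - dat$A) / (1 - exposure_prob) * (dat$Y - mu0) + mu0 - psi0
if1 <- dat$A / exposure_prob * (dat$Y - mu1) + mu1 - psi1

# Chain rule through the estimand r(psi0, psi1) = psi1 - psi0,
# whose partial derivatives are -1 (wrt. psi0) and 1 (wrt. psi1)
d0 <- -1
d1 <- 1
if_estimand <- d0 * if0 + d1 * if1

# Variance and standard error of the estimate
var_estimand <- var(if_estimand) / n
se_estimand <- sqrt(var_estimand)
```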
If cv_variance is TRUE, then the counterfactual predictions for each observation (which are used to calculate the value of the influence function) are obtained as out-of-sample (OOS) predictions using cross-validation with the number of folds specified by cv_variance_folds. The cross-validation splits are performed using stratified sampling with exposure_indicator as the strata argument in rsample::vfold_cv.
Read more in vignette("model-fit").
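The idea of out-of-sample counterfactual predictions can be sketched as follows. The text states that the splits come from rsample::vfold_cv; the sketch uses a simple base R stand-in for stratified fold assignment instead, so the fold construction and all names are assumptions for illustration only.

```r
# Simulated trial data (illustrative names and coefficients only)
set.seed(1)
n <- 200
dat <- data.frame(X1 = rnorm(n), A = rbinom(n, 1, 0.5))
dat$Y <- 1 + 1.5 * dat$X1 + 2 * dat$A + rnorm(n)

n_folds <- 5

# Assign fold ids separately within each exposure group, so every fold
# holds a similar share of both groups (stand-in for stratified splitting)
fold_id <- integer(n)
for (a in unique(dat$A)) {
  idx <- which(dat$A == a)
  fold_id[idx] <- sample(rep_len(seq_len(n_folds), length(idx)))
}

# For each fold: fit on the remaining folds, predict the held-out rows
mu0_oos <- numeric(n)
mu1_oos <- numeric(n)
for (k in seq_len(n_folds)) {
  held_out <- fold_id == k
  fit_k <- glm(Y ~ X1 + A, family = gaussian, data = dat[!held_out, ])
  mu0_oos[held_out] <- predict(fit_k, newdata = transform(dat[held_out, ], A = 0),
                               type = "response")
  mu1_oos[held_out] <- predict(fit_k, newdata = transform(dat[held_out, ], A = 1),
                               type = "response")
}
```

The vectors mu0_oos and mu1_oos would then take the place of the in-sample predictions when evaluating the influence function.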
Estimands
As noted in Details, psi0 and psi1 are the counterfactual means found by prediction using a fitted GLM in the binary groups defined by exposure_indicator.
Default estimand functions can be specified via "ate" (which uses the function function(psi1, psi0) psi1-psi0) and "rate_ratio" (which uses the function function(psi1, psi0) psi1/psi0). See more information on specifying the estimand_fun in vignette("model-fit").
By default, the Deriv package is used to perform symbolic differentiation to find the derivatives of the estimand_fun.
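When the derivatives are supplied by hand rather than found symbolically, it can be worth checking them numerically first. The sketch below does this for an odds ratio estimand using central finite differences. The helper names are hypothetical, and it is assumed here that derivative functions take the same psi1 and psi0 arguments as the estimand function.

```r
# Odds ratio as a custom estimand function of the counterfactual means
odds_ratio <- function(psi1, psi0) (psi1 * (1 - psi0)) / (psi0 * (1 - psi1))

# Hand-derived partial derivatives (hypothetical helper names)
odds_ratio_deriv0 <- function(psi1, psi0) -psi1 / ((1 - psi1) * psi0^2)
odds_ratio_deriv1 <- function(psi1, psi0) (1 - psi0) / (psi0 * (1 - psi1)^2)

# Central finite differences at an arbitrary point as a sanity check
psi1 <- 0.6
psi0 <- 0.4
h <- 1e-6
num_d0 <- (odds_ratio(psi1, psi0 + h) - odds_ratio(psi1, psi0 - h)) / (2 * h)
num_d1 <- (odds_ratio(psi1 + h, psi0) - odds_ratio(psi1 - h, psi0)) / (2 * h)

ok_d0 <- isTRUE(all.equal(odds_ratio_deriv0(psi1, psi0), num_d0, tolerance = 1e-6))
ok_d1 <- isTRUE(all.equal(odds_ratio_deriv1(psi1, psi0), num_d1, tolerance = 1e-6))
```

Functions checked this way could then be passed as estimand_fun_deriv0 and estimand_fun_deriv1 alongside estimand_fun = odds_ratio.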
See also
See how to extract information using methods in rctglm_methods.
Use rctglm_with_prognosticscore() to include prognostic covariate adjustment.
See vignettes
Examples
# Generate some data to showcase example
n <- 100
exp_prob <- .5
dat_gaus <- glm_data(
  Y ~ 1+1.5*X1+2*A,
  X1 = rnorm(n),
  A = rbinom(n, 1, exp_prob),
  family = gaussian()
)
# Fit the model
ate <- rctglm(formula = Y ~ .,
              exposure_indicator = A,
              exposure_prob = exp_prob,
              data = dat_gaus,
              family = gaussian)
#>
#> ── Symbolic differentiation of estimand function ──
#>
#> ℹ Symbolically deriving partial derivative of the function 'psi1 - psi0' with respect to 'psi0' as: '-1'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv0`
#> ℹ Symbolically deriving partial derivative of the function 'psi1 - psi0' with respect to 'psi1' as: '1'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv1`
# Pull information on estimand
estimand(ate)
#> Estimate Std. Error
#> 1 1.990445 0.197611
## Another example with different family and specification of estimand_fun
dat_binom <- glm_data(
  Y ~ 1+1.5*X1+2*A,
  X1 = rnorm(n),
  A = rbinom(n, 1, exp_prob),
  family = binomial()
)
rr <- rctglm(formula = Y ~ .,
             exposure_indicator = A,
             exposure_prob = exp_prob,
             data = dat_binom,
             family = binomial(),
             estimand_fun = "rate_ratio")
#>
#> ── Symbolic differentiation of estimand function ──
#>
#> ℹ Symbolically deriving partial derivative of the function 'psi1/psi0' with respect to 'psi0' as: '-(psi1/psi0^2)'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv0`
#> ℹ Symbolically deriving partial derivative of the function 'psi1/psi0' with respect to 'psi1' as: '1/psi0'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv1`
odds_ratio <- function(psi1, psi0) (psi1*(1-psi0))/(psi0*(1-psi1))
or <- rctglm(formula = Y ~ .,
             exposure_indicator = A,
             exposure_prob = exp_prob,
             data = dat_binom,
             family = binomial,
             estimand_fun = odds_ratio)
#>
#> ── Symbolic differentiation of estimand function ──
#>
#> ℹ Symbolically deriving partial derivative of the function '(psi1 * (1 - psi0))/(psi0 * (1 - psi1))' with respect to 'psi0' as: '{ .e1 <- 1 - psi1 .e2 <- psi0 * .e1 -(psi1 * ((1 - psi0) * .e1/.e2^2 + 1/.e2)) }'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv0`
#> ℹ Symbolically deriving partial derivative of the function '(psi1 * (1 - psi0))/(psi0 * (1 - psi1))' with respect to 'psi1' as: '{ .e2 <- psi0 * (1 - psi1) (1 - psi0) * (1/.e2 + psi0 * psi1/.e2^2) }'.
#> • Alternatively, specify the derivative through the argument
#> `estimand_fun_deriv1`