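This vignette is intended to provide a first introduction to the R
package dani, which provides tools to help with the design and analysis
of non-inferiority trials. In particular, the package provides functions
to:
1. Convert a non-inferiority margin from one summary measure to another.
2. Compare the power of different summary measures.
3. Calculate the required sample size.
4. Test for non-inferiority.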
The dani package offers a set of tools to facilitate each of these
steps for three different types of non-inferiority trials:
1. Standard two-arm trials.
2. Trials using non-inferiority frontiers.
3. MAMS-ROCI trials.
This vignette gives an introductory illustration of the features of
dani, and the package documentation should be used to find more
detailed information.
Most of the functions in this package are named as A.B.C, where:
A. Refers to the main scope of the function. It can be either convertmargin, compare, samplesize or test to do tasks 1-4 above.
B. Refers to the type of NI trial. It can either be NI (standard two-arm), NIfrontier or ROCI. Note this is not specified in the convertmargin functions, as they are not specific to just one type of NI trial.
C. Refers to the type of primary outcome and can either be binary, survival or continuous.
For example, function samplesize.NI.survival does sample size calculations for a standard two-arm non-inferiority trial with survival outcome, while function test.ROCI.binary analyses a MAMS-ROCI trial with binary outcome.
The convertmargin functions convert a non-inferiority margin set on one population-level summary measure to another summary measure, such that both imply the same control and experimental event risks under the null. For example, let’s assume we elicited from experts that the non-inferiority margin for a trial with binary outcome should be a risk difference of 5 percentage points but then, for whatever reason, it is established that odds ratios would better represent treatment effects. To obtain the non-inferiority margin on the odds ratio scale that matches the one we obtained as a risk difference, we need to run the following code:
library(dani)
convertmargin.binary(p.control.expected = 0.05,
NI.margin.original = 0.05,
summary.measure.original = "RD",
summary.measure.target = "OR")
# [1] 2.111111
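The arithmetic behind this conversion can be checked by hand. The sketch below uses base R only and is not the package's internal code; the variable names and the helper `odds()` are introduced here purely for illustration. It takes the null experimental risk implied by the risk difference margin and re-expresses it as an odds ratio.

```r
# Hand check of the RD -> OR margin conversion (illustrative names, base R only)
p_control <- 0.05   # expected control event risk
rd_margin <- 0.05   # NI margin as a risk difference

# Experimental-arm risk under the null, implied by the RD margin
p_experim_null <- p_control + rd_margin

# Odds in each arm, then their ratio
odds <- function(p) p / (1 - p)
or_margin <- odds(p_experim_null) / odds(p_control)
or_margin
# [1] 2.111111
```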
So, for an expected control event risk of 5%, the corresponding non-inferiority margin on the odds ratio scale is 2.11. The function for binary data supports four different summary measures: (absolute) Risk Difference (“RD”), Risk Ratio (“RR”), Odds Ratio (“OR”) and ArcSine difference (“AS”). The function for continuous data is quite similar, and only supports mean difference (“difference”) and mean ratio (“ratio”) for the time being:
convertmargin.continuous(mean.control.expected = 2,
NI.margin.original = 1,
summary.measure.original = "difference",
summary.measure.target = "ratio")
# [1] 1.5
The above code indicates that a non-inferiority margin of 1 on the difference of means scale corresponds to a margin of 1.5 on the ratio of means scale, if the control mean is 2. Finally, the function for converting a margin with a time-to-event, or survival, outcome supports three summary measures: Hazard Ratio (“HR”), Difference in Restricted Mean Survival Time (“DRMST”) or Difference in Surviving proportion (“DS”). This function requires slightly different input: if using either DRMST or DS, the related horizon times (tau.RMST or t.DS) must be specified. Additionally, the function accepts either the event rate in the control arm (rate.control.expected, the rate lambda) or the control risk (p.control.expected) at a certain time (t.expected). Of note, for the time being this function assumes that event times follow an exponential distribution with fixed rate lambda.
convertmargin.survival(rate.control.expected = 0.2,
NI.margin.original = 1.2,
summary.measure.original = "HR",
summary.measure.target = "DRMST",
tau.RMST = 3)
# [1] -0.1174096
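Under the exponential assumption stated above, this conversion has a closed form: the restricted mean survival time up to tau is (1 - exp(-lambda * tau)) / lambda. The following base R sketch reproduces the value by hand; the helper `rmst_exp()` and the variable names are hypothetical and are not part of dani.

```r
# Hand check of the HR -> DRMST conversion under exponential event times
rate_control <- 0.2   # control event rate (lambda)
hr_margin    <- 1.2   # NI margin as a hazard ratio
tau          <- 3     # RMST horizon

# Restricted mean survival time of an exponential distribution up to tau
rmst_exp <- function(rate, tau) (1 - exp(-rate * tau)) / rate

# Experimental-arm rate under the null, then the difference in RMST
rate_experim_null <- rate_control * hr_margin
drmst_margin <- rmst_exp(rate_experim_null, tau) - rmst_exp(rate_control, tau)
drmst_margin
# [1] -0.1174096
```

The margin is negative because a higher hazard in the experimental arm shortens the restricted mean survival time.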
In non-inferiority trials, the summary measure that we choose to
quantify the treatment effect has implications for the power of the
study. One way to visualise this is to plot the non-inferiority frontier
(Quartagno et al., 2020) associated with a certain summary measure: this
is a plot showing the non-tolerable event risk (for a binary outcome) in
the experimental arm for each possible value of control event risk when
using a specific margin. The compare functions do exactly this and, on
top of plotting the various frontiers with a specific set of design
parameters, additionally calculate which summary measure is guaranteed
to have the most power.
The summary measures supported are the same as for the convertmargin
functions. For example, for a binary outcome, if the expectation is that
the control event risk is going to be 5% and that the margin in such a
case should be a risk difference of 5 percentage points, the various
frontiers can be obtained as follows:
compare.NIfrontier.binary(p.control.expected = 0.05,
p.experim.target = 0.05,
p.range=c(0.01,0.15),
NI.margin=0.05,
summary.measure="RD")
# ....................................................................................................
# Risk difference Margin = 0.05.
# Risk ratio Margin = 2.
# Odds ratio Margin = 2.111111.
# Arcsine Difference Margin = 0.09623715
# Expected risk in control arm: 0.05
# The Risk Difference summary measure for testing non-inferiority is the most powerful. The Arcsine Difference summary measure comes second followed by the Odds Ratio summary measure. The Risk Ratio is the least powerful summary measure.
#
# The non-inferiority frontier plot shows the distance of various frontiers from the expected point (solid circle).
#
# Green line: Risk Difference frontier.
# Blue line: Odds Ratio frontier.
# Black line: Risk Ratio frontier.
# Orange line: Arcsine difference frontier.
# The black cross represents the frontier point, i.e. the point that defines the point null at the expected control risk.
# The dashed line represents the line of equality, i.e. the line where control and experimental risks are the same.
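To make the idea of a frontier concrete, the sketch below recomputes the four margins printed above and traces each frontier over the same range of control risks. It is a base R illustration under the assumption that the arcsine difference is taken on the asin(sqrt(p)) scale (the resulting margin agrees with the printed one); the object names are hypothetical and the code is not the package's implementation.

```r
# Margins on each scale that match a 5 percentage point RD at control risk 5%
p_expected <- 0.05
p_null     <- p_expected + 0.05   # non-tolerable experimental risk
odds <- function(p) p / (1 - p)

margins <- c(
  RD = p_null - p_expected,
  RR = p_null / p_expected,
  OR = odds(p_null) / odds(p_expected),
  AS = asin(sqrt(p_null)) - asin(sqrt(p_expected))
)

# Each frontier maps a control risk p0 to the non-tolerable experimental risk
frontier <- list(
  RD = function(p0) p0 + margins[["RD"]],
  RR = function(p0) p0 * margins[["RR"]],
  OR = function(p0) { o <- odds(p0) * margins[["OR"]]; o / (1 + o) },
  AS = function(p0) sin(asin(sqrt(p0)) + margins[["AS"]])^2
)

p_grid    <- seq(0.01, 0.15, by = 0.01)
frontiers <- sapply(frontier, function(f) f(p_grid))
rownames(frontiers) <- p_grid
round(frontiers, 3)
```

All four frontiers pass through the same point at the expected control risk (0.05, 0.10) and diverge elsewhere, which is what the plot displays. Passing `p_grid` and `frontiers` to `matplot()` with `type = "l"` gives a rough version of that plot.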
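An important part of designing a clinical trial is calculating its required sample size. Having chosen all design parameters, including which summary measure to use to quantify the treatment effect, we can do the calculations using the samplesize functions. There are functions for all outcome and trial types. For a binary outcome, the inputs required by the functions for standard 2-arm trials include the expected control event risk (p.control.expected), the target experimental event risk (p.experim.target), the non-inferiority margin (NI.margin), the test type (test.type), power, significance level and expected loss to follow-up.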
Example code and output for a sample size calculation using the function for binary outcomes is shown below:
samplesize.NI.binary(p.control.expected = 0.1,
p.experim.target = 0.1,
NI.margin = 0.05,
test.type = "Wald")
# Method: Wald
# Power: 90 %
# One-sided significance level: 2.5 %.
# Expected control event risk = 10 %
# Expected experimental arm event risk (alternative H) = 10 %
# Non-acceptable experimental arm event risk (null H) = 15 %
# Expected loss to follow-up: 0 %
# The sample size required to test non-inferiority within a 5 % risk difference NI margin is:
# 757 individuals in the control group.
# 757 individuals in the experimental treatment group.
# [1] 757 757
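As a sanity check, the figure of 757 per arm can be recovered from the usual normal-approximation formula for a risk difference. The sketch below assumes that textbook formula, with the power and one-sided significance level taken from the printed output; it is not taken from the package source, and the variable names are illustrative.

```r
# Textbook Wald sample size per arm for a risk-difference NI margin
p_control <- 0.10    # expected control event risk
p_experim <- 0.10    # expected experimental event risk (alternative)
ni_margin <- 0.05    # NI margin as a risk difference
sig_level <- 0.025   # one-sided significance level
power     <- 0.90

z_alpha <- qnorm(1 - sig_level)
z_beta  <- qnorm(power)

# Sum of the two binomial variances (1:1 allocation)
var_sum <- p_control * (1 - p_control) + p_experim * (1 - p_experim)

# Distance between the alternative and the null on the RD scale
delta <- ni_margin - (p_experim - p_control)

n_per_arm <- ceiling((z_alpha + z_beta)^2 * var_sum / delta^2)
n_per_arm
# [1] 757
```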
The functions for sample size calculations for NI frontier and ROCI designs are similar but require further input. For NI frontier, the user is requested to pass the frontier as a function. For ROCI, there are several additional parameters, such as p.expected.curve, reference, se.method, treatment.levels and tr.model.
Example code and output are shown below:
samplesize.ROCI.binary(p.expected.curve = rep(0.05,7),
NI.margin = rep(0.05,6),
reference = 20,
se.method = "delta",
treatment.levels = c(8,10,12,14,16,18,20),
summary.measure = "RD",
tr.model = "FP2.classic")
# The total sample sizes needed (across all arms) for the specified
# expected curves and NI margins, accounting for 0 % loss
# to follow-up, are:
# Optimal power: 1343
# Range power (conservative estimate): 1343
# Acceptable power (conservative estimate): 38 .
# $ss.total
# [1] 1343 932 597 336 150 38
#
# $ss.total.optimal
# [1] 1343
#
# $ss.total.range
# [1] 1343
#
# $ss.total.acceptable
# [1] 38
#
# $res
# NULL
Finally, the package provides functions to test for non-inferiority.
While many of the possible tests are available in other packages, the
goal of the test functions in dani is to provide a wrapper that allows
one to choose their preferred test type and summary measure. They also
focus on non-inferiority questions, providing the related p-values
rather than ones for superiority-type questions.
Outcomes are passed in different ways depending on their type. Binary
outcomes are passed as number of events (e.control and e.experim) and
patients (n.control and n.experim) in both arms. Continuous outcomes are
passed as vectors of observations in control (y.control) and
experimental (y.experim) arms. Survival outcomes are passed through
three vectors, one for the observed time (time), one for the event
indicator (event) and one for the treatment indicator (treat). Most
other inputs are passed similarly to the samplesize functions. These
include summary.measure (the same ones are supported), test.type and
significance level. There are currently over 30 tests supported for
binary data, 11 for survival and 9 for continuous. Help files provide
further details.
Not all test types estimate a p-value directly. When this is not the
case, the p-value can be estimated recursively (option
recursive.p.estim = TRUE) by testing at different significance levels.
If recursive.p.estim = FALSE, the p-value for these methods is instead
estimated from a normal approximation. The following is example code
and output for testing for non-inferiority with binary data:
test.NI.binary(n.control = 100,
n.experim = 100,
e.control = 10,
e.experim = 10,
NI.margin = 0.1,
test.type="Newcombe10")
# Testing for non-inferiority.
# Summary measure: Risk difference.
# Non-inferiority margin = 0.1.
# Method: Newcombe10.
# Estimate = 0
# Confidence interval (Two-sided 95%): (-0.08680254,0.08680254)
# p-value = 0.01197418.
# The confidence interval does not cross the null ( RD = 0.1 ), and hence we have evidence of non-inferiority.
# Note: with the test = Newcombe10 for summary measure = RD , p-value and standard error are only approximations based on a Z test with given logarithm of estimate and CI.
# $estimate
# [1] 0
#
# $se
# [1] 0.04428783
#
# $p
# [1] 0.01197418
#
# $CI
# [1] -0.08680254 0.08680254
#
# $test.type
# [1] "Newcombe10"
#
# $summary.measure
# [1] "RD"
#
# $is.p.est
# [1] TRUE
#
# $sig.level
# [1] 0.025
#
# $non.inferiority
# [1] TRUE
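The interval and the approximate p-value in this output can be reproduced by hand. The sketch below assumes that "Newcombe10" denotes Newcombe's hybrid score interval, built from Wilson intervals for each arm, and that the approximate standard error is backed out of the confidence interval width, as the note in the output suggests. The helper `wilson()` and all variable names are hypothetical, not package code.

```r
# Wilson score interval for a single proportion
wilson <- function(e, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- e / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(lower = centre - half, upper = centre + half)
}

n_control <- 100; e_control <- 10
n_experim <- 100; e_experim <- 10
ni_margin <- 0.1

p_c <- e_control / n_control
p_e <- e_experim / n_experim
w_c <- wilson(e_control, n_control)
w_e <- wilson(e_experim, n_experim)

# Newcombe hybrid score interval for the risk difference (experimental - control)
estimate <- p_e - p_c
ci <- c(
  estimate - sqrt((p_e - w_e[["lower"]])^2 + (w_c[["upper"]] - p_c)^2),
  estimate + sqrt((w_e[["upper"]] - p_e)^2 + (p_c - w_c[["lower"]])^2)
)

# Approximate SE from the CI width, then a one-sided Z test against the margin
se      <- (ci[2] - ci[1]) / (2 * qnorm(0.975))
p_value <- pnorm((estimate - ni_margin) / se)

round(c(lower = ci[1], upper = ci[2], se = se, p = p_value), 5)
```

Non-inferiority is concluded because the upper confidence limit (about 0.087) lies below the margin of 0.1.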
Once again, the functions for MAMS-ROCI trials require slightly different inputs and provide slightly different output. Aside from the differences we already highlighted for the samplesize functions, the main one is the way the outcomes are provided. There are two possible ways: either by providing a vector of outcomes (outcomes, numeric) and one of treatment indicators (treatment, numeric), or through the usual data + formula interface, where the formula should indicate which variable is the treatment variable, as follows:
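# Treatment durations of the seven arms and total sample size
duration.arms <- c(8, 10, 12, 14, 16, 18, 20)
sam.sizes <- 700
NI.margin.RD <- 0.1
# Simulate 100 patients per arm; the event risk rises by 1 percentage
# point for each unit of duration below 20
durations <- rep(duration.arms, each = 100)
y <- rbinom(sam.sizes, 1, 0.05 + (20 - durations) * 0.01)
data.ex <- data.frame(y, durations)
# treat() marks the treatment variable in the formula
myformula <- as.formula(y ~ treat(durations))
res1 <- test.ROCI.binary(formula = myformula, data = data.ex,
                         se.method = "delta", treatment.levels = 8:20,
                         summary.measure = "RD", NI.margin = NI.margin.RD)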
Results can be summarised and plotted with the standard summary and plot functions as follows:
summary(res1)
plot(res1)
# Family = binomial
# Model fit:
#
# Call:
# glm(formula = outcomes ~ I((treatment/10)^1) + I((treatment/10)^1 *
# log((treatment/10))), family = "binomial", data = data.mfp)
#
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -4.514 3.197 -1.412 0.158
# I((treatment/10)^1) 2.793 3.216 0.869 0.385
# I((treatment/10)^1 * log((treatment/10))) -3.033 2.523 -1.202 0.229
#
# (Dispersion parameter for binomial family taken to be 1)
#
# Null deviance: 476.70 on 699 degrees of freedom
# Residual deviance: 463.74 on 697 degrees of freedom
# AIC: 469.74
#
# Number of Fisher Scoring iterations: 5
#
# Difference from control treatment level ( 20 ):
# 19 - 20 : 0.01 ( 0.005 , 0.016 )
# 18 - 20 : 0.021 ( 0.009 , 0.034 )
# 17 - 20 : 0.034 ( 0.012 , 0.056 )
# 16 - 20 : 0.047 ( 0.016 , 0.078 )
# 15 - 20 : 0.061 ( 0.02 , 0.102 )
# 14 - 20 : 0.074 ( 0.025 , 0.123 )
# 13 - 20 : 0.086 ( 0.031 , 0.142 )
# 12 - 20 : 0.097 ( 0.039 , 0.155 )
# 11 - 20 : 0.105 ( 0.047 , 0.163 )
# 10 - 20 : 0.11 ( 0.054 , 0.166 )
# 9 - 20 : 0.111 ( 0.055 , 0.167 )
# 8 - 20 : 0.108 ( 0.042 , 0.173 )
# Recommended treatment level with selected NI margin: 16 .
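The model fit shown in this output is a logistic regression on fractional polynomial terms of the treatment level. The base R sketch below fits a model of the same form to freshly simulated data and derives risk differences against the reference level of 20. It is only an approximation of what the package does: the powers are fixed to those reported above rather than selected, confidence intervals are omitted, and the seed and object names are arbitrary choices made here, so the numbers will differ from the output above.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

# Simulate 100 patients per duration arm, as in the example above
durations <- rep(c(8, 10, 12, 14, 16, 18, 20), each = 100)
y <- rbinom(length(durations), 1, 0.05 + (20 - durations) * 0.01)
sim <- data.frame(y = y, x = durations / 10)

# Logistic model with terms x and x * log(x), matching the printed model form
fit <- glm(y ~ x + I(x * log(x)), family = binomial, data = sim)

# Predicted event risk at every treatment level from 8 to 20
levels_new <- data.frame(x = (8:20) / 10)
risk <- predict(fit, newdata = levels_new, type = "response")
names(risk) <- 8:20

# Risk difference of each level against the reference level 20
rd_vs_20 <- risk - risk[["20"]]
round(rd_vs_20, 3)
```

Comparing the upper confidence limit of each difference with the non-inferiority margin, which this sketch does not compute, is what leads to a recommended treatment level in the package output.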
Ghorani, E., Quartagno, M., Blackhall, F., et al. REFINE-Lung implements a novel multi-arm randomised trial design to address possible immunotherapy overtreatment, The Lancet Oncology, 24(5), 2023, https://doi.org/10.1016/S1470-2045(23)00095-5.
Quartagno, M., Chan, M., Turkova, A. et al. The Smooth Away From Expected (SAFE) non-inferiority frontier: theory and implementation with an application to the D3 trial. Trials 24, 556 (2023). https://doi.org/10.1186/s13063-023-07586-5
Quartagno, M., Walker, A.S., Babiker, A.G. et al. Handling an uncertain control group event risk in non-inferiority trials: non-inferiority frontiers and the power-stabilising transformation. Trials 21, 145 (2020). https://doi.org/10.1186/s13063-020-4070-4