Adaptive Design Methods in Clinical Trials : Shein-Chung Chow
Estimates will be unbiased, meaning that if the study were to be repeated many times according to the same protocol, the average estimate would be equal to the true treatment effect. These are by no means the only relevant criteria for assessing the performance of a trial design; other metrics include, e.g., the accuracy of estimation. ADs usually perform considerably better than non-ADs in terms of these other criteria, which are also of more direct interest to patients.
The analysis of an AD trial often involves combining data from different stages, which can be done in several well-established ways. It is still possible to compute the estimated treatment effect, its CI and a p value. If these quantities are, however, naively computed using the same methods as in a fixed-design trial, then they often lack the desirable properties mentioned above, depending on the nature of the adaptations employed [ 72 ].
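For illustration, one widely used way of combining stage-wise evidence is the weighted inverse-normal combination test. The sketch below is illustrative only; the p values and the equal pre-specified weights are hypothetical, and only the stdlib is used.

```python
from math import sqrt
from statistics import NormalDist

def inverse_normal_combination(p1, p2, w1=0.5, w2=0.5):
    """Combine stage-wise one-sided p values with the weighted inverse-normal
    rule: z = (sqrt(w1)*z1 + sqrt(w2)*z2) / sqrt(w1 + w2), where z_i is the
    standard-normal quantile of 1 - p_i. The weights must be fixed before
    stage 2 is conducted."""
    nd = NormalDist()
    z = (sqrt(w1) * nd.inv_cdf(1 - p1) + sqrt(w2) * nd.inv_cdf(1 - p2)) / sqrt(w1 + w2)
    return 1 - nd.cdf(z)  # combined one-sided p value

# Hypothetical stage-wise p values from a two-stage trial
print(inverse_normal_combination(0.04, 0.03))
```

Because the combination rule and its weights are pre-specified, a test of this form retains its level even if the stage-2 sample size was modified using interim data.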
This is because the statistical distribution of the estimated treatment effect can be affected, sometimes strongly, by an AD [ 73 ]. The CI and p value usually depend on the treatment effect estimate and are, thus, also affected. As an example, consider a two-stage adaptive RCT that can stop early if the experimental treatment is doing poorly against the control at an interim analysis, based on a pre-specified stopping rule applied to data from patients assessed during the first stage.
If the trial is not stopped early, the final estimated treatment effect calculated from all first- and second-stage patient data will be biased upwards. This is because the trial will stop early for futility at the first stage whenever the experimental treatment is—simply by chance—performing worse than average, and no additional second-stage data will be collected that could counterbalance this effect via regression to the mean. The bottom line is that random lows are eliminated by the stopping rule but random highs are not, thus, biasing the treatment effect estimate upwards.
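This mechanism can be checked with a short simulation. The sketch below uses hypothetical design parameters (50 patients per arm per stage, a known unit outcome SD, a futility boundary of z = 0 at a single interim) and no true treatment effect.

```python
import numpy as np

rng = np.random.default_rng(1)

n_sims = 100_000
n_stage = 50        # patients per arm per stage (hypothetical)
sigma = 1.0         # known outcome SD; the true treatment effect is zero
futility_z = 0.0    # stop for futility if the interim z-statistic is below this

se = sigma * np.sqrt(2 / n_stage)        # SE of a stage-wise difference in means
d1 = rng.normal(0.0, se, n_sims)         # stage-1 estimates of the difference
go = d1 / se >= futility_z               # trials allowed to continue to stage 2

d2 = rng.normal(0.0, se, n_sims)         # stage-2 estimates (for continuing trials)
final = (d1 + d2) / 2                    # naive pooled estimate over both stages

print("proportion continuing:", go.mean())             # about half with this boundary
print("mean estimate | continued:", final[go].mean())  # clearly above 0: upward bias
```

Conditional on continuing past the interim, the pooled estimate is biased upwards even though each stage-wise estimate on its own is unbiased.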
This phenomenon occurs for a wide variety of ADs, especially when first-stage efficacy data are used to make adaptations such as discontinuing arms. Therefore, we provide several solutions that lead to sensible treatment effect estimates, CIs and p values from AD trials.

Figure: Illustration of bias introduced by early stopping for futility, for 20 simulated two-arm trials with no true treatment effect. The trajectories of the test statistics (as a standardised measure of the difference between treatments) are subject to random fluctuation. Two trials (red) are stopped early because their test statistics are below a pre-defined futility boundary (blue cross) at the interim analysis. Allowing trials with random highs at the interim to continue but terminating trials with random lows early will lead to an upward bias of the average treatment effect.

When stopping rules for an AD are clearly specified (as they should be), a variety of techniques are available to improve the estimation of treatment effects over naive estimators, especially for group-sequential designs.
One approach is to derive an unbiased estimator [ 74 — 77 ]. Though unbiased, such estimators will generally have a larger variance and, thus, be less precise than other estimators. A second approach is to use an estimator that reduces the bias compared to the methods used for fixed-design trials, but does not necessarily eliminate it completely.
Examples of this are the bias-corrected maximum likelihood estimator [ 78 ] and the median unbiased estimator [ 79 ]. Another alternative is to use shrinkage approaches for trials with multiple treatment arms [ 36 , 80 , 81 ]. In general, such estimators substantially reduce the bias compared to the naive estimator. Although they are not usually statistically unbiased, they have lower variance than the unbiased estimators [ 74 , 82 ].
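To illustrate why shrinkage helps in multi-arm settings, the sketch below uses a hypothetical setup (five arms with no true effects and known variance) and a positive-part James-Stein-type estimator that shrinks the arm estimates towards their overall mean before the best-looking arm is reported.

```python
import numpy as np

rng = np.random.default_rng(11)
k, n, n_sims = 5, 30, 20_000      # five arms, 30 patients per arm (hypothetical)

# Arm-wise estimates when every true effect is zero (known unit outcome SD)
means = rng.normal(0.0, 1 / np.sqrt(n), (n_sims, k))
best_naive = means.max(axis=1)    # naively report the best-looking arm

# Positive-part James-Stein-type shrinkage towards the overall mean
grand = means.mean(axis=1, keepdims=True)
s2 = 1 / n                                             # known variance of each estimate
ss = ((means - grand) ** 2).sum(axis=1, keepdims=True)
factor = np.clip(1 - (k - 3) * s2 / ss, 0, None)
shrunk = grand + factor * (means - grand)
best_shrunk = shrunk.max(axis=1)

print("naive best-arm mean:", best_naive.mean())      # well above the true value 0
print("shrunken best-arm mean:", best_shrunk.mean())  # much closer to 0
```

The naive estimate for the selected arm carries substantial selection bias; shrinkage trades a small amount of bias for each individual arm against a large reduction in the bias of the reported best arm.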
In trials with time-to-event outcomes, following patients up to the planned end of the trial can markedly reduce the bias in treatment arms discontinued at interim [ 83 ]. An improved estimator of the treatment effect is not yet available for all ADs. In such cases, one may empirically adjust the treatment effect estimator via bootstrapping [ 84 ], i.e. by repeatedly resampling or re-simulating trial data under the design used to approximate the distribution of the estimator. Simulations can then be used to assess the properties of this bootstrap estimator. The disadvantage of bootstrapping is that it may require a lot of computing power, especially for more complex ADs. For some ADs, there are CIs that have the correct coverage level taking into account the design used [ 18 , 19 , 85 , 86 ], including simple repeated CIs [ 87 ].
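A parametric bootstrap along these lines can be sketched as follows; all design parameters are hypothetical (a two-stage trial with a futility boundary of z = 0, 50 patients per arm per stage, known unit SD). The design is re-simulated at the observed estimate to measure, and then subtract, its bias.

```python
import numpy as np

def run_trial(theta, rng, n_stage=50, sigma=1.0, futility_z=0.0):
    """One two-stage trial under a hypothetical design: stop for futility if
    the interim z-statistic is below futility_z, otherwise pool both stages."""
    se = sigma * np.sqrt(2 / n_stage)
    d1 = rng.normal(theta, se)
    if d1 / se < futility_z:
        return d1                  # naive estimate from stage-1 data only
    d2 = rng.normal(theta, se)
    return (d1 + d2) / 2           # naive pooled estimate

def bootstrap_adjust(theta_hat, rng, n_boot=2000, **design):
    """Parametric bootstrap bias correction: re-simulate the whole design at
    the observed estimate, measure the mean bias and subtract it."""
    sims = np.array([run_trial(theta_hat, rng, **design) for _ in range(n_boot)])
    bias = sims.mean() - theta_hat
    return theta_hat - bias

rng = np.random.default_rng(7)
theta_hat = run_trial(0.3, rng)    # one observed naive estimate
print("naive:", theta_hat, "bootstrap-adjusted:", bootstrap_adjust(theta_hat, rng))
```

The computational cost mentioned above is visible here: every bootstrap replicate is a full simulated trial, so more complex ADs multiply the work accordingly.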
If a particular AD does not have a method that can be readily applied, then it is advisable to carry out simulations at the design stage to see whether the coverage of the naively found CIs deviates considerably from the planned level. In that case, a bootstrap procedure could be applied for a wide range of designs if this is not too computationally demanding.
A p value is often presented alongside the treatment effect estimate and CI, as it helps to summarise the level of evidence against the null hypothesis. Computing a p value requires ordering the possible trial results by how extreme they are; in a fixed-design trial, results are simply ordered by the magnitude of the test statistic.
However, in an AD that allows early stopping for futility or efficacy, it is necessary to distinguish between different ways in which the null hypothesis might be rejected [ 73 ]. There are several different ways that data from an AD may be ordered, and the resulting p value (and also the CI) may depend on which method is used. Thus, it is essential to pre-specify which method will be used and to provide some consideration of the sensitivity of the results to the method. The total probability of rejecting the null hypothesis (the type I error rate) is an important quantity in clinical trials, especially for phase III trials, where a type I error may mean an ineffective or harmful treatment will be used in practice.
In some ADs, a single null hypothesis is tested, but the actual type I error rate differs from the planned level specified before the trial unless a correction is performed. As an example, if unblinded data (i.e. with knowledge or use of treatment allocation, such that the interim treatment effect can be inferred) are used to adjust the sample size at the interim, then the inflation of the planned type I error rate can be substantial and needs to be accounted for [ 16 , 34 , 35 , 88 ].
On the other hand, blinded sample size re-estimation (done without knowledge or use of treatment allocation) usually has a negligible impact on the type I error rate and inference when performed with a relatively large sample size, but inflation can still occur [ 89 , 90 ]. In some ADs, multiple hypotheses are tested, e.g. when there are several treatment arms. In any AD or non-AD trial, the more often the null hypotheses are tested, the higher the chance that one will be incorrectly rejected, so an adjustment for multiple testing may be required. This can sometimes be done with relatively simple methods [ 95 ]; however, it may not be possible for all multiple testing procedures to derive corresponding useful CIs.
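The effect of repeated unadjusted testing is easy to quantify by simulation. The sketch below (hypothetical numbers: five equally spaced looks, each naively tested at the two-sided 5% level, no true effect) reproduces the well-known inflation of the overall type I error rate to roughly 14%.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims, n_looks, n_per_look = 50_000, 5, 20
z_crit = 1.96   # nominal two-sided 5% critical value, naively re-used at every look

# Standard-normal outcomes under the null hypothesis, accumulated over looks
obs = rng.normal(0.0, 1.0, (n_sims, n_looks * n_per_look))
cum_n = np.arange(1, n_looks + 1) * n_per_look
z = obs.cumsum(axis=1)[:, cum_n - 1] / np.sqrt(cum_n)  # z-statistic at each look

reject_any = (np.abs(z) > z_crit).any(axis=1)
print("type I error at the first look:", (np.abs(z[:, 0]) > z_crit).mean())  # ~0.05
print("overall type I error over 5 looks:", reject_any.mean())               # ~0.14
```

Group-sequential boundaries control this inflation by using stricter critical values at each look so that the overall error rate stays at the planned level.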
In a MAMS setting, adjustment is viewed as being particularly important when the trial is confirmatory and when the research arms are different doses or regimens of the same treatment, whereas in some other cases, e.g. when the arms are unrelated treatments, it might not be considered essential. When deciding whether to adjust for multiplicity, it may help to consider what adjustment would have been required had the equivalent comparisons been conducted as separate two-arm trials.
Regulatory guidance is commonly interpreted as encouraging strict adjustment for multiple testing within a single trial [ 97 — 99 ]. While this paper focuses on frequentist (classical) statistical methods for trial design and analysis, there is also a wealth of Bayesian AD methods [ ] that are increasingly being applied in clinical research [ 23 ]. Bayesian statistics and adaptivity go very well together [ 4 ]: for instance, taking multiple looks at the data is statistically unproblematic, as it does not have to be adjusted for separately in a Bayesian framework. Although Bayesian statistics is by nature not concerned with type I error rate control or p values, it is common to evaluate and report the frequentist operating characteristics of Bayesian designs, such as power and type I error rate [ — ].
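As a sketch of how such frequentist operating characteristics can be obtained, the following simulation evaluates a hypothetical Bayesian group-sequential design: flat prior on the treatment effect, known unit outcome SD, looks after 20, 40 and 60 patients, and success declared if the posterior probability of a positive effect exceeds 0.99.

```python
import numpy as np
from statistics import NormalDist

std_normal = NormalDist()
rng = np.random.default_rng(3)

looks = [20, 40, 60]   # hypothetical interim/final analysis sample sizes
threshold = 0.99       # required posterior probability of a positive effect

def trial_succeeds(effect):
    """Simulate one trial; under a flat prior with known unit SD, the
    posterior for the effect after n patients is Normal(mean, 1/sqrt(n))."""
    data = rng.normal(effect, 1.0, looks[-1])
    for n in looks:
        post_mean, post_sd = data[:n].mean(), 1 / np.sqrt(n)
        if std_normal.cdf(post_mean / post_sd) > threshold:  # P(effect > 0 | data)
            return True
    return False

n_sims = 20_000
type1 = np.mean([trial_succeeds(0.0) for _ in range(n_sims)])
power = np.mean([trial_succeeds(0.4) for _ in range(n_sims)])
print("frequentist type I error:", type1)
print("power at a true effect of 0.4:", power)
```

Even though each look uses a 0.99 posterior threshold (a one-sided tail of 1%), taking three looks pushes the frequentist type I error rate above 1%, which is why such designs are typically calibrated by simulation before the trial starts.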
Moreover, there are some hybrid AD methods that blend frequentist and Bayesian aspects [ — ]. Besides these statistical issues, the interpretability of results may also be affected by the way triallists conduct an AD trial, in particular with respect to mid-trial data analyses. Using interim data to modify study aspects may raise anxiety in some research stakeholders because of the potential introduction of operational bias.
Knowledge, leakage or mere speculation about interim results could alter the behaviour of those involved in the trial, including investigators, patients and the scientific community [ , ]. Hence, it is vital to describe the processes and procedures put in place to minimise potential operational bias, and triallists, as well as consumers of trial reports, should give these their due consideration. The importance of confidentiality and models for monitoring AD trials have been discussed [ 46 , ].
Inconsistencies in the conduct of the trial across different stages can introduce heterogeneity. As an example, modifications of eligibility criteria might lead to a shift in the patient population over time, and results may then depend on whether patients were recruited before or after the interim analysis.
Consequently, the ability to combine results across independent interim stages to assess the overall treatment effect becomes questionable. Heterogeneity between the stages of an AD trial could also arise when the trial begins by recruiting from a limited number of sites (in a limited number of countries), which may not be representative of all the sites that will be used once recruitment is up and running [ 55 ].
Difficulties faced in interpreting research findings with heterogeneity across interim stages have been discussed in detail [ — ]. Although it is hard to distinguish heterogeneity due to chance from that influenced by operational bias, we believe there is a need to explore stage-wise heterogeneity by presenting key patient characteristics and results by independent stages and treatment groups. High-quality reporting of results is a vital part of running any successful trial [ ]. The reported findings need to be credible, transparent and repeatable.
Where there are potential biases, the report should highlight them, and it should also comment on how sensitive the results are to the assumptions made in the statistical analysis. Much effort has been made to improve the reporting quality of traditional clinical trials. Recent work has discussed the reporting of AD trials, with examples of and recommendations for minimum standards [ — ], and identified several items in the CONSORT checklist as relevant when reporting an AD trial [ , ]. Mindful of the statistical and operational pitfalls discussed in the previous section, we have compiled a list of 11 reporting items that we consider essential for AD trials, along with some explanations and examples.
Given the limited word counts of most medical journals, we acknowledge that a full description of all these items may need to be included as supplementary material. However, sufficient information must be provided in the main body, with references to additional material. This will enable readers and reviewers to gauge the appropriateness of the design and interpret its findings correctly. Research objectives and hypotheses should be set out in detail, along with how the chosen AD suits them.
Reasons for using more established ADs have been discussed in the literature. The choice of routinely used ADs, such as the CRM for dose escalation or group-sequential designs, should be self-evident and need not be justified every time. A trial report should not only state the type of AD used but also describe its scope adequately. This allows the appropriateness of the statistical methods used to be assessed and the trial to be replicated. The scope relates to what the adaptation(s) encompass, such as terminating futile treatment arms or selecting the best-performing treatment in a MAMS design.
The scope of ADs with varying objectives is broad and can sometimes include multiple adaptations aimed at addressing multiple objectives in a single trial. In addition to reporting the overall planned and actually recruited sample sizes, as in any RCT, AD trial reports should provide information on the timing of interim analyses.
Transparency with respect to adaptation procedures is crucial [ ]. Hence, reports should include the decision rules used, their justification, and the timing and frequency of interim analyses. It is important for the research team, including the clinical and statistical researchers, to discuss adaptation criteria at the planning stage and to consider the validity and clinical interpretation of the results. Some ADs, however, may require simulation work under a number of scenarios to establish their operating characteristics. It is important to provide clear simulation objectives, a rationale for the scenarios investigated and evidence showing that the desired statistical properties have been preserved.
The simulation protocol and report, as well as any software code used to generate the results, should be made accessible. In addition, traditional naive estimates could be reported alongside adjusted estimates. Whenever data from different stages are combined in the analysis, it is important to disclose the combination method used as well as the rationale behind it. Reporting such details, where appropriate for the design used, could provide some form of assurance to the scientific research community.
Nonetheless, differentiating between randomly occurring and design-induced heterogeneity or population drift is tough, and even standard fixed designs are not immune to this problem. Prospective planning of an AD is important for credibility and regulatory considerations [ 41 ]. However, as in any other non-AD trial, some events not envisaged during the course of the trial may call for changes to the design that are outside the scope of a priori planned adaptations, or there may be a failure to implement planned adaptations.
Questions may be raised regarding the implications of such unplanned ad hoc modifications. Is the planned statistical framework still valid? Were the changes driven by potential bias? Are the results still interpretable in relation to the original research question? Thus, any unplanned modifications must be stated clearly, with an explanation as to why they were implemented and how they may impact the interpretation of trial results. As highlighted earlier, adaptations should be motivated by the need to address specific research objectives.
In the context of the trial conducted and its observed results, triallists should discuss the interpretability of results in relation to the original research question s. In particular, who the study results apply to should be considered. For instance, subgroup selection, enrichment and biomarker ADs are motivated by the need to characterise patients who are most likely to benefit from investigative treatments. Thus, the final results may apply only to patients with specific characteristics and not to the general or enrolled population. What worked well? What went wrong? What could have been done differently?
We encourage the discussion of all positive, negative and perhaps surprising lessons learned over the course of an AD trial. Sharing practical experiences with AD methods will help inform the design, planning and conduct of future trials and is, thus, a key element in ensuring researchers are competent and confident enough to apply ADs in their own trials [ 27 ]. For novel cutting-edge designs especially, we recommend writing up and publishing these experiences as a statistician-led stand-alone paper.
Otherwise, retrieving and identifying AD trials in the literature and clinical trial registers will be a major challenge for researchers and systematic reviewers [ 28 ]. We wrote this paper to encourage the wider use of ADs with pre-planned opportunities to make design changes in clinical trials. Although there are a few practical stumbling blocks on the way to a good AD trial, they can almost always be overcome with careful planning.