Saturday, June 17, 2017

Statistical Analysis Plan in Clinical Trial Registries

Recently, a question came up while I was searching the EU Clinical Trials Register, the EU clinical trial registry website and the counterpart of ClinicalTrials.gov in the US: should clinical trial registries include the statistical analysis plan (for primary and secondary efficacy endpoints)? The statistical analysis plan could include the statistical methods for primary and secondary endpoints, missing data handling, the stopping rule for early termination of the study, the justification for the sample size estimation, and so on.
For ClinicalTrials.gov in the US, the Protocol Registration Data Element Definitions for Interventional and Observational Studies require the inclusion of some details about statistical analyses:
Detailed Description Definition:
Extended description of the protocol, including more technical information (as compared to the Brief Summary), if desired. Do not include the entire protocol; do not duplicate information recorded in other data elements, such as Eligibility Criteria or outcome measures. Limit: 32,000 characters. 
For Patient Registries: Also describe the applicable registry procedures and other quality factors (for example, third party certification, on-site audit). In particular, summarize any procedures implemented as part of the patient registry, including, but not limited to the following: 
  • Quality assurance plan that addresses data validation and registry procedures, including any plans for site monitoring and auditing.
  • Data checks to compare data entered into the registry against predefined rules for range or consistency with other data fields in the registry.
  • Source data verification to assess the accuracy, completeness, or representativeness of registry data by comparing the data to external data sources (for example, medical records, paper or electronic case report forms, or interactive voice response systems).
  • Data dictionary that contains detailed descriptions of each variable used by the registry, including the source of the variable, coding information if used (for example, World Health Organization Drug Dictionary, MedDRA), and normal ranges if relevant.
  • Standard Operating Procedures to address registry operations and analysis activities, such as patient recruitment, data collection, data management, data analysis, reporting for adverse events, and change management.
  • Sample size assessment to specify the number of participants or participant years necessary to demonstrate an effect.
  • Plan for missing data to address situations where variables are reported as missing, unavailable, non-reported, uninterpretable, or considered missing because of data inconsistency or out-of-range results.
  • Statistical analysis plan describing the analytical principles and statistical techniques to be employed in order to address the primary and secondary objectives, as specified in the study protocol or plan. 
In the EU, the clinical trial register is mainly based on the EudraCT database. As part of the clinical trial application (similar to an IND in the US), the sponsor needs to provide the clinical trial protocol information to be entered into the EudraCT database.

In the guidance “Detailed guidance on the European clinical trials database (EUDRACT Database)”, it asks for the information regarding the clinical trial design, but there is no mention of the statistical analysis plan.

As a matter of fact, all clinical trial registries across different countries are supposed to meet the requirements of the International Clinical Trials Registry Platform (ICTRP) of the World Health Organization. In the list of elements for the WHO Trial Registration Data Set, there is no mention of a statistical analysis plan as part of the registration elements.

In any case, sponsors seem to have different understandings of how much detail about a clinical trial should be posted in clinical trial registries. Some companies post very detailed information, including how the clinical trial data will be analyzed. Other companies are very restrained and post as little information as possible.

In terms of the elements regarding the statistical analyses, there are actually more studies with some details in the EU Clinical Trials Register than in ClinicalTrials.gov, even though the requirement to include the statistical analysis plan is mentioned for ClinicalTrials.gov and not for the EU register. For example, for the study “A Multicenter, Randomized, Double-Blind, Phase 3 Study of Ramucirumab (IMC-1121B) Drug Product and Best Supportive Care (BSC) Versus Placebo and BSC as Second-Line Treatment in Patients With Hepatocellular Carcinoma Following First-Line Therapy With Sorafenib”, a lot of details about the statistical analyses are provided in the EU register entry.

When I checked whether the interim analysis and its corresponding boundary method are mentioned in the registry entries, I could clearly see inconsistencies across different trial sponsors.

Here are some studies in which both the interim analysis and the boundary method are mentioned.
Here are some studies in which the interim analysis is mentioned, but the boundary method is not.

Sunday, June 04, 2017

Calculating exact confidence interval for binomial proportion within each group using the Clopper-Pearson method

The Clopper-Pearson method is commonly used to calculate the exact confidence interval for a binomial proportion, an incidence rate, and similar quantities. The confidence interval is calculated for a single group; the Clopper-Pearson method is therefore not used for the confidence interval of the difference between two groups.

Many oncology studies have no concurrent control group. For the response rate in such studies, the exact confidence interval is constructed (usually through the Clopper-Pearson method), and the lower limit of the 95% confidence interval is then compared with the historical rate to determine whether there is a treatment effect.
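As a rough illustration of this decision rule, here is a minimal Python sketch. Python is not used elsewhere in this post, and the function names (binom_cdf, solve_increasing, clopper_pearson) are my own. Instead of a closed-form formula, it finds the limits by inverting the binomial tail probabilities with bisection. The inputs are the venetoclax numbers quoted later in this post: 85 responders out of 107, tested against a null response rate of 40%.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(g, target, tol=1e-12):
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence limits for x responders out of n."""
    # Lower limit: the p at which P(X >= x) equals alpha/2 (0 when x == 0).
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha/2 (1 when x == n).
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Venetoclax example from this post: 85 responders out of 107, null rate 40%.
historical_rate = 0.40
lower, upper = clopper_pearson(85, 107)
print(f"95% exact CI: {100 * lower:.1f}% to {100 * upper:.1f}%")
print("Lower limit exceeds historical rate:", lower > historical_rate)
```

The bisection approach is slower than a closed-form formula, but it follows the definition of the exact interval directly and needs nothing beyond the standard library.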

Here are some examples in which the Clopper-Pearson method was used to calculate the exact confidence interval:

Medical and statistical review for Venetoclax NDA:
"For the primary efficacy analyses, statistical significance was determined by a two-sided p value less than 0.05 (one-sided less than 0.025). The assessment of ORR was performed once 70 subjects in the main cohort completed the scheduled 36-week disease assessment, progressed prior to the 36-week disease assessment, discontinued study drug for any reason, or after all treated subjects discontinued venetoclax, whichever was earlier. The ORR for venetoclax was tested to reject the null hypothesis of 40%. If the null hypothesis is rejected and the ORR is higher than 40%, then venetoclax has been shown to have an ORR significantly higher than 40%. The ninety-five percent (95%) confidence interval for ORR was based on binomial distribution (Clopper-Pearson exact method). "
Motzer et al (2015) Nivolumab versus Everolimus in Advanced Renal-Cell Carcinoma
"If superiority with regard to the primary end point was demonstrated, a hierarchical statistical testing procedure was followed for the objective response rate (estimated along with the exact 95% confidence interval with the use of the Clopper–Pearson method)"
Foster et al (2015) Sofosbuvir and Velpatasvir for HCV Genotype 2 and 3 Infection
"Point estimates and two-sided 95% exact confidence intervals that are based on the Clopper–Pearson method are provided for rates of sustained virologic response for all treatment groups, as well as selected sub-groups."
Cicardi et al (2010) Icatibant, a New Bradykinin-Receptor Antagonist, in Hereditary Angioedema
"Fisher’s exact test, with 95% confidence intervals calculated for each group by means of the Clopper–Pearson method, was used to compare the percentage of patients with clinically significant relief of the index symptom at 4 hours after the start of the study drug. Two-sided 95% confidence intervals for the difference in proportions were calculated with the use of the Anderson–Hauck correction."
According to the SAS manual, the Clopper-Pearson confidence interval is described as below:
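In the usual F-distribution form, the limits can be written as follows. This is the standard textbook expression, not copied from the manual, and its degrees of freedom match the R snippet at the end of this post. Here n is the number of subjects, n_1 the number of responders, and F_{q; a, b} the q-th (lower-tail) quantile of the F distribution with a and b degrees of freedom:

```latex
p_L = \left( 1 + \frac{n - n_1 + 1}{n_1 \, F_{\alpha/2;\; 2 n_1,\; 2(n - n_1 + 1)}} \right)^{-1},
\qquad
p_U = \left( 1 + \frac{n - n_1}{(n_1 + 1) \, F_{1 - \alpha/2;\; 2(n_1 + 1),\; 2(n - n_1)}} \right)^{-1}
```

When n_1 = 0 the lower limit is taken as 0, and when n_1 = n the upper limit is taken as 1.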
The confidence interval using the Clopper-Pearson method can easily be calculated with the SAS Proc Freq procedure. Alternatively, it can be calculated directly from the formula, either in a SAS data step or in R.

Using the Venetoclax NDA as an example, the primary efficacy endpoint ORR (overall response rate) is calculated as 85 / 107 = 79.4%. The 95% confidence interval can be calculated using the Clopper-Pearson method as follows:

Using SAS Proc Freq:
With proc freq, we should get a 95% confidence interval of 70.5% – 86.6%.

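/* One row per response category with its count */
data test2;
  input orr $ count @@;
datalines;
have 85
no 22
;
run;

/* order=data makes "have" the first level, so the binomial proportion is the ORR */
proc freq data=test2 order=data;
  weight count;
  tables orr / binomial(exact) alpha=0.05;
run;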

Using formula:

data test;
  input n n1 alpha;
  phat = n1/n;
  /* F quantiles; the lower limit uses 2*n1 and 2*(n-n1+1) degrees of freedom */
  fvalue1 = finv( (alpha/2), 2*n1, 2*(n-n1+1));
  fvalue2 = finv( (1-alpha/2), 2*(n1+1), 2*(n-n1));
  pL =  (1+   ((n-n1+1)/(n1*fvalue1) ))**(-1);
  pU =  (1+   ((n-n1)/((n1+1)*fvalue2) ))**(-1);
datalines;
107 85 0.05
;
run;

proc print;
run;


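Using R:
n=107; n1=85; alpha=0.05
f1=qf(1-alpha/2, 2*n1, 2*(n-n1+1), lower.tail=FALSE)   # lower alpha/2 quantile
f2=qf(alpha/2, 2*(n1+1), 2*(n-n1), lower.tail=FALSE)   # upper alpha/2 quantile
c(pL=1/(1+(n-n1+1)/(n1*f1)), pU=1/(1+(n-n1)/((n1+1)*f2)))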