Friday, August 17, 2012

Confidence Intervals for difference between two proportions and for the ratio of two proportions


For clinical trials with binary outcomes, the results can usually be presented as a 2x2 contingency table as below:


               Responder    Non-responder    Total
Treatment 1    n11          n12              n1
Treatment 2    n21          n22              n2

We can then calculate the proportion of responders for two treatment groups:

       p1=n11/n1

       p2=n21/n2

We have two ways to compare two treatment groups:
  • The difference between two proportions: p1-p2
  • The ratio of two proportions: p1/p2

p1-p2 may be called the absolute risk difference and p1/p2 is called relative risk (RR) or risk ratio. 

The confidence interval can be constructed for the difference between two proportions and for the relative risk.

For the difference between two proportions, the asymptotic confidence interval is calculated using the following formula:

                                 (p1-p2) +/- Z(alpha/2)*sqrt((p1 *(1-p1)/n1)+(p2*(1-p2)/n2))

Reference: Stokes, Davis, and Koch (2000) Categorical Data Analysis Using the SAS System, 2nd edition

The notation in the reference book differs from that in the SAS manual, but the results should be the same.
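The formula above can be sketched as a short data step. This is only an illustration: the counts are the ones from the example further down in this post, and the data set name riskdiff_wald and the variables alpha, se, and z are my own choices rather than anything from the reference.

```sas
/* Asymptotic (Wald) confidence interval for p1 - p2.              */
/* Counts come from the example later in this post; alpha = 0.05   */
/* is an assumed significance level.                               */
data riskdiff_wald;
  n11 = 63;  n1 = 66;          /* responders and total, treatment 1 */
  n21 = 56;  n2 = 69;          /* responders and total, treatment 2 */
  alpha = 0.05;
  p1 = n11/n1;
  p2 = n21/n2;
  diff = p1 - p2;
  se = sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2);
  z = probit(1 - alpha/2);     /* positive normal quantile, about 1.96 */
  lower = diff - z*se;
  upper = diff + z*se;
run;

proc print data=riskdiff_wald;
  var p1 p2 diff se lower upper;
  title 'Wald confidence interval for p1 - p2';
run;
```

With these counts the difference is about 0.14 and the interval runs from roughly 0.04 to 0.25. Proc Freq should report the same limits when the riskdiff option is added to the tables statement.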

In an earlier post, “Confidence Interval for Difference in Two Proportions”, I discussed the corrections and the SAS code.

For relative risk, the asymptotic confidence interval is calculated using the following formula:

Exp(log(RR) +/- Z(alpha/2) * sqrt((1-p1)/(n1*p1) + (1-p2)/(n2*p2)))

Reference: Agresti A (2007) An Introduction to Categorical Data Analysis, 2nd edition, John Wiley & Sons, Inc.

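The notation in the reference book differs from that in the SAS manual, which writes the variance term as (1-p1)/n11 + (1-p2)/n21. Because n1*p1 = n11 and n2*p2 = n21, the two expressions are identical and the results should be the same.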

The confidence interval for the relative risk can be obtained from SAS Proc Freq, or calculated manually using either the formula above or the formula from the SAS manual.

Suppose we have study results as below:


        Success    Non-success    Total
Trt1    63         3              66
Trt2    56         13             69


data example;
  length trt $8;
  input trt $ success $ count;
  datalines;
  trt1   yes 63
  trt1   no  3
  trt2   yes 56
  trt2  no  13
;
proc freq data=example;
  weight count;
  tables trt*success/measures nopercent nocol;
  title 'outputs from SAS Proc Freq';
run;
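One thing to watch when reading this output: by default Proc Freq orders the levels of success alphabetically, so 'no' becomes column 1 and 'yes' becomes column 2, and the relative risk of success is the column 2 estimate. A possible variation, assuming you would rather have 'yes' as column 1, is to use order=data so the levels appear in the order they are read from the data set. The relrisk option requests only the relative risk table.

```sas
/* Same table, but with levels ordered as they appear in the data  */
/* (trt1 before trt2, yes before no), so that column 1 = 'yes'.    */
proc freq data=example order=data;
  weight count;
  tables trt*success / relrisk nopercent nocol;
  title 'Relative risk with yes as column 1';
run;
```

With this ordering, the column 1 relative risk is the risk ratio of success for trt1 versus trt2.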


data agresti;
  n11=63;
  n21=56;
  n1=66;
  n2=69;
  p1=n11/n1;
  p2=n21/n2;
  rr = p1/p2;
  v = (1-p1)/(n1*p1) + (1-p2)/(n2*p2);
  /* probit(0.025) is negative (about -1.96), so subtracting it gives the upper limit */
  upper = exp(log(rr) - probit(0.025)*sqrt(v));
  lower = exp(log(rr) + probit(0.025)*sqrt(v));
run;
proc print data=agresti;
  title "using the formula from Agresti's book";
run;


data sasmanual;
  n11=63;
  n21=56;
  n1=66;
  n2=69;
  p1=n11/n1;
  p2=n21/n2;
  rr = p1/p2;
  v = (1-p1)/n11 + (1-p2)/n21;
  /* probit(0.025) is negative (about -1.96), hence the minus sign for the upper limit */
  upper = rr * exp(-probit(0.025)*sqrt(v));
  lower = rr * exp(probit(0.025)*sqrt(v));
run;

proc print data=sasmanual;
  title "using the formula from SAS manual";
run;
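The two data steps above hard-code probit(0.025), which makes the signs a little hard to follow. A slightly more general sketch, assuming a variable alpha for the significance level (my own addition, as are the names rr_ci and z), uses the positive quantile instead:

```sas
/* Asymptotic CI for the relative risk at an arbitrary alpha.      */
/* alpha, z and the data set name rr_ci are illustrative choices.  */
data rr_ci;
  n11 = 63;  n1 = 66;          /* successes and total, trt1 */
  n21 = 56;  n2 = 69;          /* successes and total, trt2 */
  alpha = 0.05;
  p1 = n11/n1;
  p2 = n21/n2;
  rr = p1/p2;
  v = (1-p1)/n11 + (1-p2)/n21; /* variance of log(rr) */
  z = probit(1 - alpha/2);     /* positive normal quantile */
  lower = exp(log(rr) - z*sqrt(v));
  upper = exp(log(rr) + z*sqrt(v));
run;

proc print data=rr_ci;
  var rr lower upper;
  title 'asymptotic confidence interval for the relative risk';
run;
```

For alpha = 0.05 this should reproduce the limits from the two data steps above, with a relative risk of about 1.18.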

I recently read a paper by Fischer et al. in which the confidence interval for the relative risk was constructed using a method by Koopman. In his paper “Confidence Intervals for the Ratio of Two Binomial Proportions”, Koopman proposed a chi-square method that requires a numerical procedure with iterative computation. There is no SAS program available for the calculation using Koopman's method.

There are other approaches proposed for computing confidence intervals for the ratio of two proportions. However, the method for calculating the asymptotic confidence interval adopted in SAS Proc Freq is commonly used.  
