biostatistics and epidemiology with R: Inference for proportions in small samples

Large sample approximations do not work well in a small sample setting. Exact tests have to be developed for better estimation in that case.

Since we know X (positive cell count for response) follows binomial, without making normal approximation, we can go for exact inference, by calculating the confidence intervals from the quantiles of a binomial distribution.

		outcome
		+	-	Marginal total
exposure	+	a	b	n1
exposure	-	c	d	n2
Marginal total		m1	m2	N

Clopper Pearson showed that exact CI (lower limit and upper limit) can be computed by solving the following two equations:
sum_{a=0}^x Binom(1; N, pi) = alpha/2
sum_{a=x}^N Binom(1; N, pi) = alpha/2

An R computation of this can be done as follows:

> N = 28

> x= 17

> p = x/N

> The CI limits

> library("binom")

> binom.confint(x, N, conf.level=0.95, methods="exact")

method x n mean lower upper

1 exact 17 28 0.6071429 0.4057682 0.7849572

> # The test

> binom.test(x, N, p = 1/2)

Exact binomial test

data: x and N

number of successes = 17, number of trials = 28, p-value = 0.3449

alternative hypothesis: true probability of success is not equal to 0.5

95 percent confidence interval:

0.4057682 0.7849572

sample estimates:

probability of success

0.6071429

The coverage probability of both normal approximation (large sample approximation) and exact method can be computed by R simulation as follows:

> pi = 0.1 # True parameter

> sim = 10000 # number of data points

> data = rbinom(sim, N, prob=pi)

> ci.normal = binom.confint(data, N, conf.level=0.95, methods="asymptotic")

> ci.exact = binom.confint(data, N, conf.level=0.95, methods="exact")

> cover.normal = mean(ci.normal[c("lower")] <= pi & pi <= ci.normal[c("upper")])

> cover.normal

[1] 0.9403

> cover.exact = mean(ci.exact[c("lower")] <= pi & pi <= ci.exact[c("upper")])

> cover.exact

[1] 0.9817

Note that die to discreteness of binomial distribution, the coverage probability is wider than needs to be.

biostatistics and epidemiology with R

Wednesday, January 26, 2011

Inference for proportions in small samples

No comments:

Post a Comment