From the following 2x2 table, we want to infer about the population risks:
Lets say out of N subjects,
Since positive response from a particular individual follows Bernoulli, X = total number of + responses, would then be distributed as binomial P(x) = choose(N,x) pi^x (1-pi)^(N-x) with E(X) = N pi and var(X) = N pi (1-pi)
pi = Pr(+) = proportion of +' responses in the population
p = X/N = proportion of +'s in the population,
outcome | ||||
+ | - | Marginal total | ||
exposure | + | a | b | n1 |
- | c | d | n2 | |
Marginal total | m1 | m2 | N |
with E(p) = pi and var(p) = pi (1-pi)/N
Now, as N gets large, p approaches normal
N[mean = pi, variance = pi (1-pi)/N]
obtained from central limit theorem and Slutsky's convergence theorem.
Therefore, in large sample, to test H0: pi = pi0 against H1: pi =/= pi0, we have the following test statistic
z = (p - pi0) / sqrt( pi0*(1-pi0)/N )
and the corresponding CI (popularly known as wald-interval) is
p - z(alpha/2) * se(p), p + z(alpha/2) * se(p)
where se(p) = sqrt( pi0*(1-pi0)/N )
A simple example of CI construction is as follows:
Since se(p)^2 = var(p) comes from asymptotic likelihood theory, one should know that var(p) is just inverse of information matrix I(p). Then again, this I(p) is negative of first derivative of score function or negative of Hessian matrix.
Another simple application of proportion test can be shown using R:
> N = 28 # total number of subjects > x = 17 # number of positive responses > > # Asymptotic normal 95% CI > p = x/N > c(p - qnorm(1-0.05/2) * sqrt(p*(1-p)/N), p + qnorm(1-0.05/2) * sqrt(p*(1-p)/N)) [1] 0.4262457 0.7880401 > > #Asymptotic normal CIs from package > library("binom") > binom.confint(x, N, conf.level=0.95, methods="asymptotic") method x n mean lower upper 1 asymptotic 17 28 0.6071429 0.4262457 0.78804 |
Since se(p)^2 = var(p) comes from asymptotic likelihood theory, one should know that var(p) is just inverse of information matrix I(p). Then again, this I(p) is negative of first derivative of score function or negative of Hessian matrix.
Another simple application of proportion test can be shown using R:
> library(MASS) > data(quine) > quine Eth Sex Age Lrn Days 1 A M F0 SL 2 2 A M F0 SL 11 3 A M F0 SL 14 4 A M F0 AL 5 5 A M F0 AL 5 6 A M F0 AL 13 7 A M F0 AL 20 8 A M F0 AL 22 9 A M F1 SL 6 10 A M F1 SL 6 11 A M F1 SL 15 12 A M F1 AL 7 13 A M F1 AL 14 14 A M F2 SL 6 15 A M F2 SL 32 16 A M F2 SL 53 17 A M F2 SL 57 18 A M F2 AL 14 19 A M F2 AL 16 20 A M F2 AL 16 21 A M F2 AL 17 22 A M F2 AL 40 23 A M F2 AL 43 24 A M F2 AL 46 25 A M F3 AL 8 26 A M F3 AL 23 27 A M F3 AL 23 28 A M F3 AL 28 29 A M F3 AL 34 30 A M F3 AL 36 31 A M F3 AL 38 32 A F F0 SL 3 33 A F F0 AL 5 34 A F F0 AL 11 35 A F F0 AL 24 36 A F F0 AL 45 37 A F F1 SL 5 38 A F F1 SL 6 39 A F F1 SL 6 40 A F F1 SL 9 41 A F F1 SL 13 42 A F F1 SL 23 43 A F F1 SL 25 44 A F F1 SL 32 45 A F F1 SL 53 46 A F F1 SL 54 47 A F F1 AL 5 48 A F F1 AL 5 49 A F F1 AL 11 50 A F F1 AL 17 51 A F F1 AL 19 52 A F F2 SL 8 53 A F F2 SL 13 54 A F F2 SL 14 55 A F F2 SL 20 56 A F F2 SL 47 57 A F F2 SL 48 58 A F F2 SL 60 59 A F F2 SL 81 60 A F F2 AL 2 61 A F F3 AL 0 62 A F F3 AL 2 63 A F F3 AL 3 64 A F F3 AL 5 65 A F F3 AL 10 66 A F F3 AL 14 67 A F F3 AL 21 68 A F F3 AL 36 69 A F F3 AL 40 70 N M F0 SL 6 71 N M F0 SL 17 72 N M F0 SL 67 73 N M F0 AL 0 74 N M F0 AL 0 75 N M F0 AL 2 76 N M F0 AL 7 77 N M F0 AL 11 78 N M F0 AL 12 79 N M F1 SL 0 80 N M F1 SL 0 81 N M F1 SL 5 82 N M F1 SL 5 83 N M F1 SL 5 84 N M F1 SL 11 85 N M F1 SL 17 86 N M F1 AL 3 87 N M F1 AL 4 88 N M F2 SL 22 89 N M F2 SL 30 90 N M F2 SL 36 91 N M F2 AL 8 92 N M F2 AL 0 93 N M F2 AL 1 94 N M F2 AL 5 95 N M F2 AL 7 96 N M F2 AL 16 97 N M F2 AL 27 98 N M F3 AL 0 99 N M F3 AL 30 100 N M F3 AL 10 101 N M F3 AL 14 102 N M F3 AL 27 103 N M F3 AL 41 104 N M F3 AL 69 105 N F F0 SL 25 106 N F F0 AL 10 107 N F F0 AL 11 108 N F F0 AL 20 109 N F F0 AL 33 110 N F F1 SL 5 111 N F F1 SL 7 112 N F F1 SL 0 113 N F F1 SL 1 114 N F F1 SL 5 115 N F F1 SL 5 116 N F F1 SL 5 117 N F F1 SL 5 118 N F F1 SL 7 119 N F F1 SL 11 120 N F F1 SL 15 121 N F F1 AL 5 122 N F F1 AL 14 123 N F F1 AL 6 124 N F F1 AL 6 125 N F F1 AL 7 126 N F F1 AL 28 127 N F F2 SL 0 128 N F F2 SL 5 129 N F F2 SL 14 130 N F F2 SL 2 131 N F F2 SL 2 132 N F F2 SL 3 133 N F F2 SL 8 134 N F F2 SL 10 135 N F F2 SL 12 136 N F F2 AL 1 137 N F F3 AL 1 138 N F F3 AL 9 139 N F F3 AL 22 140 N F F3 AL 3 141 N F F3 AL 3 142 N F F3 AL 5 143 N F F3 AL 15 144 N F F3 AL 18 145 N F F3 AL 22 146 N F F3 AL 37 > table(quine$Eth, quine$Lrn) AL SL A 40 29 N 43 34 > prop.test(table(quine$Eth, quine$Lrn), correct=FALSE) 2-sample test for equality of proportions without continuity correction data: table(quine$Eth, quine$Lrn) X-squared = 0.0671, df = 1, p-value = 0.7956 alternative hypothesis: two.sided 95 percent confidence interval: -0.1395620 0.1820992 sample estimates: prop 1 prop 2 0.5797101 0.5584416 > prop.test(table(quine$Eth, quine$Lrn), correct=TRUE) 2-sample test for equality of proportions with continuity correction data: table(quine$Eth, quine$Lrn) X-squared = 0.0084, df = 1, p-value = 0.927 alternative hypothesis: two.sided 95 percent confidence interval: -0.1533019 0.1958390 sample estimates: prop 1 prop 2 0.5797101 0.5584416 |
Usually observed counts being larger than 5 is adequate for normality assumption. However, if this assumption is not satisfied, or worse, if any cell count is zero, i.e. X = 0; p = X/N = 0; se(p) = 0; which means this estimation method goes out of the window.
No comments:
Post a Comment