biostatistics and epidemiology with R: Why log(RR) instead of RR?

As an example of how the variances of difference measures are calculated, the derivation of approximate variance or SE for RR is shown as follows (uses multivariate delta method):

Because of the skewed distribution of the untransformed estimated risk ratio it was advantageous to use logarithm. To show this, let us show the coverage rates first for a study where only 50 subjects are available in each groups, with pi1 = .2 and pi2 = .1 respectively:

> # given values

> n1 = 50

> n2 = 50

> pi1 = .2

> pi2 = .1

> # true RR

> t.pi = pi1/pi2

> # generate sample RR and calculate LRR and their C.I.s

> sim = 1000

> set.seed(124)

> x1 = rbinom(n = sim, size = n1, prob = pi1)

> p1 = x1/n1

> x2 = rbinom(n = sim, size = n2, prob = pi2)

> p2 = x2/n2

> RR = p1/p2

> LRR = log(RR)

> v.RR = (p1/p2)^2 * ( (1-p1)/(p1*n1) + (1-p2)/(p2*n2) )

> v.LRR = ( (1-p1)/(p1*n1) + (1-p2)/(p2*n2) )

> ci.RR = cbind(RR - qnorm(.975)*sqrt(v.RR), RR + qnorm(.975)*sqrt(v.RR))

> ci.LRR =cbind(LRR - qnorm(.975)*sqrt(v.LRR), LRR + qnorm(.975)*sqrt(v.LRR))

> # Find the coverage rates for RR and Log(RR)

> covr.RR = ci.RR[,1] <= t.pi & ci.RR[,2] >= t.pi

> sum(covr.RR[!is.na(covr.RR)])/sim

[1] 0.914

> covr.LRR = ci.LRR[,1] <= log(t.pi) & ci.LRR[,2] >= log(t.pi)

> sum(covr.LRR[!is.na(covr.LRR)])/sim

[1] 0.968

As we see, the coverage rates for actual RR estimates are lower than expected. Since logRR approximately follows normal, this was not a problem for it. Now, to actually achieve normality for estimates of RR, let us investigate how much sample size is required. Trying various sample sizes, it was decided to go for sample size as large as 8,000! Here is the code:

> # sample size guessing

> n1 = n2 = 8000

> # generate sample RR and calculate LRR and their C.I.s

> sim = 1000

> set.seed(124)

> x1 = rbinom(n = sim, size = n1, prob = pi1)

> p1 = x1/n1

> x2 = rbinom(n = sim, size = n2, prob = pi2)

> p2 = x2/n2

> RR = p1/p2

> hist(RR)

> require("fBasics")

Loading required package: fBasics

Loading required package: MASS

Loading required package: timeDate

Loading required package: timeSeries

Attaching package: 'fBasics'

The following object(s) are masked from 'package:base':

norm

> shapiroTest(RR)

Title:

Shapiro - Wilk Normality Test

Test Results:

STATISTIC:

W: 0.9972

P VALUE:

0.07717

Description:

Wed Jan 26 18:36:18 2011 by user: Ehsan

> jarqueberaTest(RR)

Title:

Jarque - Bera Normalality Test

Test Results:

STATISTIC:

X-squared: 5.9183

P VALUE:

Asymptotic p Value: 0.05186

Description:

Wed Jan 26 18:36:18 2011 by user: Ehsan

> cvmTest(RR)

Title:

Cramer - von Mises Normality Test

Test Results:

STATISTIC:

W: 0.0846

P VALUE:

0.1802

Description:

Wed Jan 26 18:36:18 2011 by user: Ehsan

> lillieTest(RR)

Title:

Lilliefors (KS) Normality Test

Test Results:

STATISTIC:

D: 0.0232

P VALUE:

0.2121

Description:

Wed Jan 26 18:36:18 2011 by user: Ehsan

> pchiTest(RR)

Title:

Pearson Chi-Square Normality Test

Test Results:

PARAMETER:

Number of Classes: 32

STATISTIC:

P: 33.088

P VALUE:

Adhusted: 0.2742

Not adjusted: 0.3655

Description:

Wed Jan 26 18:36:18 2011 by user: Ehsan

> sfTest(RR)

Title:

Shapiro - Francia Normality Test

Test Results:

STATISTIC:

W: 0.9975

P VALUE:

0.1165

Description:

Wed Jan 26 18:36:18 2011 by user: Ehsan

> qqnorm(RR)

Now, following R codes can be used to compare coverage rates and mean confidence interval widths between the two methods for sample size 8,000:

> # chosen values

> n1 = n2 = 8000

> # generate sample RR and calculate LRR and their C.I.s

> sim = 1000

> set.seed(124)

> x1 = rbinom(n = sim, size = n1, prob = pi1)

> p1 = x1/n1

> x2 = rbinom(n = sim, size = n2, prob = pi2)

> p2 = x2/n2

> RR = p1/p2

> LRR = log(RR)

> v.RR = (p1/p2)^2 * ( (1-p1)/(p1*n1) + (1-p2)/(p2*n2) )

> v.LRR = ( (1-p1)/(p1*n1) + (1-p2)/(p2*n2) )

> ci.RR = cbind(RR - qnorm(.975)*sqrt(v.RR), RR + qnorm(.975)*sqrt(v.RR))

> ci.LRR =cbind(LRR - qnorm(.975)*sqrt(v.LRR), LRR + qnorm(.975)*sqrt(v.LRR))

> # Find the coverage rates for RR and Log(RR)

> covr.RR = ci.RR[,1] <= t.pi & ci.RR[,2] >= t.pi

> sum(covr.RR[!is.na(covr.RR)])/sim

[1] 0.959

> covr.LRR = ci.LRR[,1] <= log(t.pi) & ci.LRR[,2] >= log(t.pi)

> sum(covr.LRR[!is.na(covr.LRR)])/sim

[1] 0.961

> # mean confidence interval widths

> mean(ci.RR[,2] - ci.RR[,1])

[1] 0.3159202

> mean(ci.LRR[,2] - ci.LRR[,1])

[1] 0.1580062

Therefore, we see, why use of logRR is so popular, since under normality assumption due to its symmetry works out well even for small sample sizes.

1 comment:

AnonymousSeptember 2, 2019 at 10:58 PM
I am 29 years old and have been diagnosed with breast cancer, ease of treatment and a similar story, except for my first acceptance as a rejection of herbal medicine. I was not part of the Perseid movement and did not really build relationships with any of them, I just believed in their operation. I say this because it was during the use of Dr. Itua herbal medicine that I now attest that herbal medicine is real, the phytotherapy Dr. Itua heal my breast cancer which I suffered for 2 years. Dr. Itua herbal medicine is made of natural herbs, with no side effects, and easy to drink. If you have the same breast cancer or any type of human illness, including HIV / AIDS, herpes,Ovarian Cancer,Pancratics cancer,bladder cancer, prostate cancer, Glaucoma., Cataracts,Macular degeneration,Cardiovascular disease,Autism,Lung disease.Enlarged prostate,Osteoporosis.Alzheimer's disease,Brain cancer,Esophageal cancer,Gallbladder cancer,Gestational trophoblastic disease,Head and neck cancer,Intestinal cancer,Kidney cancer,Melanoma,Mesothelioma,Multiple myeloma,Neuroendocrine tumors,Hodgkin lymphoma,Oral cancer,Ovarian cancer,Sinus cancer,,Soft tissue sarcoma,Spinal cancer,Stomach cancer,Testicular cancer,Throat cancer,Thyroid Cancer,Uterine cancer,Vaginal cancer,Vulvar cancer.psoriasis ,Tach Diseases,Lupus,Endomertil Cancer, cerebrovascular diseases,
Dementia.lung cancer,skin cancer,testicular Cancer,Lupus, , LEUKEMIA, VIRUSES, HEPATITIS, INFERTILITY WOMEN / MAN,CONTACT EMAIL / WHATSAPP: Or drituaherbalcenter@gmail.com/ +2348149277967

Wednesday, January 26, 2011

Why log(RR) instead of RR?

1 comment: