biostatistics and epidemiology with R: Biostatistics and R

Biostatistics is mainly statistics applied to clinical and epidemiological studies. It studies the occurrence of illness (morbidity) and death (mortality), either at a point in time or over a period of time, and it builds models and estimates of risk (the probability that such an event occurs).

To assess risk, it is common practice to compare a diseased population with a population that is not (yet) diseased. Depending on the study design, the comparison can instead be between exposed and unexposed subjects.

For simplicity, let's start with a binary outcome variable (diseased / not diseased) and a binary exposure or grouping variable (exposed / unexposed). A 2x2 contingency table is appropriate for summarizing the frequencies:

		Response or outcome
		Diseased	Not diseased	Marginal total
Group or exposure or predictive variable	Exposed	a = number of observations where exposure present and outcome present	b = number of observations where exposure present and outcome absent	n1 = number of observations where exposure present
Group or exposure or predictive variable	Unexposed	c = number of observations where exposure absent and outcome present	d = number of observations where exposure absent and outcome absent	n2 = number of observations where exposure absent
Marginal total		m1 = number of observations where outcome present	m2 = number of observations where outcome absent	N = total subjects in the study

Now the analysis of this table will depend on various aspects, such as:

objective of the study (usually the goal is to find a causal dependence from an observed association)
design of the study (randomized trial or observational)
involvement of other variables (covariates and confounders)
sample sizes
reliability of measurement

and so on.

Prevalence is the probability of being diseased at a specific point in time (as in a cross-sectional study); it is denoted p in the sample and π in the population. Incidence is the analogous quantity over a time period (follow-up), with all patients taken to be independent of baseline characteristics. Both can be estimated using the binomial distribution.
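As a minimal sketch of the binomial estimation mentioned above, the following R snippet estimates a prevalence and an exact confidence interval with `binom.test()` from base R. The counts (12 diseased out of 100 subjects) are hypothetical and chosen only for illustration; they do not come from any study discussed here.

```r
# Hypothetical cross-sectional sample (illustrative numbers only):
# 12 diseased subjects among 100 sampled.
diseased <- 12
n <- 100

# Sample prevalence p, the estimate of the population value pi.
p_hat <- diseased / n

# Exact binomial test; its conf.int element is a
# Clopper-Pearson 95% confidence interval for the proportion.
fit <- binom.test(diseased, n)
ci <- fit$conf.int

print(p_hat)
print(ci)
```

The same call applies to a cumulative incidence, provided the numerator counts new cases during follow-up and the denominator counts subjects at risk at baseline.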

For example, suppose we have the following case-control study results, summarized in tabular form:

Case-control study		cancer
		Diseased	Not diseased	Marginal total
smoking	Exposed	a = 41	b = 28	n1 = 69
smoking	Unexposed	c = 19	d = 32	n2 = 51
Marginal total		m1 = 60	m2 = 60	N = 120
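One possible way to enter this table in R is sketched below. The object names (`tab`, `with_margins`) and the dimension labels are my own choices for illustration; `matrix()`, `as.table()` and `addmargins()` are base R functions.

```r
# Cell counts from the case-control table above, entered row by row:
# a = 41, b = 28 (exposed); c = 19, d = 32 (unexposed).
tab <- as.table(matrix(c(41, 28,
                         19, 32),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(smoking = c("Exposed", "Unexposed"),
                                       cancer  = c("Diseased", "Not diseased"))))

# Append the marginal totals (n1, n2, m1, m2 and N) as a "Sum" row and column.
with_margins <- addmargins(tab)

print(with_margins)
```

Printing `with_margins` should reproduce the marginal totals shown in the table, which is a quick check that the counts were typed in correctly.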

Because of the sampling design, risk cannot be estimated directly from this table: the numbers of cancer cases and controls (60 each) were fixed in advance by the design. Case-control studies are also prone to confounding and can therefore give misleading results, so such data must be analyzed with caution. Additional tools, such as stratification, are usually needed to handle them properly.
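Although risk itself is not estimable from a case-control table, the odds ratio is a measure of association that this design does support. The sketch below computes the crude (unadjusted) sample odds ratio for the smoking and cancer table, with an approximate 95% interval based on the usual log-scale standard error (often called Woolf's method). This is my own illustration rather than an analysis taken from the study, and it ignores confounding entirely.

```r
# Cell counts from the case-control table (smoking vs. cancer).
tab <- matrix(c(41, 28,
                19, 32),
              nrow = 2, byrow = TRUE,
              dimnames = list(smoking = c("Exposed", "Unexposed"),
                              cancer  = c("Diseased", "Not diseased")))

# Crude sample odds ratio: (a * d) / (b * c).
odds_ratio <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Approximate standard error of log(OR): sqrt(1/a + 1/b + 1/c + 1/d).
se_log_or <- sqrt(sum(1 / tab))

# Approximate 95% confidence interval, computed on the log scale.
ci <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(round(odds_ratio, 2))
print(round(ci, 2))
```

Base R's `fisher.test(tab)` also reports an odds ratio, but it is a conditional maximum likelihood estimate and will generally differ slightly from the crude value computed here.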

There are various R tools that we can use to apply biostatistical methods. Here are a few of them:

In subsequent posts, I plan to discuss some uses of R in a biostatistical context.

Wednesday, January 26, 2011
