A common class of problems in the accumulation and evaluation of scientific evidence is the assessment of association of two variables. Is there an association between poverty and drug addiction? Is emotional stress associated with cardiovascular disease?
To determine association, we must first quantify both variables. For instance, emotional stress may be quantified by using an appropriate psychological test of stress or by clearly defining, evaluating, and rating on a scale the stress factor in an individual's life situation, whereas hypertension (defined as a blood pressure reading) may be considered as the particular aspect of cardiovascular disease to be studied.
When variables have been quantified, a measure of association needs to be calculated to determine the strength of the relationship. One of the most common measures of association is the correlation coefficient, r, which is a number derived from the data and which can vary between -1 and +1.
When r = +1, a perfect positive correlation, there is a direct relationship between the two variables: an individual who has a high score on one variable also has a high score on the other, and the score on one variable can be exactly predicted from the score on the other variable. This kind of correlation exists only in deterministic models, where there is really a functional relationship. An example might be the correlation between the age of a tree and the number of rings it has.
A correlation coefficient of -1 indicates a perfect inverse relationship, where a high score on one variable means a low score on the other and where, as in perfect positive correlation, there is no error of measurement. Correlation coefficients between 0 and +1 and between 0 and -1 indicate varying strengths of association. These correlation coefficients apply when the basic relationship between the two variables is linear. Consider a group of people for each of whom we have measurements of weight and height; if we plot weight against height, we will find that the points scatter around a straight line rather than falling exactly on it. There is a linear association between weight and height, and the correlation coefficient would be positive but less than 1.
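The calculation of r for the weight and height example can be sketched in a few lines of Python. This is a minimal illustration of the standard product-moment formula, not a procedure taken from the text: the function name pearson_r and the six height and weight values are hypothetical and chosen only to show a positive but imperfect linear association.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only: heights in cm, weights in kg
heights = [150, 160, 165, 170, 180, 185]
weights = [52, 58, 66, 63, 80, 79]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

With data that lie exactly on a rising straight line the same function returns +1, and with data on a falling straight line it returns -1, which matches the two limiting cases described above.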
When the variables are continuous, the calculated correlation is the Pearson product-moment correlation. If the variables are ranked and ordered according to rank, we calculate the Spearman rank-order correlation, which is a nonparametric statistic. Nonparametric statistics do not require the data to be normally distributed and can be used when the data are ordinal (i.e., can be sorted in order, but the distances between any two values do not have to be the same).
An example is educational level, which can be categorized into less than a high school education, graduated from high school, some college, graduated from college, and received a graduate degree. You can assign numbers to these from 1 to 5, but the numbers do not represent years of education but rather categories of education that are ordered from lowest to highest category, and you can categorize them differently if you wish.
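A rank-order correlation for ordinal data such as educational level can be sketched as follows. The approach shown (convert each variable to ranks, give tied values the average of their positions, then compute the product-moment correlation on the ranks) is one common way to obtain the Spearman coefficient. The function names, the second variable (a 1 to 5 self-rated health score), and all data values are hypothetical and serve only as an illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: product-moment correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: education category (1 = less than high school,
# 5 = graduate degree) and a hypothetical 1-5 self-rated health score
education = [1, 2, 2, 3, 4, 5, 5]
health = [2, 1, 3, 3, 4, 4, 5]

rho = spearman_rho(education, health)

# Recoding the categories with any other increasing set of numbers
# leaves the ranks, and therefore rho, unchanged
recode = {1: 0, 2: 9, 3: 12, 4: 14, 5: 20}
rho_recoded = spearman_rho([recode[e] for e in education], health)

print(f"rho = {rho:.3f}, after recoding = {rho_recoded:.3f}")
```

Because only the order of the categories enters the calculation, the particular numbers assigned to the education levels do not matter as long as they preserve the ordering from lowest to highest.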
The diagrams in Fig. 1 illustrate various correlation coefficients.
Bibliographic reference
Sylvia Wassertheil-Smoller. Biostatistics and Epidemiology: A Primer for Health and Biomedical Professionals. Fourth Edition. 2015.