t-test

You want to test whether two samples are drawn from populations with different means, or whether one sample is drawn from a population with a mean different from some theoretical mean. The relevant function is t.test():

t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0,
       paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...)

Setting paired = TRUE gives a paired t-test; var.equal = TRUE uses Student's t-test instead of the default Welch t-test.

We will use the built-in sleep data set, which shows the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.

  - extra: increase in hours of sleep
  - group: drug given
  - ID: patient ID
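
A quick look at the first few rows:

head(sleep)
##   extra group ID
## 1   0.7     1  1
## 2  -1.6     1  2
## 3  -0.2     1  3
## 4  -1.2     1  4
## 5  -0.1     1  5
## 6   3.4     1  6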

Student’s t-test

Suppose the two groups are independently sampled; we'll ignore the ID variable here.

# The default is the Welch t-test, which does not assume equal variances:
# t.test(extra ~ group, sleep)

# Student's t-test, assuming equal variances:
t.test(extra ~ group, sleep, var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 18, p-value = 0.07919
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.363874  0.203874
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33
# The difference is not significant at the 0.05 level
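
If you have a directional hypothesis, you can set the alternative argument. For example, a sketch of a one-sided test (Welch, since var.equal defaults to FALSE):

# With the formula interface, alternative = "less" tests whether the mean of
# group 1 minus the mean of group 2 is below zero
t.test(extra ~ group, sleep, alternative = "less")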

Paired t-test

You might have observations before and after a treatment, or of two matched subjects with different treatments.

# The data must be sorted so that the observations in the two groups pair up
# by patient; in the sleep data set they already do.
t.test(extra ~ group, data = sleep, paired = TRUE)
## 
##  Paired t-test
## 
## data:  extra by group
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean of the differences 
##                   -1.58 

# This is equivalent:
# with(sleep,
#      t.test(extra[group == 1],
#             extra[group == 2], paired = TRUE))

# A paired t-test is also equivalent to a one-sample t-test on the
# differences (the output header then reads "One Sample t-test"):
# with(sleep,
#      t.test(extra[group == 1] - extra[group == 2], mu = 0))

# Plot the per-patient differences, with a boxplot overlaid
with(sleep,
     stripchart(extra[group == 2] - extra[group == 1], method = "stack", xlab = "hours",
                main = "Sleep prolongation (n = 10)")
)
with(sleep,
     boxplot(extra[group == 2] - extra[group == 1], horizontal = TRUE, add = TRUE,
             at = .6, pars = list(boxwex = 0.5, staplewex = 0.25))
)

Miscellaneous

Comparing a group against an expected population mean: one-sample t-test

Suppose that you want to test whether the data in column extra is drawn from a population whose true mean is 0.

t.test(sleep$extra, mu=0)
## 
##  One Sample t-test
## 
## data:  sleep$extra
## t = 3.413, df = 19, p-value = 0.002918
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.5955845 2.4844155
## sample estimates:
## mean of x 
##      1.54

Frequency tests

There are two general questions that frequency tests address:

  1. Does the frequency distribution differ from an expected, or theoretical, distribution (e.g., expect 50% yes, 50% no)? (Goodness-of-fit test)
  2. Does the frequency distribution differ between two or more groups? (Independence test)

               Expected distribution        Comparing groups
  Exact        Exact binomial               Fisher's exact
  Approximate  Chi-square goodness of fit   Chi-square test of independence

Note: The exact binomial test can only be used on one variable that has two levels. Fisher’s exact test can only be used with two-dimensional contingency tables (for example, it can be used when there is one independent variable and one dependent variable, but not when there are 2 IVs and 1 DV).

To test for paired or within-subjects effects, McNemar’s test can be used. To use it, there must be one IV with two levels, and one DV with two levels.

To test for the independence of two variables with repeated measurements, the Cochran-Mantel-Haenszel test can be used.

Chi-square test

# A one-variable goodness-of-fit test against the default 50/50 split:
# chisq.test(c(5,7))
# data:  c(5, 7)
# X-squared = 0.33333, df = 1, p-value = 0.5637
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
                    party = c("Democrat","Independent", "Republican"))
M
##        party
## gender Democrat Independent Republican
##      F      762         327        468
##      M      484         239        477
chisq.test(M)  # Pearson's chi-square test of independence
## 
##  Pearson's Chi-squared test
## 
## data:  M
## X-squared = 30.07, df = 2, p-value = 2.954e-07
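
When given a single vector of counts, chisq.test() performs a goodness-of-fit test instead; expected proportions other than equal ones are passed via the p argument. A minimal sketch (the counts here are hypothetical, purely for illustration):

# Test made-up counts of a three-level variable against expected
# proportions of 30% / 30% / 40%
chisq.test(c(50, 40, 60), p = c(0.3, 0.3, 0.4))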

Exact binomial test

binom.test(c(682, 243), p = 3/4)
## 
##  Exact binomial test
## 
## data:  c(682, 243)
## number of successes = 682, number of trials = 925, p-value =
## 0.3825
## alternative hypothesis: true probability of success is not equal to 0.75
## 95 percent confidence interval:
##  0.7076683 0.7654066
## sample estimates:
## probability of success 
##              0.7372973

Fisher’s exact test

TeaTasting <- matrix(c(3, 1, 1, 3), nrow = 2,
       dimnames = list(Guess = c("Milk", "Tea"),
                       Truth = c("Milk", "Tea")))
fisher.test(TeaTasting, alternative = "greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  TeaTasting
## p-value = 0.2429
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  0.3135693       Inf
## sample estimates:
## odds ratio 
##   6.408309

Cochran-Mantel-Haenszel test

The Cochran-Mantel-Haenszel test (or Mantel-Haenszel test) is used for testing the independence of two dichotomous variables with repeated measurements. It is most commonly used with 2x2xK tables, where K is the number of measurement conditions. For example, you may want to know whether a treatment (C vs. D) affects the likelihood of recovery (yes or no). Suppose, though, that the treatments were administered at three different times of day (morning, afternoon, and night), and that you want your test to control for this. The CMH test would then operate on a 2x2x3 contingency table, where the third variable is the one you wish to control for.

The implementation of the CMH test in R can handle dimensions greater than 2x2xK. For example, you could use it for a 3x3xK contingency table.
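
A minimal sketch (the counts are randomly generated, purely to show that mantelhaen.test() accepts tables larger than 2x2xK; with more than two rows or columns it reports the generalized Cochran-Mantel-Haenszel statistic):

# A hypothetical 3x3x2 contingency table of random counts
set.seed(1)
bigtab <- array(rpois(18, lambda = 20), dim = c(3, 3, 2))
mantelhaen.test(bigtab)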

fish <- read.table(header=TRUE, text='
  Location Allele   Habitat Count
 tillamook     94    marine    56
 tillamook     94 estuarine    69
 tillamook non-94    marine    40
 tillamook non-94 estuarine    77
   yaquina     94    marine    61
   yaquina     94 estuarine   257
   yaquina non-94    marine    57
   yaquina non-94 estuarine   301
     alsea     94    marine    73
     alsea     94 estuarine    65
     alsea non-94    marine    71
     alsea non-94 estuarine    79
    umpqua     94    marine    71
    umpqua     94 estuarine    48
    umpqua non-94    marine    55
    umpqua non-94 estuarine    48
')
# Make a 3D contingency table, where the last variable, Location, is the one to 
# control for. If you use table() for case data, the last variable is also the 
# one to control for.
ct <- xtabs(Count ~ Allele + Habitat + Location, data=fish)
ct
## , , Location = alsea
## 
##         Habitat
## Allele   estuarine marine
##   94            65     73
##   non-94        79     71
## 
## , , Location = tillamook
## 
##         Habitat
## Allele   estuarine marine
##   94            69     56
##   non-94        77     40
## 
## , , Location = umpqua
## 
##         Habitat
## Allele   estuarine marine
##   94            48     71
##   non-94        48     55
## 
## , , Location = yaquina
## 
##         Habitat
## Allele   estuarine marine
##   94           257     61
##   non-94       301     57
# This prints ct in a "flat" format
ftable(ct)
##                  Location alsea tillamook umpqua yaquina
## Allele Habitat                                          
## 94     estuarine             65        69     48     257
##        marine                73        56     71      61
## non-94 estuarine             79        77     48     301
##        marine                71        40     55      57
# Print with different arrangement of variables
ftable(ct, row.vars=c("Location","Allele"), col.vars="Habitat")
##                  Habitat estuarine marine
## Location  Allele                         
## alsea     94                    65     73
##           non-94                79     71
## tillamook 94                    69     56
##           non-94                77     40
## umpqua    94                    48     71
##           non-94                48     55
## yaquina   94                   257     61
##           non-94               301     57
mantelhaen.test(ct)
## 
##  Mantel-Haenszel chi-squared test with continuity correction
## 
## data:  ct
## Mantel-Haenszel X-squared = 5.0497, df = 1, p-value = 0.02463
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.6005522 0.9593077
## sample estimates:
## common odds ratio 
##          0.759022

According to this test, there is a relationship between Allele and Habitat, controlling for Location, p=.025.

Note that the first two dimensions of the contingency table are treated the same (their order can be swapped without affecting the test result), but the highest-order dimension is treated differently, since it is the variable being controlled for. This is illustrated below.

# The following two create different contingency tables, but have the same result
# with the CMH test
ct.1 <- xtabs(Count ~ Habitat + Allele + Location, data=fish)
ct.2 <- xtabs(Count ~ Allele + Habitat + Location, data=fish)
mantelhaen.test(ct.1)
## 
##  Mantel-Haenszel chi-squared test with continuity correction
## 
## data:  ct.1
## Mantel-Haenszel X-squared = 5.0497, df = 1, p-value = 0.02463
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.6005522 0.9593077
## sample estimates:
## common odds ratio 
##          0.759022
mantelhaen.test(ct.2)
## 
##  Mantel-Haenszel chi-squared test with continuity correction
## 
## data:  ct.2
## Mantel-Haenszel X-squared = 5.0497, df = 1, p-value = 0.02463
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.6005522 0.9593077
## sample estimates:
## common odds ratio 
##          0.759022
# With Allele last, we get a different result
ct.3 <- xtabs(Count ~ Location + Habitat + Allele, data=fish)
ct.4 <- xtabs(Count ~ Habitat + Location + Allele, data=fish)
mantelhaen.test(ct.3)
## 
##  Cochran-Mantel-Haenszel test
## 
## data:  ct.3
## Cochran-Mantel-Haenszel M^2 = 168.47, df = 3, p-value < 2.2e-16
mantelhaen.test(ct.4)
## 
##  Cochran-Mantel-Haenszel test
## 
## data:  ct.4
## Cochran-Mantel-Haenszel M^2 = 168.47, df = 3, p-value < 2.2e-16
# With Habitat last, we get a different result
ct.5 <- xtabs(Count ~ Allele + Location + Habitat, data=fish)
ct.6 <- xtabs(Count ~ Location + Allele + Habitat, data=fish)
mantelhaen.test(ct.5)
## 
##  Cochran-Mantel-Haenszel test
## 
## data:  ct.5
## Cochran-Mantel-Haenszel M^2 = 2.0168, df = 3, p-value = 0.5689
mantelhaen.test(ct.6)
## 
##  Cochran-Mantel-Haenszel test
## 
## data:  ct.6
## Cochran-Mantel-Haenszel M^2 = 2.0168, df = 3, p-value = 0.5689

McNemar’s test

McNemar’s test is conceptually like a within-subjects test for frequency data. For example, suppose you want to test whether a treatment increases the probability that a person will respond “yes” to a question, and that you get just one pre-treatment and one post-treatment response per person. A standard chi-square test would be inappropriate, because it assumes that the groups are independent. Instead, McNemar’s test can be used. This test can only be used when there are two measurements of a dichotomous variable. The 2x2 contingency table used for McNemar’s test bears a superficial resemblance to those used for “normal” chi-square tests, but it is different in structure.

data <- read.table(header=TRUE, text='
 subject time result
       1  pre      0
       1 post      1
       2  pre      1
       2 post      1
       3  pre      0
       3 post      1
       4  pre      1
       4 post      0
       5  pre      1
       5 post      1
       6  pre      0
       6 post      1
       7  pre      0
       7 post      1
       8  pre      0
       8 post      1
       9  pre      0
       9 post      1
      10  pre      1
      10 post      1
      11  pre      0
      11 post      0
      12  pre      1
      12 post      1
      13  pre      0
      13 post      1
      14  pre      0
      14 post      0
      15  pre      0
      15 post      1
')
library(tidyr)

# Reshape to wide format: one row per subject, with pre and post columns
data_wide <- spread(data, time, result)
ct <- table( data_wide[,c("pre","post")] )
ct
##    post
## pre 0 1
##   0 2 8
##   1 1 4

# The contingency table above puts each subject into one of four cells, depending
# on their pre- and post-treatment response. Note that it is different from
# the contingency table used for a "normal" chi-square test, shown below. The table
# below does not account for repeated measurements, and so it is not useful for
# the purposes here.

# table(data[,c("time","result")])
#       result
# time    0  1
#   post  3 12
#   pre  10  5
mcnemar.test(ct)
## 
##  McNemar's Chi-squared test with continuity correction
## 
## data:  ct
## McNemar's chi-squared = 4, df = 1, p-value = 0.0455

For small sample sizes, mcnemar.test() applies a continuity correction. Instead of relying on this correction, you can use an exact version of McNemar's test, which is more accurate. It is available in the exact2x2 package.

library(exact2x2)
#> Loading required package: exactci
#> Loading required package: ssanv
mcnemar.exact(ct)
## 
##  Exact McNemar test (with central confidence intervals)
## 
## data:  ct
## b = 8, c = 1, p-value = 0.03906
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##    1.072554 354.981246
## sample estimates:
## odds ratio 
##          8

Homogeneity of variance

You want to test multiple samples for homogeneity of variance (homoscedasticity), that is, to see whether they are drawn from populations with equal variances. Many statistical tests assume that the populations are homoscedastic. Three common tests are:

  - Bartlett's test: if the data is normally distributed, this is the best test to use. However, it is sensitive to non-normal data, and is more likely to return a "false positive" when the data is non-normal.
  - Levene's test: more robust to departures from normality than Bartlett's test. It is in the car package.
  - Fligner-Killeen test: a non-parametric test which is very robust against departures from normality.

For all these tests, the null hypothesis is that all population variances are equal; the alternative hypothesis is that at least two of them differ.

head(InsectSprays)
##   count spray
## 1    10     A
## 2     7     A
## 3    20     A
## 4    14     A
## 5    14     A
## 6    12     A
tg      <- ToothGrowth
tg$dose <- factor(tg$dose) # Treat this column as a factor, not numeric
head(tg)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
plot(count ~ spray, data = InsectSprays)

plot(len ~ interaction(dose,supp), data=ToothGrowth)

At first glance, it appears that both data sets are heteroscedastic, but this needs to be properly tested, which we'll do below.
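
Before running the formal tests, it can also help to look at the per-group sample variances directly. A quick sketch using tapply():

# Sample variance of count within each spray group
tapply(InsectSprays$count, InsectSprays$spray, var)

# Sample variance of len within each dose/supp combination
with(tg, tapply(len, interaction(dose, supp), var))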

Bartlett’s test

With one independent variable:

bartlett.test(count ~ spray, data=InsectSprays)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  count by spray
## Bartlett's K-squared = 25.96, df = 5, p-value = 9.085e-05

With multiple independent variables, the interaction() function must be used to collapse the IVs into a single variable with all combinations of the factors. If it is not used, the test will have the wrong degrees of freedom, and the p-value will be wrong.

bartlett.test(len ~ interaction(supp,dose), data=ToothGrowth)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  len by interaction(supp, dose)
## Bartlett's K-squared = 6.9273, df = 5, p-value = 0.2261
# The above gives the same result as testing len vs. dose alone, without supp
bartlett.test(len ~ dose, data=ToothGrowth)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  len by dose
## Bartlett's K-squared = 0.66547, df = 2, p-value = 0.717

Levene’s test

With one independent variable:

library(car)

leveneTest(count ~ spray, data=InsectSprays)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value   Pr(>F)   
## group  5  3.8214 0.004223 **
##       66                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With two independent variables: note that the interaction() function is not needed here, as it is for the other two tests.

leveneTest(len ~ supp*dose, data=tg)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  5  1.7086 0.1484
##       54

Fligner-Killeen test

With one independent variable:

fligner.test(count ~ spray, data=InsectSprays)
## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  count by spray
## Fligner-Killeen:med chi-squared = 14.483, df = 5, p-value =
## 0.01282

The fligner.test function has the same quirk as bartlett.test when working with multiple IVs: the interaction() function must be used.

fligner.test(len ~ interaction(supp,dose), data=ToothGrowth)
## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  len by interaction(supp, dose)
## Fligner-Killeen:med chi-squared = 7.7488, df = 5, p-value = 0.1706
# The above gives the same result as testing len vs. dose alone, without supp
fligner.test(len ~ dose, data=ToothGrowth)
## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  len by dose
## Fligner-Killeen:med chi-squared = 1.3879, df = 2, p-value = 0.4996

Inter-rater reliability

The method for calculating inter-rater reliability will depend on the type of data (categorical, ordinal, or continuous) and the number of coders.

library(irr)
#> Loading required package: lpSolve

data(diagnoses)
dat <- diagnoses[,1:3]  # ratings from the first three raters

Two raters: Cohen’s Kappa

This calculates Cohen's Kappa for two coders; in this case, raters 1 and 2.

kappa2(dat[,c(1,2)], "unweighted")
##  Cohen's Kappa for 2 Raters (Weights: unweighted)
## 
##  Subjects = 30 
##    Raters = 2 
##     Kappa = 0.651 
## 
##         z = 7 
##   p-value = 2.63e-12

N raters: Fleiss’s Kappa, Conger’s Kappa

If there are more than two raters, use Fleiss’s Kappa.

kappam.fleiss(dat)
##  Fleiss' Kappa for m Raters
## 
##  Subjects = 30 
##    Raters = 3 
##     Kappa = 0.534 
## 
##         z = 9.89 
##   p-value = 0

It is also possible to use Conger’s (1980) exact Kappa. (Note that it is not clear to me when it is better or worse to use the exact method.)

kappam.fleiss(dat, exact=TRUE)
##  Fleiss' Kappa for m Raters (exact value)
## 
##  Subjects = 30 
##    Raters = 3 
##     Kappa = 0.55

Ordinal data: weighted Kappa

If the data is ordinal, then it may be appropriate to use a weighted Kappa. For example, if the possible values are low, medium, and high, then if a case were rated medium and high by the two coders, they would be in better agreement than if the ratings were low and high.

We will use a subset of the anxiety data set from the irr package.

library(irr)
data(anxiety)

dfa <- anxiety[,c(1,2)]

The weighted Kappa calculation must be made with 2 raters, and can use either linear or squared weights of the differences.

# Compare raters 1 and 2 with squared weights
kappa2(dfa, "squared")
##  Cohen's Kappa for 2 Raters (Weights: squared)
## 
##  Subjects = 20 
##    Raters = 2 
##     Kappa = 0.297 
## 
##         z = 1.34 
##   p-value = 0.18
# Use linear weights
kappa2(dfa, "equal")
##  Cohen's Kappa for 2 Raters (Weights: equal)
## 
##  Subjects = 20 
##    Raters = 2 
##     Kappa = 0.189 
## 
##         z = 1.42 
##   p-value = 0.157

Compare the results above to the unweighted calculation (used in the tests for non-ordinal data above), which treats all differences as the same:

kappa2(dfa, "unweighted")
##  Cohen's Kappa for 2 Raters (Weights: unweighted)
## 
##  Subjects = 20 
##    Raters = 2 
##     Kappa = 0.119 
## 
##         z = 1.16 
##   p-value = 0.245

Weighted Kappa with factors

The data above is numeric, but a weighted Kappa can also be calculated for factors. Note that the factor levels must be in the correct order, or results will be wrong.

# Make a factor-ized version of the data
dfa2 <- dfa
dfa2$rater1 <- factor(dfa2$rater1, levels=1:6, labels=LETTERS[1:6])
dfa2$rater2 <- factor(dfa2$rater2, levels=1:6, labels=LETTERS[1:6])
dfa2
##    rater1 rater2
## 1       C      C
## 2       C      F
## 3       C      D
## 4       D      F
## 5       E      B
## 6       E      D
## 7       B      B
## 8       C      D
## 9       E      C
## 10      B      C
## 11      B      B
## 12      F      C
## 13      A      C
## 14      E      C
## 15      B      B
## 16      B      B
## 17      A      A
## 18      B      C
## 19      D      C
## 20      C      D
# The factor levels must be in the correct order:
levels(dfa2$rater1)
## [1] "A" "B" "C" "D" "E" "F"
levels(dfa2$rater2)
## [1] "A" "B" "C" "D" "E" "F"
# The results are the same as with the numeric data, above
kappa2(dfa2, "squared")
##  Cohen's Kappa for 2 Raters (Weights: squared)
## 
##  Subjects = 20 
##    Raters = 2 
##     Kappa = 0.297 
## 
##         z = 1.34 
##   p-value = 0.18
# Use linear weights
kappa2(dfa2, "equal")
##  Cohen's Kappa for 2 Raters (Weights: equal)
## 
##  Subjects = 20 
##    Raters = 2 
##     Kappa = 0.189 
## 
##         z = 1.42 
##   p-value = 0.157

Continuous data: Intraclass correlation coefficient

When the variable is continuous, the intraclass correlation coefficient should be computed. From the documentation for icc:

When considering which form of ICC is appropriate for an actual set of data, one has to take several decisions (Shrout & Fleiss, 1979):

  1. Should only the subjects be considered as random effects ("oneway" model, default), or are subjects and raters randomly chosen from a bigger pool of persons ("twoway" model)?
  2. If differences in judges’ mean ratings are of interest, interrater "agreement" instead of "consistency" (default) should be computed.
  3. If the unit of analysis is a mean of several ratings, unit should be changed to "average". In most cases, however, single ratings (unit="single", default) are used.

We will use the anxiety data set from the irr package.

library(irr)
data(anxiety)
anxiety
##    rater1 rater2 rater3
## 1       3      3      2
## 2       3      6      1
## 3       3      4      4
## 4       4      6      4
## 5       5      2      3
## 6       5      4      2
## 7       2      2      1
## 8       3      4      6
## 9       5      3      1
## 10      2      3      1
## 11      2      2      1
## 12      6      3      2
## 13      1      3      3
## 14      5      3      3
## 15      2      2      1
## 16      2      2      1
## 17      1      1      3
## 18      2      3      3
## 19      4      3      2
## 20      3      4      2
# Just one of the many possible ICC coefficients
icc(anxiety, model="twoway", type="agreement")
##  Single Score Intraclass Correlation
## 
##    Model: twoway 
##    Type : agreement 
## 
##    Subjects = 20 
##      Raters = 3 
##    ICC(A,1) = 0.198
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
##  F(19,39.7) = 1.83 , p = 0.0543 
## 
##  95%-Confidence Interval for ICC Population Values:
##   -0.039 < ICC < 0.494
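
The other combinations of these three decisions are obtained by changing the corresponding arguments. A short sketch (same data; the argument names are those of irr::icc() as described above):

# The function's defaults: one-way model, consistency, single ratings
icc(anxiety)

# Two-way model, agreement, with the mean of the three ratings as the unit
icc(anxiety, model="twoway", type="agreement", unit="average")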