April \(30^{th}\) 2020

\(243^{rd}\) Gauss' Birthday!

Eh??

Why is Sample Size so important?

  • It tells you when you should do your stats
  • It gives an idea of the actual resources we should put into a project
  • It tells us when the size of our sample is actually representative of the statistical population

In a census, the sample is the whole statistical population!

What is a sample size?

Generally speaking, the larger the sample size, the more accurate your estimates and statistics.

However, we have many limitations: availability of participants, time constraints, money constraints, etc…

Therefore, the optimum sample size is the minimum number of participants needed to obtain statistically representative data.

Power of a statistical test

In frequentist statistics (NHST), some fundamental concepts are:

  • first type error (\(\alpha\)) – typically set at 5% (0.05) (False Positives)
  • second type error (\(\beta\)) (False Negatives)

The power of a statistical test is given by \(1-\beta\).

It is the probability of rejecting the null hypothesis when it is false.

Generally, accepted values of power are: 80%, 90%, 95% and 99%, which means \(\beta = 20\%, 10\%, 5\%, 1\%\).

The sample size is the number of observations required to obtain a statistically significant result in power% of cases, given a specific effect size.

Effect size of a statistical test

First introduced by Cohen (1988), effect sizes demonstrated their importance in a short time.

The effect size of a test is a (usually) standardized measure of the dimension of an effect (difference between groups, covariation, etc…) that should always be presented together with a (significant) p-value.

If the p-value says that there is an effect, the effect size tells us whether the effect is worth mentioning.

The effect size is an indication of how much the control observations are lower (or higher) than the experimental observations.

Effect size and % of difference

Let's use Cohen's \(d = \frac{\hat{\mu}_1 - \hat{\mu}_2}{\hat{\sigma}_{12}} \leftarrow\) a z-score.
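As an aside, Cohen's \(d\) can be computed from raw data, using the pooled standard deviation as \(\hat{\sigma}_{12}\). A minimal sketch (the helper name cohens_d is ours):

```r
# Minimal sketch: Cohen's d for two independent samples,
# using the pooled standard deviation as the denominator.
cohens_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

cohens_d(c(1, 2, 3, 4, 5), c(2, 3, 4, 5, 6))  # about -0.63
```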

Nominal size   Effect size   % control < experimental observations
—              0.0           50%
Small          0.2           58%
Medium         0.5           69%
Large          0.8           79%
—              1.4           92%

https://rpsychologist.com/d3/cohend/ \(\leftarrow\) interesting visualization
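The percentages in the table follow directly from the normal CDF: the share of control observations below the experimental mean is \(\Phi(d)\). A quick check in R:

```r
# Share of control observations below the experimental mean: Phi(d)
d <- c(0, 0.2, 0.5, 0.8, 1.4)
round(100 * pnorm(d))  # 50 58 69 79 92, matching the table
```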

Sample Size \(n = f(1-\beta,ES)\) 1/3

The sample size is computed as a function of the (desired) power (\(1-\beta\)) and effect size (\(ES\)).

It is the optimum sample size needed to have the desired power, specifying a certain effect size.

Taking for example the case of a one-sample, one-tailed t-test, with Cohen's \(d\) as effect size, it solves the following system for \(n\):

\[ \begin{cases} Pr_t(q,\nu,c) = 1-\beta\\ \nu = (n-1)\\q = Q_t(\alpha , \nu)\\c = \sqrt{n}\times d \end{cases} \]

Sample Size \(n = f(1-\beta,ES)\) 2/3

\(\nu = n-1\) are the degrees of freedom of the Student's t distribution.

\(q = Q_t(\alpha,\nu)\) is the critical quantile of the t distribution at \(\alpha\) (usually 0.05) with those d.o.f.

\(c = \sqrt{n}\times d\) is the non-centrality parameter of the t distribution. The larger the sample, the more shifted the distribution.

\(Pr_t(q, \nu, c)\) is the area under the noncentral t distribution beyond the quantile \(q\), i.e. the achieved power.

Sample Size \(n = f(1-\beta,ES)\) 3/3

EXAMPLE

\(d = 0.3\); \(1-\beta = 0.8\); \(\alpha = 0.05\)

\(n = 70\)

\(\nu = 69\)

\(q = Q_t(\alpha,\nu) = Q_t(0.05,69) \simeq 1.67\)

\(c = \sqrt{n}\times d = \sqrt{70}\times 0.3 \simeq 8.37\times0.3 \simeq 2.51\)

\(Pr_t(q, \nu, c) = Pr_t(1.67, 69, 2.51) \simeq 0.80\)
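The system above can be solved numerically in base R by increasing \(n\) until the target power is reached; a sketch, with no extra packages needed:

```r
# Solve the system for n by simple search (one-sample, one-tailed t-test)
alpha <- 0.05; d <- 0.3; target <- 0.8

power_at <- function(n) {
  nu <- n - 1                            # degrees of freedom
  q  <- qt(1 - alpha, df = nu)           # critical quantile Q_t(alpha, nu)
  1 - pt(q, df = nu, ncp = sqrt(n) * d)  # upper-tail area of noncentral t
}

n <- 2
while (power_at(n) < target) n <- n + 1
n  # 71: the smallest whole n with power >= 0.8 (70.07 before rounding up)
```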

Sample Size \(n = f(1-\beta,ES)\) *

  • Values are mirrored: a negative effect size of the same magnitude, with the opposite one-tailed alternative, gives the same \(n\).

A question for you:

Recalling that the sample size is the number of observations required to obtain a statistically significant result in power% of cases, given a specific effect size.

Do you think it is OK if I reach a statistically significant result with fewer observations than those required by the sample size calculation?

Why?

My answer

No, it is not OK.

The motivation is that the sample size is the minimum number of observations (subjects) required to have a representative sample of the statistical population.

Smaller samples are more prone to (less obvious) outliers; therefore, there is not only the risk of a greater type II error, but also of a type I error.

How can we compute a sample size?

Here I present three main ways:

  • By means of the functions first proposed by Cohen (1988)
  • By means of the functions proposed by Chow SC, Shao J, Wang H. (2008)
  • By means of simulations

When we compute the sample size, we need to think about our hypotheses and about the specific contrasts of interest.

We also need to think about the possible effect size.

How to determine it?

  • pilot data
  • data in literature
  • standard tables

The pwr package

How can we compute a sample size with Cohen's formulas?

In order to use Cohen's formulas, there is the package pwr in R.

In this package there are several functions that allow us to compute the sample size in different cases.

Cohen's formulae in R 1/2

function         power calculations for        ES   standard values
pwr.2p.test      two proportions (equal n)     h    S: 0.2; M: 0.5; L: 0.8
pwr.2p2n.test    two proportions (unequal n)   h    S: 0.2; M: 0.5; L: 0.8
pwr.anova.test   balanced one-way ANOVA        f    S: 0.1; M: 0.25; L: 0.4
pwr.chisq.test   chi-square test               w    S: 0.1; M: 0.3; L: 0.5
pwr.f2.test      general linear model          f2   S: 0.02; M: 0.15; L: 0.35

Cohen's formulae in R 2/2

function        power calculations for                      ES   standard values
pwr.p.test      proportion (one sample)                     h    S: 0.2; M: 0.5; L: 0.8
pwr.r.test      correlation                                 r    S: 0.1; M: 0.3; L: 0.5
pwr.t.test      t-tests (one sample, two samples, paired)   d    S: 0.2; M: 0.5; L: 0.8
pwr.t2n.test    t-test (two samples with unequal n)         d    S: 0.2; M: 0.5; L: 0.8

With the function cohen.ES you can obtain all the standard effect sizes.

Structure of the functions in the pwr package

More or less, all the pwr.?.test functions have the same parameters to be specified:

  • power: a value between 0 and 1, stating the desired power
  • sig.level: the \(\alpha\). If you do not specify it, it defaults to 0.05
  • alternative: "two.sided", "greater", "less"
  • (different for each function): the effect size
  • (only in some functions) type: "two.sample" (comparing two independent groups), "one.sample" (one group against \(\mu\)), "paired" (comparing the same group at two different times)

Using the pwr package

Let's compute the sample size seen previously:

library(pwr)

pwr.t.test(d = 0.3, power = 0.8, sig.level = 0.05,
           type = "one.sample", alternative = "greater")
## 
##      One-sample t test power calculation 
## 
##               n = 70.06791
##               d = 0.3
##       sig.level = 0.05
##           power = 0.8
##     alternative = greater
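
If pwr is not available, base R's stats::power.t.test handles this case; its delta and sd combine into \(d = delta/sd\), so setting sd = 1 makes delta play the role of \(d\):

```r
# Base-R equivalent of the pwr call above: d = delta / sd = 0.3
power.t.test(delta = 0.3, sd = 1, power = 0.8, sig.level = 0.05,
             type = "one.sample", alternative = "one.sided")$n  # ~70.07
```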

pwr::pwr.t.test 1/3

The parameters are:

  • d: the effect size
  • sig.level
  • power
  • type: "two.sample", "one.sample", "paired"
  • alternative: "two.sided", "less", "greater"

pwr::pwr.t.test 2/3

EXAMPLE

We want to train black and white cats to jump on command.

If our hypothesis is that black cats will jump more, how many cats do we have to train?

Let's use a power of 80%, and a medium effect size.

pwr::pwr.t.test 3/3

EXAMPLE

pwr.t.test(d = 0.5, power = 0.8, sig.level = 0.05,
           type = "two.sample", alternative = "greater")

## 
##      Two-sample t test power calculation 
## 
##               n = 50.1508
##               d = 0.5
##       sig.level = 0.05
##           power = 0.8
##     alternative = greater
## 
## NOTE: n is number in *each* group

pwr::pwr.anova.test 1/3

The function is pwr.anova.test, and it works only with one-way balanced designs.

That means that if we have a multifactorial design, we have to force it into a one-way design.

The null hypothesis is that the means are equal in all levels of the factor; the alternative hypothesis is that at least two levels are statistically different from each other.

pwr::pwr.anova.test 2/3

EXAMPLE

We have an "Empathy for pain" experiment, in which we collect physiological data while participants watch 3 types of videos:

  • a control video
  • a video of a syringe penetrating a hand
  • a video of a Q-tip touching a hand

in two perspectives (1PP, 3PP), and the colour of the Q-tip or syringe can change (blue, green, pink).

Therefore, the design is a \(3\times2\times3\). If we translate it into a one-way design, the total number of "groups" (\(k\)) is \(18\).

However, our hypothesis is that there is a difference between the syringe videos in 1PP and 3PP, therefore involving two factors: \(3\times2\). The total number of groups to take into account is \(6\).

pwr::pwr.anova.test 3/3

EXAMPLE

Let's use a medium effect size \(f = 0.25\)

pwr.anova.test(k = 6, f = 0.25, power = 0.8, sig.level = 0.05)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 6
##               n = 35.14095
##               f = 0.25
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: n is number in each group
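
Remember that n is per group: rounding it up and multiplying by k gives the total number of participants. A quick check:

```r
# n = 35.14 per group must be rounded up; there are k = 6 groups in total
ceiling(35.14095) * 6  # 216 participants overall
```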

pwr::pwr.f2.test 1/3

The function is pwr.f2.test.

Formally, it is for multiple regressions, but it does the same things as pwr.anova.test, with much more flexibility.

You do not need to force your design into a "one-way" study, and you can take covariates into account.

The parameters that this function takes are:

  • u: degrees of freedom for numerator
  • v: degrees of freedom for denominator
  • f2: effect size
  • sig.level
  • power

pwr::pwr.f2.test 2/3

You need to give the function the number of degrees of freedom for the numerator. The function will then return the number of degrees of freedom of the denominator. From this value we can estimate the sample size.

\(u\) is equal to the number of levels of the factor/interaction minus 1. \(v\) is the number of degrees of freedom of the residuals.

The sample size is \(u + v + 1\).

pwr::pwr.f2.test 3/3

EXAMPLE

To the previous study, we add the evaluation of the embodiment of the hand seen in the videos, on a Likert scale.

The d.o.f. at the numerator are now: \(2\times3\times1 - 1 = 5\)

Let's use a medium effect size \(f2 = 0.15\)

pwr.f2.test(u = 5, f2 = 0.15, power = 0.8, sig.level = 0.05)
## 
##      Multiple regression power calculation 
## 
##               u = 5
##               v = 85.21369
##              f2 = 0.15
##       sig.level = 0.05
##           power = 0.8
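
Applying the rule \(n = u + v + 1\) from the previous slide, with \(v\) rounded up:

```r
# total sample size from the regression d.o.f.: n = u + v + 1
u <- 5
v <- ceiling(85.21369)  # 86
u + v + 1               # 92 participants
```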

Exercises A 1/3

  1. Compute the required sample size to understand if females are more creative than males, using a small effect size and a power of 80%.

  2. Estimate the sample size necessary to understand if people feel more energetic after a hot shower, with a moderate effect size and a power of 90%.

  3. Repeat, with a small effect size and a power of 80%.

Exercises A 2/3

  4. We have three groups of people: volleyball players, basketball players and normal people. Your hypothesis is that normal people are less reactive in a go/no-go experiment. Find the total number of participants required with a medium effect size and a power of 99%.

Exercises A 3/3

  5. In an experiment concerning memory we have three groups: the coffee group, the tea group and the water group, and two between-subjects conditions: sleep deprivation and normal sleep. Find the sample size per group, with a power of 80% and a large effect size.

The TrialSize package

How can we compute a sample size with Chow et al.'s formulas?

We can use the TrialSize package.

It has 80 different functions to compute the sample size.

Some of interest are:

  • ANOVA.Repeat.Measure
  • CrossOver.ISV.Equality
  • CrossOver.ISV.NIS

TrialSize::ANOVA.Repeat.Measure 1/2

\(H_0: \mu_1 = \mu_2 = \dots = \mu_k\); \(H_1: \exists\, i,j: \mu_i \neq \mu_j\)

This function needs:

  • alpha
  • beta: be careful, the power is \(1-\beta\). If you want power = \(80\%\), beta = \(0.2\)
  • sigma: the sum of the variance components
  • delta: the difference that we consider as meaningful
  • m: the total number of Bonferroni adjustments needed for post-hoc tests

No space for standard ES!

Suggestion: in terms of Cohen's d, delta = ES \(\times \sqrt{sigma}\)

TrialSize::ANOVA.Repeat.Measure 2/2

EXAMPLE

We use again the same example seen before.

Our physiological data have been transformed into z-scores, therefore a meaningful difference may be 1.5

We set the sum of the variances as \(\sqrt{delta\div ES^2}\) = \(\sqrt{1.5\div0.5^2} \simeq 2.45\)

s.size <- ANOVA.Repeat.Measure(alpha = 0.05, beta = 0.2, sigma = 2.45,
                               delta = 1.5, m = 6)
s.size
## [1] 64.6112

TrialSize::CrossOver.ISV.Equality 1/3

\(\sigma^2_T\) is the within-subject variance for treatment \(T\).

\(H_0: \sigma^2_{T_1} = \sigma^2_{T_2}\); \(H_1: \sigma^2_{T_1} \neq \sigma^2_{T_2}\)

This function needs:

  • alpha
  • beta: be careful, the power is \(1-\beta\). If you want power = \(80\%\), beta = \(0.2\)
  • sigma1: within-subject variance of treatment 1
  • sigma2: within-subject variance of treatment 2
  • m: for each subject, there are m replicates

Suggestion: think of the sigmas in terms of percentages or z-scores.

TrialSize::CrossOver.ISV.Equality 2/3

EXAMPLE

A cross-over design with a treatment and a placebo condition, 5 training sessions per week, for two weeks.

Data are in z-scores.

CrossOver.ISV.Equality(alpha = 0.05, beta = 0.2,
                                 sigma1 = 1, sigma2 = 2, m = 10)
## [1] 2.0000000 0.2573179
## [1] 3.0000000 0.3880039
## [1] 4.0000000 0.4632526
## [1] 5.0000000 0.5143397
## ...
## [1] 36.0000000  0.7997869
## [1] 37.0000000  0.8022938
## ...

(Output truncated: each line reports a candidate number of subjects and the corresponding power; the power first exceeds the desired 0.80 at \(n = 37\).)