Power and Effect Size Workshop
Here, you will find the slides and R code used for my workshop on effect size and power analysis.
Download R code for simulations + all figures that I produced for the slides (not well commented; use at your own peril)
Below is a set of readings that I find helpful in thinking about power and effect size. Some of these readings concern topics covered in the workshop, whereas others provide detail on topics that I mentioned in passing.
Paul Meehl, “Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology.” Journal of Consulting and Clinical Psychology, 1978. This is my favorite academic article on any subject. Meehl’s thesis is that null hypothesis significance testing has hindered progress in some fields of psychology by excusing researchers from conducting “risky” tests of their theories. A risky test is one that a false theory is likely to fail. In contrast, Meehl argues, the null hypothesis significance test is an extremely easy threshold to cross for any large, well-powered psychological study. The article is lively, entertaining, lucid, and correct. It is also available for free on Paul Meehl’s website, which has been maintained since his death by the University of Minnesota.
Meehl (1990). Why Summaries of Research on Psychological Theories Are Often Uninterpretable. http://www.psych.umn.edu/people/meehlp/WebNEW/PUBLICATIONS/144WhySummaries.pdf
Another Meehl tour de force, with a thoughtful section on the suboptimal way in which pilot studies are incorporated into the larger body of empirical knowledge.
John Ioannidis, “Why Most Published Research Findings Are False.” PLoS Medicine, 2005. This provocative article argues that in many fields of research, the claims of most of the empirical articles published are false. This claim has generated controversy, but the important thing in this article is not the estimate of the proportion of studies that make false claims. Rather, Ioannidis’ synthesis of factors that affect the proportion of false findings in a given field is what deserves attention. Studies in fields where typical effect sizes are small or zero, where researchers are strongly incentivized to publish and publishability hinges on statistical significance, and where studies are chronically underpowered are most likely to report false claims.
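To make the interplay of those factors concrete, here is a minimal Python sketch of the positive predictive value (PPV) formula from Ioannidis's article in its simplest form, PPV = (1 - β)R / ((1 - β)R + α), where R is the pre-study odds that a tested relationship is true. The sketch deliberately leaves out the bias and multiple-teams terms that the article adds later, and the particular values of power and prior odds are illustrative assumptions, not estimates for any real field.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Share of significant findings that reflect a true effect.

    prior_odds: ratio of true to null relationships among those tested (R).
    power:      probability of detecting a true effect (1 - beta).
    alpha:      false-positive rate of the significance test.
    Simplest form of the formula: no bias term, one team per question.
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Illustrative (made-up) combinations of power and prior odds.
    for power in (0.8, 0.5, 0.2):
        for prior_odds in (1.0, 0.25, 0.1):
            ppv = positive_predictive_value(prior_odds, power)
            print(f"power={power:.1f}  prior odds={prior_odds:.2f}  PPV={ppv:.2f}")
```

Lowering either power or prior odds pulls the PPV down, which is the sense in which underpowered studies of mostly null effects yield mostly false claims.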
Jacob Cohen, “The Earth is Round (p < 0.05).” American Psychologist, 1994. An entertaining and clear tour of the major misinterpretations and abuses of significance tests, with an emphasis on the ways in which ignoring power affects progress in psychology. Cohen did more than anyone else to raise awareness of the importance of power in psychological research, and the piece amply rewards reading.
Simmons, Nelson, & Simonsohn (2013). Life After p-Hacking. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205186. This PowerPoint presentation concisely lays out a framework for thinking about power requirements and some basic effect sizes. It's a short read and more than worth the investment.
Open Science Collaboration (2015). Estimating the Reproducibility of Psychological Science. Science.
Replications of 100 published studies, each conducted as closely to the original as possible. A huge undertaking and a great achievement.
The relationship of r and d
McGrath & Meyer (2006). When effect sizes disagree: the case of r and d. Psychological Methods.
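As a rough illustration of why r and d can disagree, the sketch below converts between Cohen's d and the point-biserial r using the large-sample relationship r = d / sqrt(d^2 + 1/(pq)), where p and q are the proportions of cases in the two groups. The function names are my own, and the formula ignores the small-sample degrees-of-freedom correction, so treat the output as approximate.

```python
import math


def d_to_r(d, p=0.5):
    """Point-biserial r implied by Cohen's d when a proportion p is in one group."""
    q = 1.0 - p
    return d / math.sqrt(d * d + 1.0 / (p * q))


def r_to_d(r, p=0.5):
    """Cohen's d implied by a point-biserial r when a proportion p is in one group."""
    q = 1.0 - p
    return r / math.sqrt((1.0 - r * r) * p * q)


if __name__ == "__main__":
    # The same d corresponds to a smaller r as the groups become more unequal.
    for p in (0.5, 0.3, 0.1, 0.01):
        print(f"d=0.50, base rate={p:.2f} -> r={d_to_r(0.5, p):.3f}")
```

With equal groups (p = 0.5) the second function reduces to the familiar d = 2r / sqrt(1 - r^2). As the base rate moves away from 0.5, r shrinks while d stays put, so the two measures can give quite different impressions of the same group difference.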
R^2 analogues for mixed-effects models
Nakagawa & Schielzeth (2013). A general and simple method for obtaining R^2 from generalized linear mixed-effects models. Methods in Ecology and Evolution.
This article provides some readable background on attempts to construct an analogue of R^2 useful for mixed-effects models. It also proposes and explains a novel pair of options that are popular among ecologists. Their proposal only applies to models with a random intercept. (That is, no random slopes allowed.)
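For a Gaussian model with random intercepts only, the pair of measures can be sketched directly from the variance components: the marginal R^2 is the share of variance attributable to the fixed effects, and the conditional R^2 is the share attributable to fixed and random effects together. The function below is a minimal sketch under that assumption. Its name and the example variances are hypothetical, and it does not cover the distribution-specific variance terms needed for non-Gaussian models.

```python
def r2_glmm(var_fixed, var_random, var_residual):
    """Marginal and conditional R^2 for a Gaussian random-intercept model.

    var_fixed:    variance of the fixed-effect predictions.
    var_random:   list of random-intercept variances, one per grouping factor.
    var_residual: residual variance.
    """
    var_random_total = sum(var_random)
    total = var_fixed + var_random_total + var_residual
    marginal = var_fixed / total
    conditional = (var_fixed + var_random_total) / total
    return marginal, conditional


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration.
    marginal, conditional = r2_glmm(var_fixed=2.0, var_random=[1.0], var_residual=1.0)
    print(f"marginal R^2 = {marginal:.2f}, conditional R^2 = {conditional:.2f}")
```

In practice the variance components would come from a fitted mixed model rather than being typed in by hand.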
Johnson (2014). Extension of Nakagawa & Schielzeth R^2_GLMM to random slopes models. Methods in Ecology and Evolution.
Does exactly what the title says, providing an R^2 analogue for mixed-effects models with random slopes.
Power calculations for mixed-effects models and Generalized Estimating Equations (GEE)
Liu, G., & Liang, K. Y. (1997). Sample size calculations for studies with correlated observations. Biometrics, 937-947.
This is the standard (but technical) reference on analytic power analysis for longitudinal data. The methods here are implemented in the R package longpower.
"Observed" or "Post Hoc" Power
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power. The American Statistician.
This paper dismantles the concept of "observed" or "post hoc" power, in which power is calculated using the estimated effect size from the study at issue. I have to warn you, though--when they say "effect size," they mean unstandardized effect size, and if you assume they are talking about standardized effect size, then their main argument in section 2.2 (which isn't about observed power) doesn't make sense. The main point of this paper is accessibly explained in this blog post: http://daniellakens.blogspot.com/2014/12/observed-power-and-what-to-do-if-your.html. I don't necessarily recommend the procedure at the end of the blog post--a simple confidence interval is preferable to a "sensitivity" analysis of this sort (as Hoenig & Heisey argue).
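One way to see the problem with observed power is that it is a one-to-one function of the p-value and so adds no new information. The sketch below assumes a two-sided z-test (a simplification; t-tests behave similarly but need noncentral distributions) and plugs the observed test statistic in as if it were the true effect. The function name is my own.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def observed_power(p_value, alpha=0.05):
    """Observed power of a two-sided z-test, computed from the p-value alone."""
    # Recover the absolute observed z statistic from the two-sided p-value.
    z_obs = _STD_NORMAL.inv_cdf(1.0 - p_value / 2.0)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Treat z_obs as the true standardized effect and ask how often a new
    # test statistic would land in either rejection region.
    upper = 1.0 - _STD_NORMAL.cdf(z_crit - z_obs)
    lower = _STD_NORMAL.cdf(-z_crit - z_obs)
    return upper + lower


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05, 0.2, 0.5):
        print(f"p = {p:<5}  observed power = {observed_power(p):.3f}")
```

A result sitting exactly at p = 0.05 has observed power of about one half, and any nonsignificant result has observed power below that. This is why low observed power cannot be offered as an excuse for a nonsignificant finding.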
Effect Size and Power Analysis for Mediational Studies
I am usually unenthusiastic about mediation models in psychology because little evidence is typically presented that the actual causal structure matches the proposed mediation model. Such evidence typically cannot come from the data themselves because the mediation model is just-identified, meaning that there are as many free parameters as estimated covariances, so any simple mediation model can accommodate any observed set of covariances. With that caveat stated, here are some interesting readings:
Preacher, K. J., & Kelley, K. (2011). Effect size measures for mediation models: quantitative strategies for communicating indirect effects. Psychological Methods.
This article reviews effect size estimates for mediation and proposes two new measurements of effect size.
Wen & Fan (2015). Monotonicity of Effect Sizes: Questioning Kappa-Squared as Mediation Effect Size Measure. Psychological Methods.
Wen & Fan claim that one of Preacher & Kelley's (2011) proposed effect size measurements for mediation is problematic. I haven't read these papers closely enough to know how to resolve the apparent dispute, but I'm happy to do so if someone wants to know more about it.
Zhang (2014). Monte Carlo based statistical power analysis for mediation models: methods and software. Behavior Research Methods.
This paper gives a clear example of a Monte Carlo (that is, simulation-based) power analysis, along with an R package you can use to conduct your own power analysis of a mediation study.
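The general shape of a simulation-based power analysis for mediation can be sketched in a few lines of Python: generate data from an assumed model, test the indirect effect, and repeat. This sketch is not Zhang's method or software. It swaps in the simpler joint-significance test (paths a and b both significant) in place of a bootstrap, uses a normal approximation for the critical value, and assumes standard normal X and unit-variance errors. All function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def centered_cross(u, v):
    """Sum of cross-products of u and v after centering each at its mean."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))


def path_z_statistics(xs, ms, ys):
    """Return (z_a, z_b): OLS estimates of paths a and b over their standard errors."""
    n = len(xs)
    sxx = centered_cross(xs, xs)
    smm = centered_cross(ms, ms)
    syy = centered_cross(ys, ys)
    sxm = centered_cross(xs, ms)
    sxy = centered_cross(xs, ys)
    smy = centered_cross(ms, ys)

    # Path a: regress M on X (with intercept).
    a_hat = sxm / sxx
    rss_m = smm - a_hat * sxm
    se_a = (rss_m / (n - 2) / sxx) ** 0.5

    # Path b: regress Y on M and X (with intercept); b is the slope for M.
    det = smm * sxx - sxm ** 2
    b_hat = (smy * sxx - sxy * sxm) / det
    c_hat = (sxy * smm - smy * sxm) / det
    rss_y = syy - b_hat * smy - c_hat * sxy
    se_b = (rss_y / (n - 3) * sxx / det) ** 0.5

    return a_hat / se_a, b_hat / se_b


def mediation_power(n, a, b, c_prime=0.0, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power to detect the indirect effect a*b by joint significance."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = 0
    for _ in range(n_sims):
        # Data-generating model: X -> M -> Y, plus a direct path X -> Y.
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ms = [a * x + rng.gauss(0.0, 1.0) for x in xs]
        ys = [b * m + c_prime * x + rng.gauss(0.0, 1.0) for m, x in zip(ms, xs)]
        z_a, z_b = path_z_statistics(xs, ms, ys)
        if abs(z_a) > z_crit and abs(z_b) > z_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(f"n = {n:3d}  estimated power = {mediation_power(n, a=0.3, b=0.3):.3f}")
```

The estimate is only as good as the assumed path values and error variances, so it is worth rerunning the simulation over a range of plausible values of a and b before settling on a sample size.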