
More on red and pink…

In a recent blog post, Uri Simonsohn referred to Andrew Gelman’s attack on our research, which found that women who are at high risk for conception are more likely to wear red or pink than women at low conception risk. We have already replied to Gelman’s critique, here and here, but Simonsohn’s post provides a useful way of integrating our various responses, which, we believe, generally follow his guidelines, as described below:

Right Response #1.  “We decided in advance”

Several of our analytic decisions were made a priori (e.g., the dates for the windows of high and low conception risk, and the decision to test for an effect of conception risk only on women’s tendency to dress in red and shades of red, i.e., pink). We didn’t pre-register these decisions, in part because pre-registration was barely more than a twinkle in our field’s eye at the time we conducted this research.

Other decisions, such as whether to look at red only, pink only, or the two colors combined, were made after looking at the data from our first study. To be clear, our theory (detailed here and here) is about shades of red, but one could certainly argue about how best to operationalize this variable. In our first study (“Sample B” in the paper), the predicted effect emerged only when the two shades of red were combined, so the combined measure became our dependent variable in the direct replication reported in the same paper (“Sample A”) and in all later studies; see Right Response #3 below, and here.

Right Response #2.  “We didn’t decide in advance, but the results are robust”

Several analytic decisions were inherently more ambiguous, such as whether to exclude from analyses women who are currently menstruating or pre-menstrual.  Given this ambiguity, in the main text we reported results including these women, and in an endnote reported results excluding them (results held).

Right Response #3.  “We didn’t decide in advance, and the results are not robust. So we ran a direct replication.”

Given the relatively small samples of our first two studies (Ns = 100 and 24), we later conducted a second direct replication. In this new study, we failed to replicate the previously observed effect. However, the conditions under which the replication failed led us to develop a novel hypothesis about a moderator variable (current weather; see Tracy & Beall, 2014). We examined natural variation in this moderator within that dataset and found support for our account. We then conducted a new study in which we quasi-manipulated the moderator, and the predicted interaction emerged, again supporting our account. These findings, including the failure to replicate, were reported in Tracy & Beall, 2014.

In other words, we followed normative standards for how science should proceed. We initially found consistent and strong support for our hypothesis across two studies, but then failed to replicate in a third. Rather than file that failed replication away in a drawer, we developed a hypothesis that might explain it, found support for that hypothesis in the data we’d already collected, and then ran a new experiment explicitly designed to test it. In our view, the data we’ve collected thus far support our two hypotheses (regarding the main effect and the moderator). But this certainly doesn’t mean that the case is closed. Just as our first failure to replicate prompted us to refine our theory, we fully expect, and hope, that future research on this topic will lead us or others to refine the theory even further.

Right Response #4.  We’d like to add a fourth right response to Simonsohn’s three: “We conducted a meta-analysis across all the data we collected, and found the predicted result.”

Combining across all the data we collected on this issue (N = 633, which includes the failure to replicate), the main effect holds, but it is substantially weaker than the effect reported in our original paper. Across all data, about 16% of women at high conception risk reported wearing red/pink, compared to about 10% of women at low risk, suggesting a 67% increase in the odds of dressing in red or pink at times of high risk (odds ratio = 1.67, p = .032). Now that we are aware of an important moderator variable, which varied (both naturally and systematically) across the four studies included in this analysis, it is not surprising that this effect is smaller than that reported in our original paper, in which the moderator variable (purely by chance) remained stable.
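
For readers who want to see how percentages like these translate into an odds ratio, here is a minimal Python sketch. It uses only the rounded figures quoted above (about 16% and about 10%), not the underlying counts, so it will not exactly reproduce the 1.67 obtained from the full dataset. The function names are illustrative and are not taken from the original analysis.

```python
def odds(p: float) -> float:
    """Convert a proportion (0 < p < 1) into odds, p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_high: float, p_low: float) -> float:
    """Odds of the outcome in the high-risk group relative to the low-risk group."""
    return odds(p_high) / odds(p_low)


def risk_ratio(p_high: float, p_low: float) -> float:
    """Ratio of the two proportions themselves (relative risk)."""
    return p_high / p_low


if __name__ == "__main__":
    # Assumption: rounded proportions as quoted in the text, not exact data.
    p_high, p_low = 0.16, 0.10
    print(f"odds ratio: {odds_ratio(p_high, p_low):.2f}")
    print(f"risk ratio: {risk_ratio(p_high, p_low):.2f}")
```

With these rounded inputs the odds ratio comes out near 1.71 and the risk ratio at 1.60. The difference from the reported 1.67 presumably reflects rounding in the percentages. Note also that an odds ratio describes a change in odds, which is not the same quantity as a change in the proportion itself.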

Examining this full dataset, we also found that the effect holds across a range of analytic decisions regarding exclusions and windows of conception risk. For more details, or access to the full dataset, please contact Alec Beall at
