btw, it doesn’t look like I’ve been linking the Data Colada updates,
so here they are:
The post on the trial, from 8 May 2024:
So then Gino’s lawyers have to prove that we knew we were lying or that we were reckless. That poses something of a challenge because, amongst other things, we believe every single thing we wrote on the topic. Furthermore, the hundreds of hours of painstaking analysis that went into those blog posts don’t scream out “reckless”. So at this point, you’re probably wondering what the Gino attorneys are arguing. We’ll be honest here and say that, although we were at the hearing and listened to every word, we are not really sure. There was a moment when Gino’s lawyer tried to make a point by taking one of our statements out of context and simply misquoting another. But our lawyer took care of that. They make no legitimate claim that we knew that we were lying. And that makes sense. Because we weren’t.
The most recent post on the altered data, from 9 July:
As you may know, Harvard professor Francesca Gino is suing us for defamation after (1) we alerted Harvard to evidence of fraud in four studies that she co-authored, (2) Harvard investigated and placed her on administrative leave, and (3) we summarized the evidence in four blog posts.
As part of their investigation, Harvard wrote a 1,288-page report describing what they found. Because of the lawsuit, that report was made public (.pdf). And because it was made public, we now know what the investigators say was in the “original” dataset for one of the four studies: Study 3A of Gino, Kouchaki, and Casciaro (2020) [1]. By simply comparing the Original and Posted versions of the dataset, we can see exactly how the data were altered to produce the published result (a rough sketch of such a comparison follows the list below). In this post, we will show that:
- We correctly deduced how the data were altered.
- Gino’s prevailing explanation for the alterations is extremely implausible.
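For the curious, here’s a minimal pandas sketch of the kind of row-by-row comparison they’re describing. All of the file and column names here (study3a_original.csv, participant_id, impurity, and so on) are made up for illustration; the actual datasets in the Harvard report are labeled differently.

```python
import pandas as pd

# Hypothetical file and column names, for illustration only.
original = pd.read_csv("study3a_original.csv")
posted = pd.read_csv("study3a_posted.csv")

# Align the two versions on a shared participant ID, then flag every
# row where the key rating differs between the versions.
merged = original.merge(posted, on="participant_id",
                        suffixes=("_orig", "_posted"))
altered = merged[merged["impurity_orig"] != merged["impurity_posted"]]

print(f"{len(altered)} observations differ between versions")
print(altered[["participant_id", "impurity_orig", "impurity_posted"]])
```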
I am jumping over stuff in the middle:
Gino’s hypothesis says that Participant 4 said that the event was “authentic” but truly rated the event as “inauthentic”. Our hypothesis, confirmed by Harvard, says that Participant 4 said that the event was “authentic” and truly rated the event as “authentic”.
We can look at this more completely. We had asked some workers to rate the positivity/negativity of the words that participants wrote about the networking event. This allows us to see the relationship between those word ratings and the moral impurity ratings. If these word ratings are valid, then participants who gave higher ratings of moral impurity should have written words that were rated to be more negative. And, indeed, in the Original dataset, this is what you see:
When people gave negative ratings they said negative things. When people gave positive ratings they said positive things. Of course [10].
Now if Gino’s hypothesis were true – if the Posted data are real and the Original data were altered – then we should see a more sensible relationship between the words and the ratings within the Posted dataset than within the Original dataset. We can test this by looking at the words/ratings relationship in the 104 altered observations. Is that relationship more sensible in the Posted data than in the Original data?
No. Opposite. In the Original data, the relationship is completely sensible: more negative ratings = less positive words. In the Posted data, it is . . . backwards: more negative ratings = more positive words. To believe that the Original data are fake and the Posted data are real, you’d have to believe that the sensible data are fake and the backwards data are real. That is a difficult thing to believe.
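The check itself is tiny. Continuing the hypothetical sketch above, and assuming the workers’ word ratings sit in a single column (call it word_positivity; since the written words were never altered, one column serves both versions), it’s just two correlations:

```python
# Continuing the hypothetical sketch above. Assume the word ratings
# live only in the original file, so the column keeps its plain name
# ("word_positivity") after the merge.
r_orig = altered["impurity_orig"].corr(altered["word_positivity"])
r_posted = altered["impurity_posted"].corr(altered["word_positivity"])

# If the Original data are real, higher impurity ratings should go
# with less positive words there (a negative correlation), while the
# Posted version should look scrambled or reversed.
print(f"Original: r = {r_orig:.2f}   Posted: r = {r_posted:.2f}")
```

Only the sign really matters here: the question is which version of the data lines up sensibly with the words participants actually wrote.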
An easier thing to believe is that when the fraudster changed the ratings, she forgot to change the words, and so, when she altered them, the relationship between the words and the ratings got all cattywampus [11].
Conclusions
We were right about how the data were altered, Gino’s prevailing explanation for the alterations does not make sense, and yet we are the defendants in this case.