Here’s my (edited) post from back in 2008:
For me, the Stanford Prison Experiment (SPE) by Phil Zimbardo has long been an example of the astonishing things we humans are capable of. I didn’t realize, though, that it was also an example of human gullibility: I had not been skeptical about the experiment, which lacks quite a few markers of sound science (aside from its ethical problems). During a talk, Barbara Oakley was asked to comment on the SPE because it showed the influence that situations and roles can have on human behavior. She responded that there are quite a few questions about this experiment and pointed us to a summary of the critique at Wikipedia (originally reviewed around July 15, 2008). I finally had a chance to review this and am now retiring another sacred cow: the experiment is not nearly as thoroughly tested against reality as we are led to believe… At best, it could have been used to develop hypotheses, rather than being presented, as it seems to be, as the truth about situational influences on human behavior.
What’s missing from the experiment that made Zimbardo famous: it cannot be replicated (how convenient), it lacked a control group, and its sample was small (only 24 people participated). These are major flaws for a study that is supposedly decisive about human behavior. This is probably why it has never been published in a leading academic journal, unlike a modified follow-up experiment.
In his critique of the SPE, Erich Fromm points out that the main conclusion the researchers draw is actually not supported by their data (despite their attempts to mask that by using vague terminology like “some” and “a few” rather than the actual numbers):
The authors believe it proves that the situation alone can within a few days transform normal people into abject, submissive individuals or into ruthless sadists. It seems to me that the experiment proves, if anything, rather the contrary. If in spite of the whole spirit of this mock prison which, according to the concept of the experiment was meant to be degrading and humiliating (obviously the guards must have caught on to this immediately), two thirds of the guards did not commit sadistic acts for personal “kicks,” the experiment seems rather to prove that one can not transform people so easily into sadists by providing them with the proper situation.
The difference between behavior and character matters very much in this context. It is one thing to behave according to sadistic rules and another thing to want to be and to enjoy being cruel to people. The failure to make this distinction deprives this experiment of much of its value, as it also marred Milgram’s experiment. [Reference added]
Fromm further asks whether the situation the prisoners were in made it hard for them to distinguish reality from experiment. He points out that the prisoners were arrested by real Palo Alto police who did not answer any questions regarding the reality of the charges. Being arrested on fictitious charges without being told that this is the beginning of the experiment would create confusion for anyone. How were they supposed to know that they were part of an experiment when the real police were involved? As Fromm puts it:
Since it is most unusual for police authorities to lend themselves to such an experimental game, it was very difficult for the prisoners to appreciate the difference between reality and role-playing. The report shows that they did not even know whether their arrest had anything to do with the experiment, and the officers refused to answer their questions about this connection. Would not any average person be confused and enter the experiment with a sense of puzzlement, of having been tricked, and of helplessness?
I suppose it could be argued that this was necessary for the experiment to work properly, but it certainly makes any conclusion about the prisoners’ confusion questionable.
Finally, Fromm criticizes the lack of external validity: the experiment’s results are not compared to the reality of prisons, which are – fortunately – mostly far different from the set-up of the SPE. Zimbardo now argues that his SPE predicted Abu Ghraib. However, Fromm compared the findings from the SPE with the Nazi concentration camps, and that comparison casts doubt on Zimbardo’s interpretation of what happened at Abu Ghraib. Fromm postulates:
(1) SS guards ruthlessly humiliated the prisoners on the initial transport to the concentration camps, but no such sadistic treatment happened on subsequent transports. This suggests that the original treatment was a deliberate attempt to “break” the prisoners, i.e., “sadistic rules,” and not a reflection of “individual sadism of the guards.” Guards in Abu Ghraib were also instructed to “break” prisoners to get information from them.
(2) “Differences in the attitude, respectively, of apolitical, middle-class prisoners (mostly Jews) and prisoners with a genuine political conviction or religious conviction or both demonstrate that the values and convictions of prisoners do make a critical difference in the reaction to conditions of the concentration camp that are common to all of them.”
Both points suggest that it takes more than the situation to create what was observed in the SPE (and Abu Ghraib). The second point in particular “contradicts the behaviorist thesis Haney et al. tried to prove with their experiment.” Fromm rightly argues that it would make more sense to analyze real situations such as concentration camps than to conduct an experiment that doesn’t even provide one of the key ingredients of an experiment: a control group against which the results can be compared.
The difference between the mock prisoners and real prisoners is so great that it is virtually impossible to draw analogies from observation of the former. For a prisoner who has been sent to prison for a certain action, the situation is very real; he knows the reasons (whether his punishment is just or not is another problem); he knows his helplessness and the few rights he has, he knows his chances for an earlier release. Whether a man knows that he is to stay in prison (even under the worst conditions) for two weeks or two months or two years or twenty years obviously is a decisive factor that influences his attitude. This factor alone is critical for his hopelessness, demoralization, and sometimes (although exceptionally) for the mobilization of new energies – with benign or malignant aims. Furthermore, a prisoner is not “a prisoner.” Prisoners are individuals and they react individually according to the differences in their respective character structures. But this does not imply that their reaction is only a function of their character and not one of their environment. It is merely naive to assume that it must be either this or that. The complex and challenging problem in each individual – and group – is to find out what the specific interaction is between a given character structure and a given social structure. It is at this point that real investigation begins, and it is only stifled by the assumption that the situation is the one factor which explains human behavior. [My emphasis]
Fromm raises some very valid concerns, and his critique largely calls into question the conclusions Zimbardo and others have drawn from the 6-day SPE. While the experiment might have shown that individual disposition isn’t the only thing that determines how people behave, and that the situation also has an influence, it failed to take into account the interplay of personality and situation. More importantly, it lacks scientific credibility.
A more recent attempt to replicate the Stanford Prison Experiment, at least in part, also came to conclusions different from Zimbardo’s.