Serious Questions about the Stanford Prison Experiment

The Stanford Prison Experiment (SPE) by Phil Zimbardo has been for me an example of the astonishing things that we humans are capable of. Perhaps as an example of human gullibility, I had not been skeptical about the experiment, which lacks quite a few scientific markers (aside from its ethical problems). During a talk by Barbara Oakley, she was asked to comment on the SPE because it showed the influence the situation and roles could have on human behavior. She responded that there are quite a few questions about this experiment and pointed us to a summary of the critique at Wikipedia. I finally had a chance to review this and am retiring another sacred cow now: the experiment is, well, not nearly as thoroughly tested against reality as we are led to believe… (Thanks to a discussion in the comments, I now understand that Zimbardo does deserve credit for pointing to the importance of situational influences. I still think, though, that he could, at best, use the SPE for the development of hypotheses, not as support for a theory, as he seems to be doing.)

What’s missing from the experiment that made Zimbardo famous: It cannot be replicated (how convenient); it lacked a control group and a large sample size (only 24 people participated). These are major flaws for a study that is supposedly decisive about human behavior. This is probably why it has never been published in a leading academic journal, unlike a modified follow-up experiment.

In his critique of the SPE, Erich Fromm points out that the main conclusion the researchers draw is actually not supported by their data (despite their attempts to mask that by using vague terminology like “some” and “a few” rather than the actual numbers):

The authors believe it proves that the situation alone can within a few days transform normal people into abject, submissive individuals or into ruthless sadists. It seems to me that the experiment proves, if anything, rather the contrary. If in spite of the whole spirit of this mock prison which, according to the concept of the experiment was meant to be degrading and humiliating (obviously the guards must have caught on to this immediately), two thirds of the guards did not commit sadistic acts for personal “kicks,” the experiment seems rather to prove that one can not transform people so easily into sadists by providing them with the proper situation.

The difference between behavior and character matters very much in this context. It is one thing to behave according to sadistic rules and another thing to want to be and to enjoy being cruel to people. The failure to make this distinction deprives this experiment of much of its value, as it also marred Milgram’s experiment. [Reference added]

Fromm further questions whether the prisoners had trouble distinguishing reality from experiment because of the situation they were in. He points out that the prisoners were arrested by real Palo Alto police who did not answer any questions regarding the reality of the charges. Being arrested on fictitious charges without being told that this is the beginning of the experiment would create confusion for anyone. How were they supposed to know that they were part of an experiment when the real police were involved? As Fromm puts it:

Since it is most unusual for police authorities to lend themselves to such an experimental game, it was very difficult for the prisoners to appreciate the difference between reality and role-playing. The report shows that they did not even know whether their arrest had anything to do with the experiment, and the officers refused to answer their questions about this connection. Would not any average person be confused and enter the experiment with a sense of puzzlement, of having been tricked, and of helplessness?

I suppose it could be argued that this was necessary for the experiment to work properly, but it certainly makes the conclusion about the confusion questionable.

Fromm finally criticizes the lack of external validity: The experiment results are not compared to the reality of prisons, which are – fortunately – mostly far different from the set-up for the SPE. Zimbardo now argues that his SPE predicted Abu Ghraib. However, Fromm compared the findings from the SPE with the Nazi concentration camps, which puts into doubt Zimbardo’s interpretation of what happened at Abu Ghraib. Fromm postulates:
(1) SS guards ruthlessly humiliated the prisoners on the initial transport to the concentration camps but no such sadistic treatment happened on subsequent transports. This suggests that the original treatment was a deliberate attempt to “break” the prisoners, i.e., “sadistic rules,” and not a reflection of “individual sadism of the guards.” Guards in Abu Ghraib were also instructed to “break” prisoners to get information from them.
(2) “Differences in the attitude, respectively, of apolitical, middle-class prisoners (mostly Jews) and prisoners with a genuine political conviction or religious conviction or both demonstrate that the values and convictions of prisoners do make a critical difference in the reaction to conditions of the concentration camp that are common to all of them.”

Both suggest that it takes more than the situation to create what was observed in the SPE (and Abu Ghraib). Especially the second point “contradicts the behaviorist thesis Haney et al. tried to prove with their experiment.” Fromm rightfully argues that it would make more sense to analyze real situations such as concentration camps rather than conducting an experiment, which doesn’t even provide one of the key reasons for experiments: a control group to make the results generalizable.

Fromm’s conclusion:

The difference between the mock prisoners and real prisoners is so great that it is virtually impossible to draw analogies from observation of the former. For a prisoner who has been sent to prison for a certain action, the situation is very real; he knows the reasons (whether his punishment is just or not is another problem); he knows his helplessness and the few rights he has, he knows his chances for an earlier release. Whether a man knows that he is to stay in prison (even under the worst conditions) for two weeks or two months or two years or twenty years obviously is a decisive factor that influences his attitude. This factor alone is critical for his hopelessness, demoralization, and sometimes (although exceptionally) for the mobilization of new energies – with benign or malignant aims. Furthermore, a prisoner is not “a prisoner.” Prisoners are individuals and they react individually according to the differences in their respective character structures. But this does not imply that their reaction is only a function of their character and not one of their environment. It is merely naive to assume that it must be either this or that. The complex and challenging problem in each individual – and group – is to find out what the specific interaction is between a given character structure and a given social structure. It is at this point that real investigation begins, and it is only stifled by the assumption that the situation is the one factor which explains human behavior. [My emphasis]

Fromm raises some very valid concerns and his critique largely calls into question the conclusions Zimbardo and others have drawn from the 6-day SPE. While it might have shown that the individual disposition isn’t the only thing that determines how people behave, that the situation also has an influence, it failed to take into account the interplay of personality and situation. But more importantly, it lacks scientific credibility.

A recent attempt to replicate – at least in part – the Stanford Prison Experiment has come to different conclusions than Zimbardo. I’ll have to read through the papers first, though, so more on this in another post.

Serious Questions about the Stanford Prison Experiment — 23 Comments

  1. This was a very well done experiment, very interesting from a psychological view.
    Very interesting view into human behaviour in authoritarian power.
    We need to do more studies like these, but in this experiment there were no ethical invigilators involved. When the BBC tried to reproduce the experiment in 2003 a team of five ethical invigilators pulled the experiment on the 5th day because it was getting violent. They didn’t have that at the Stanford Prison Experiment; if they had, it would have been pulled long before it led to the prisoners suffering mental breakdowns.

  3. Rachel,
    stick with your original word choice. The SPE is crap. It has no external validity. It is valuable only in its ability to demonstrate that people act differently in different settings. It has the 20/20 hindsight ability to predict the behavior of two dozen Stanford students in 1971. It did not have the ability then or now to predict the behavior of real reservist soldiers being asked to guard real Iraqi prisoners in Abu Ghraib in 2003/2004 as Philip Zimbardo now asserts. Nor can it predict ANY behavior of ANY guard or ANY prisoner in the future. Its relevance is elevated by Philip Zimbardo’s personal association with the hierarchy of the APA itself and not by its actual value to the scientific community.
    I have served in a leadership role within a detention facility in Iraq. To even suggest that the behavior of two dozen Stanford students in 1971 could predict the behavior of the soldiers who honorably served with me is preposterous. No guard behaved poorly on my watch, no prisoner did either.
    Zimbardo’s “experiment” is more a reflection of lack of structure, perceived role, and scientific hubris than anything else. You might wish to check out Spencer, C. D. (1978) Two types of role playing: Threats to internal and external validity. American Psychologist, Vol 33(3), Mar 1978, 265-268.
    Keep questioning perceived authority and the status quo, Rachel; even Philip Zimbardo would approve of that.

  4. what da f*** and watched both versions of the film …..WELL THIS JUST shows how SICK HUMAN beings really are…… that includes PHIL ZAMBARDO.. i mean did the people really need to conduct this experiment to know a human being could or would react like this?? The way you talk on your site is like it was all ok nothing to worry about and it was normal. Rubbish, this is entertainment….in a film yeah but based on a true story..well….. fascinating. I may not be shocked anymore by the way the human race behaves but I surely am still fascinated and disgusted. Ok I should get over it but Im entitled to voice my opinion and this is it I BELIEVE ITS WRONG, NO ONE SHOULD BE PUT TO THE TEST …….ESPECIALLY WHEN WE KNOW WHAT THE ANSWER IS!!! I MEAN WHY RISK IT???? WHAT WOULD YOU DO …AND THERE IS YOUR ANSWER!!! Im 24 years of age and I dont know much and I have one hell of alot to learn one thing I have learned is that politicians and professors (MAJORITY of them) should be lined up and SHOT and not a day goes by where they prove me wrong…but hey Im just another guinea pig getting PI**e* of over stuff thats going on in this world BUT NONE OF US CAN REALLY DO ANYTHING ABOUT IT…(PLEASE NOTE I HAVE NOT YET READ ANYONES COMMENTS ON THIS OR ANY OTHER SITE but I FEEL STRONGLY ABOUT THIS SUBJECT so am voicing some of my thoughts on ANY SITE relating to this matter. thanks s

    • Yupp, humans are strange… They go around screaming at people they don’t even know and threaten to shoot people (all caps is considered screaming on the web). Overall, I am not quite sure what you’re upset about: That the experiments showed how cruel people can be or that people decided to do these kinds of experiments or that I can write about the experiments without getting all upset? I presume it’s the conducting of the experiments. Remember, though, that you’re speaking from hindsight: We now know that this is what happens because of Zimbardo’s experiments (although, as Fromm pointed out, we did have a real world experiment to analyze). Also, the experiment was halted earlier than planned because of the implications found. So, it’s not like the experimenters went on pressing people to behave in cruel ways. I don’t know what films you’re referencing. You mean the attempts to replicate the SPE?

  5. Hi there,

    I agree with your comments on Zimbardo. I find that he resembles one of those American televangelists. A self-made, money-making one-man force!

    I am a psychology undergraduate in the UK and am fortunate to have one of the most brilliant modern psychologists, Dr Alex Haslam, at the uni. He conducted the BBC Prison Study, which is the modern take on the SPE. Alex spoke to some of us about his experiment but I have also read his study in great detail myself. There are some key differences in the way things were conducted and via manipulations. Mainly, it was more scientific and ethical. Zimbardo is very negative about this study but I think that may be because it contradicts his and does not have the flaws that his had. Although in typical Zimbardo style, one can find an interview with Zimbardo praising the study and another saying the opposite!! The guy can’t make up his own mind, so how can anyone else respect him? I also dislike him for trying to support the criminal behaviour of the guards at Abu Ghraib; thank goodness the jury found them guilty regardless. I hate to think that they could have got away with that abuse just because of him and his old-fashioned and unsupported views.

    Having studied other “classic psychology studies” I would say objectively that Zimbardo’s SPE study is the most flawed of them all.

    I found comments on this website insightful and stimulating and if you would like any more info or a copy of the Haslam & Reicher study I can email it.



  7. Thanks, Philip! More to add to my reading list…

    I think doubt and skepticism are non-overlapping, fundamentally different concepts that often overlap, especially in my mind-kluge. Just kidding! Skepticism is usually a good thing…

  8. Hi Rachel

    Thanks for responding. Although Zimbardo does seem to like the limelight, I’d rate him rather low in the pantheon of academic douche-bags. The highest honor must go to Dr. James Watson, IMHO.

    Feel free to doubt me as much as possible. You’re right to be skeptical! As far as pointers go, I’d recommend some social psych edited volumes to get started:

    The Sage handbook of social psychology, edited by Michael A. Hogg and Joel Cooper. London: Sage, 2003

    The handbook of social psychology, edited by Daniel Gilbert, Susan Fiske, Gardner Lindzey. New York: McGraw-Hill, 1998

    Another interesting case is the Robbers’ Cave Experiment by Muzafer Sherif. This experiment actually predates the SPE, and seems to point in some similar directions. That is, roles are dependent upon context. Check out:

    Sherif, M., White, B. J., & Harvey, O.J. (1955): Status in experimentally produced groups. American Journal of Sociology. 60: 370 – 379.


  9. “Perhaps this is more rightly directed at Zimbardo’s self-aggrandizing style, rather than the implications of his research.” That’s a very good point, Philip! Maybe that’s part of the reason why the critique of his experiment is so forceful… It seems like he built his career on sand but he was successful in doing so (hmm, some jealousy here? Sometimes I wish I had the gall to do that…).

    What “wings of social psychology that have taken up the questions that the SPE provoked”? I am not doubting you, I’d just like some pointers on where to look for further research… Thanks!

  10. Hi Rachel

    Thanks for responding. At the risk of beating what seems to be becoming a dead horse, I’m still unclear about what you mean when you write: “I think there are fundamental differences between exploratory, descriptive, and causal research – even if they can overlap.” If two categories are fundamentally different, then it strikes me that there should not be any overlap. If there is overlap, then perhaps the differences are trivial or non-existent.

    You write: “Again, I think that the SPE should have been a first step in theory building, rather than the final say…He did raise a lot of issues, which should have been investigated in further research but weren’t.” I never claimed that the SPE was some sort of definitive statement about human interaction. And there are whole wings of social psychology that have taken up the questions that the SPE provoked, even if they did not replicate its design.

    Outside of debates over the scientific status of the SPE, some people (like Oakley) seem to have a strange hostility to the SPE. Perhaps this is more rightly directed at Zimbardo’s self-aggrandizing style, rather than the implications of his research.


  11. I think there are fundamental differences between exploratory, descriptive, and causal research – even if they can overlap. Knowledge base expansion is related to exploratory research. Descriptive and causal research happens at the theory building phase. Of course, putting this on a step-by-step process is a big simplification, as is the description of the scientific method in general. However, sometimes simplification can lead to clarification: Even though the steps of the scientific method in reality often overlap and the process is more recursive than it looks like on paper, having a clear idea of what goes where helps prevent making an experiment like SPE look like more than it actually is. (I think this is similar to Kuhn’s idea of “normal science.”) In order to increase our confidence in scientific findings, we need to know how certain we can be about these findings.

    I was not trying to be completely historically accurate with the “Origin of Species” example, including all the twists and turns that happen when science develops. My point was that Darwin’s book expanded our knowledge but did not test his propositions & hypotheses (or in the terminology I was using: it was knowledge base expansion but not theory building).

    Again, I think that the SPE should have been a first step in theory building, rather than the final say. Zimbardo did not confirm anything with the SPE. He did raise a lot of issues, which should have been investigated in further research but weren’t.

  12. Hi Rachel

    Thanks for responding. However, your reply has left me with more questions. First let me rephrase what you are saying to be sure I understand it. You argue that science is a process that has two key points; theory building (or TB) and knowledge base expansion (KE). Temporally, you argue that KE happens first, followed by TB, and you characterize the SPE at the KE phase.

    This appears at first glance to be a reasonable position; however, it has many of the flaws that phase- or stage-based theories of development share. There are demarcation problems, for example. That is, the KE phase is not “pre-theoretical,” but is in fact deeply theoretical. So I’m not sure how it is different from the TB phase. If the phases share common elements, and differ only in temporal position, I’m not sure how you are warranted to say there is a difference.

    I think the “Origin of Species” example is much more complex than you give it credit for. You write: “However, only rigorous testing of his ideas, including that done by geneticists, has developed Darwin’s ideas into a full-blown scientific theory.” This is not how the story happened. In fact, Darwin was wrong about many elements of evolution, such as his idea of pangenesis, and it was not until the Modern Synthesis that what we today call “natural selection” was understood in the light of population genetics. To be clear, I’m not claiming this makes evolution “wrong” in any simple-minded sense. It is our best theory of the proliferation of natural forms to date. However, our understanding of evolution did not proceed in the clean, purely rational way that you make it out. Again, I would encourage you to read Kuhn and post-Kuhnian critiques of this positivist bias in explaining how science works.

    I think it is also important to separate Zimbardo’s ambitions for attention and the implications of the SPE. I don’t disagree that Z has built his reputation from pushing (perhaps too hard in your eyes) one experiment conducted in 1971.


  13. Hmm, you’re the third person who finds a position of mine contradictory… Maybe something I need to look into: Either I have trouble expressing myself clearly or I hold contradictory positions (or both ;-)…

    At any rate, let me try to clarify my position for us – thank you for giving me this opportunity, Philip! Btw, this discussion has prompted me to reword my original post to take out the word “crap…”

    I am not claiming that (1) negates (2). However, I would change (2) to read: “The SPE might have helped us to understand social processes with more clarity.” The key differentiator is theory building vs. knowledge base expansion. What do I mean by that? Science happens on a continuum, so these are two different points in a process. Theory building happens when we already have hypotheses set up; we are testing those within a context of an existing theory (we’d use a classical/scientific experimental design for that). That results from (1) but not from (2). Knowledge base expansion happens before we develop theories; we might have some ideas about what could or could not happen but we might take the approach “let’s do XYZ and see what happens.” We can then use those results to develop hypotheses and, eventually, theories. We could probably call this exploratory research. That results from (2), though it can also result from (1). I would categorize SPE here: We have learned something (situational factors can have a tremendous impact on behavior). However, this knowledge – in order to turn it into a valid theory – needs to be put to the test again and again. Only then can we feel confident that we’re actually seeing something that is there, not just something that “makes intuitive sense” but isn’t really there (many things that are “common knowledge” turn out to be false under scrutiny…). So, theory building is a much more rigorous process but it can start with knowledge base expansion. Or, as those authors from the article I quoted put it: It “suggest[s] a demonstration [knowledge expansion] rather than a conclusive test [theory building] of the validity of this model.” Again, demonstration might be a step in a process that eventually includes a conclusive test (in fact, it might make sense to do a “proof of concept” first because if the idea falls apart at that stage, there is no need to spend time and money on more rigorous testing).
(Btw, the article on post-conflict resolution doesn’t have anything to do with SPE; I chose to quote from it because the authors made the same distinction I was trying to make. That sentence is the only connection with our discussion…)

    Let’s take a different example: Darwin’s theory of natural selection. Darwin’s book “The Origin of Species” outlined a possible way that evolution might have happened. It added to our knowledge base. However, only rigorous testing of his ideas, including that done by geneticists, has developed Darwin’s ideas into a full-blown scientific theory. Obviously, both steps are necessary in science. We cannot develop new theories if we don’t build our knowledge. And, I suppose, theory building also expands our knowledge, which might add to the confusion, so let me add another dimension: Certainty. While we can never be absolutely certain that evolution happens via natural selection (and the other mechanism, genetic drift), the more we have built the theory (which includes lots of valid experiments like (1)), the more certain we can be that our theory explains what is actually going on in reality. The knowledge we’ve gained from SPE is not as certain because it has not been replicated (verified) by others. Maybe a careful analysis of Abu Ghraib could do that, if we approach it as a “natural” experiment (and not assume from the get-go that it proves SPE).

    Now, as far as I know, the findings from SPE have not been tested further (though, as you pointed out, The Experiment might be a step in that direction). And that is where the main point of my critique comes in: Zimbardo acts as if he has done (1) and (2). While that is certainly possible with the right experiment, SPE does not conform to (1).

    Good points about the Prescott article (if we can even call it an article…)! Although that wasn’t the only place that I have read that Zimbardo might have coached the guards… See, this is another problem with SPE: We do not have clear information about what actually happened. There is no peer-reviewed article that describes, in detail, how the experiment was set up, including what type of coaching went on, if any.

    I hope this long comment actually clarifies things, not just muddies the waters further. Again, thank you for giving me this opportunity! It is difficult to think things through by oneself but that’s another story…

  14. Hi Rachel

    Thanks for responding. I agree with you – we are talking about two different things:

    1. The status of SPE as an experiment (in the classic sense). Here we are in agreement. The SPE does not fit the definition of classic experimental design. Therefore it is invalid.
    2. The knowledge gained from the SPE. The SPE has helped us to understand social processes with more clarity.

    However, there seems to be a contradiction in your position. On the one hand, you seem to be claiming that (1) negates (2). On the other, you affirm (2) despite (1). I draw this conclusion from your quote: “Have we learned something from Zimbardo’s SPE? Of course we have. But this does not mean that it is valid as a scientific experiment that is used in the course of theory building. Of course, experiments can be used to explore areas of knowledge, to expand our knowledge base. But you cannot use the results of one (badly designed) experiment to support your theory, as Zimbardo does.” You seem to be saying that our knowledge base has increased despite the problems inherent in the SPE.

    If we operate from the premise that not all scientific research fits the classical experimental design, I’m still not clear why you are so opposed to the SPE. All experiments have flaws, biases or shortcomings. That the SPE has flaws is not prima facie evidence for its negation.

    I read the Prescott article, and it feels more like sour grapes than anything else. To write an editorial like that over 30 years after the fact seems a bit petty. I’ll read the post-conflict resolution article and give some comments later.


  15. Hi, again, Philip,

    A couple of things that I’d like to add to my comment above:
    * From an article I read on my way home: “Given the limitations of a cross-sectional study with a small sample size, the results from this survey suggest a demonstration rather than a conclusive test of the validity of this model.” Although this was obviously referring to another study, I think this summarizes well the distinction I was trying to make between using the results from an experiment to test a theory or to build general knowledge.
    * I listened to a part of an NPR interview with Neal Katyal, the lead counsel in Bin Laden’s driver’s case. I realized then one of the main problems with explaining what happened in Abu Ghraib only by the situation: It lets the Bush administration off the hook! If I recall correctly, Zimbardo explained that the situation in Abu Ghraib developed partially because of a leadership vacuum, i.e., there was no one of authority around during the night. I think this avoids looking at the influence the leadership did have. We know now that a lot of the “interrogation” practices were encouraged.

  16. Within science, an experiment is valid if it meets certain criteria, including a large enough sample, a control group, and the ability to replicate the findings. SPE did not meet those criteria; therefore, it is not a valid experiment from a scientific standpoint. In particular, it should not be used to make predictions.

    I think we’re talking about two different things. Your references to “epistemology” made me realize this. Have we learned something from Zimbardo’s SPE? Of course we have. But this does not mean that it is valid as a scientific experiment that is used in the course of theory building. Of course, experiments can be used to explore areas of knowledge, to expand our knowledge base. But you cannot use the results of one (badly designed) experiment to support your theory, as Zimbardo does.

    You assert that “the SPE was able to predict Abu Ghraib.” Do you have any references for that? I know I said something similar in my post but I am now realizing that this is probably an incorrect statement. This is different from saying after the fact that Abu Ghraib might be consistent with some of what was found in SPE. If you look at Fromm’s critique, you’ll see that SPE contradicts what happened in concentration camps. Also, if you look at this article, you’ll see that the primary commonality between SPE and Abu Ghraib was that the guards were coached to apply this cruelty; that is where the similarity comes from. As I wrote in my original post, “it takes more than the situation to create what was observed in the SPE (and Abu Ghraib).”

    (I am not sure where you noticed that my reply was cut off… Based on your reply, it seems that you got all of it).

  17. Hi Rachel

    Thanks for responding. Some of your answers were cut off, so I’m not sure if I am getting all your points. I think you might be attaching too much importance to the word “experiment” in SPE. I agree with you that the SPE does not have the hallmarks of a classic experiment. However, you seem to be claiming that the results of the SPE are still invalid, even if it does not fit the classic description of an experiment. So the question “is the SPE really an experiment?” has both a yes and a no answer. No, it is not the classic design, but yes, it replicated social life in an artificial setting. You seem to be insisting that the SPE measure up to something that it is not.

    In your replies, you keep stressing the replication issue. But you point out at least one attempt to replicate the SPE. Part of the complexity in answering this question is that it is now very difficult to replicate the SPE, given informed consent rules. But as I argued earlier, this is an ethical and not an epistemological point.

    You write: “If a replication of the original experiment shows different results, the original experiment might not have measured what we thought we were measuring.” This is not always the case – in fact, very often even well designed experiments reveal discrepancies in data when replicated. These discrepancies are sometimes ignored as “anomalies,” pace Kuhn, Structure of Scientific Revolutions. Perhaps you see where I’m going…

    Scientists will often use what are called “natural experiments.” These are events that allow the researcher to compare before/after states of affairs, or model some type of phenomenon. I think Abu Ghraib represents such a case vis-a-vis the SPE. If you disagree with that last statement, then you need to explain how the SPE was able to predict Abu Ghraib despite its apparent validity problems. In other words, if the SPE is as compromised as you claim it is, why does Abu Ghraib appear to be so similar? Just coincidence?


  18. Yeah, yeah, I think the word “crap” wasn’t well chosen but, hey, sometimes I try to be provocative ;-). It probably was also unnecessarily disrespectful to Zimbardo because I do think that we cannot ignore the impact of situation on people’s action. It’s just not all that, which is what Zimbardo seems to suggest.

    Thanks for your detailed comment, Philip! Let me see if I can respond to your points…

    “First, the fact that the SPE can be critiqued at all is evidence for its significance. As Karl Popper argued, a statement is scientific if it can be falsified.” The SPE is not a hypothesis. It was an experiment, which cannot be falsified. It is difficult to know what hypothesis Phil Zimbardo

    “Does an experiment’s ethical status automatically invalidate its validity?” No, it does not. However, it makes it close to impossible to replicate, which makes it difficult to falsify the assertions made by Zimbardo.

    Lack of suitable controls: This, again, goes to the fact that this experiment cannot be replicated. If indeed this experiment was used for “knowledge production,” it should have been a first step. As far as I know, there were no follow-up studies that did incorporate control groups (please correct me if I am wrong here!)

    Ecological validity: Maybe Zimbardo was trying to model a “total institution.” Validity, though, addresses what can be generalized. The critique here is that because Zimbardo’s “prison” was so different from real prisons, his findings cannot be generalized to those prisons.

    “Zimbardo did select his subjects – young, white, male, Stanford students – so he cannot claim that his results apply to all people.” But he does: see his testimony on Abu Ghraib and his recent book… Again, your defense misses the point of validity: no matter what Zimbardo was trying to show, his findings cannot be generalized because of these validity issues.

    “Many studies have small sample sizes, so this is relatively unimportant.” Yes, and therefore they are also being critiqued. If your sample size is too small, your “results will tend to be too imprecise to be of much use.” This is because “the larger the sample size N, the smaller sampling error tends to be.” You might then find

    How do different results from a replication validate the original results? Validity is addressing the question: Are we measuring what we think we are measuring? If a replication of the original experiment shows different results, the original experiment might not have measured what we thought we were measuring. That is why replication of experiments is so important: Findings can be spurious.

    To summarize: Because of the (mostly) non-replicability of the experiment, Zimbardo’s findings cannot be falsified. There are also serious issues with validity due to sample size, lack of control group, and experiment set-up.

  19. Hi Rachel

    I was at the Oakley talk – I was actually the person who asked this question. I think your conclusion that the SPE is “crap” is misguided. First, the fact that the SPE can be critiqued at all is evidence for its significance. As Karl Popper argued, a statement is scientific if it can be falsified. Unlike intelligent design arguments, for example, which posit a divine creator as cause (which cannot be tested at the end of the day), the SPE gives us structural (and testable) answers about why individuals behave the way they do in certain situations. While aspects of the SPE may be distasteful to us, or it may challenge our previously held conceptions, that does not equate to shoddy research.

    Let’s take a closer look at the critiques on Wikipedia (apologies for the long answers but these are complex issues):
    · The SPE was unethical. This can be argued given today’s standards of informed consent in research. However, this critique opens up more difficult questions; namely, does an experiment’s ethical status automatically invalidate its validity? For example, NASA scientists studying the work of Wernher von Braun, a Nazi researcher, developed much of our current knowledge about space flight. Does von Braun’s membership in the Nazi party invalidate what we know about telemetry? I think the answer is no. To be clear, I’m not arguing an “ends justify the means” point; rather, this critique only points out an ethical, not an epistemological, problem in the SPE.
    · Lack of suitable controls. The use of control groups is part of classical hypothesis-testing or experimental research design. However, some scientific research does not employ control groups. For example, some clinical trials (especially phase I trials) do not use control groups. They may be testing dose-response curves, among other things. In a broader sense, many types of knowledge production do not use control groups. Consider what we have learned from historiographic techniques, or anthropological field sites. Therefore, lack of a control group does not invalidate the SPE.
    · Ecological validity. This would be a valid critique if Zimbardo’s point were to simulate a prison per se. However, this was not his point. Rather than simulating a prison stricto sensu, he was attempting to model what sociologist Erving Goffman called a total institution. These are institutions which completely strip individuals of their identities, including prisons, concentration camps, hospitals, and barracks. Therefore, this critique misses the larger point Zimbardo was trying to make.
    · Selection bias. This is a difficult and complex point, and may have some validity, although not as written in Wikipedia. Selection bias in this case indicates some problem in subject recruitment, usually referring to a non-random sampling procedure (this is important for statistical reasons which I won’t go into). Zimbardo did select his subjects – young, white, male, Stanford students – so he cannot claim that his results apply to all people. It’s more accurate to say he stratified his sample. However, I think this critique also misses the target. The SPE was designed to examine how individuals fit into roles within institutional contexts. I think that even though the sample was non-random, Zimbardo showed how certain forms of ritualized humiliations work with some forms of masculinity. This is a more complicated issue, and I’m happy to discuss it more if you’re interested.
    · Small sample size. Many studies have small sample sizes, so this is relatively unimportant. While I do agree that perhaps Zimbardo has gotten more mileage than he should have with a relatively small sample, this does not mean that the findings, or their epistemological significance, are invalidated.

    I haven’t addressed Fromm’s criticisms, because I think I’ve taken up enough space! But I’m happy to respond to these as well. One comment: the line you highlighted in Fromm’s quote (“But this does not imply that their reaction is only a function of their character and not one of their environment. It is merely naive to assume that it must be either this or that.”) is an overstatement of the SPE. That is, Zimbardo was not talking about something called “character” (whatever that might be) – he was examining social roles within institutional contexts, and the processes through which these roles are reproduced.

    Your last line about replication in fact confirms the validity of the SPE. I don’t think Zimbardo would be at all surprised about different results, but that’s for another post.

    Thanks for listening!

