Russ Roberts: Our topic for today, Adam, is a shocking and exhilarating essay that you wrote on peer review. It is not often that ‘peer review’ and ‘exhilarating’ appear in the same sentence, but I loved your piece. It blew my mind for reasons I think will become clear as we talk.
Let’s start with the idea behind peer review. If you asked normal people–people not like you and me–who are what I would call believers in the system, what would they say is the whole–how is this supposed to work?
Adam Mastroianni: I think probably most people haven’t really thought about it, but if you asked them to, they would go, ‘Well, I assume that when a scientist publishes a paper, it goes out to some experts who check the paper thoroughly and make sure the paper is right.’ Maybe if you really push them to think about it, they would say, ‘Well, they probably maybe reproduce the results or something like that, just to make sure that everything is ship-shape; and then the paper comes out. And this is why we can generally trust the things that get published in journals.’ Of course, we know in any system, obviously, sometimes things slip through.
And, all of that is a totally reasonable assumption about how the system works; and it is not at all how the system works. And I think that’s part of the problem.
Russ Roberts: You could argue it’s kind of like how the king might have a taster.
Russ Roberts: Or two–even better. I mean, if the taster has some idiosyncratic defense mechanism against toxins, having two people taste the food–making sure neither dies–is just a good system.
One of the things I learned from your paper–I didn't really learn it; I often emphasize that there are a lot of things we know that we don't remember to think about. One of the things your paper reminds me to think about is that this system–which of course I grew up in over the last 40 years as a Ph.D. [Doctor of Philosophy]–is kind of new in the history of science. It hasn't really stood the test of time. It's an experiment, you call it.
Adam Mastroianni: Yeah. I think this is something a lot of people don't understand because–and I think this is true across the board of human experience–we assume that whatever world we were born into is, unless told otherwise, just kind of the way it's been forever.
And so, there’s sort of this cartoon story I think in a lot of people’s heads that somewhere in the 1600s or 1700s, we started doing peer review. We had journals; and before that, it was people writing manuscripts in the wilderness or whatever. Before that it was Newton publishing his stuff. But then we developed modern science, and it’s been that way since.
And, that cartoon story just isn't true. It is true that around the 1600s and 1700s we have the first things that look almost like the scientific journals we have today, but they worked very differently. A lot of times they were affiliated with some kind of association, and their incentives were different: they wanted to protect the integrity of the association. And they were just one part of a really diverse ecosystem of ways that scientists communicated their ideas.
So, they’re also writing letters to one another. There are basically magazines, or for a long time scientific communication looks much more like journalism looks today: that they cover scientific developments as if they are news stories.
So, you have a bunch of different people doing a bunch of different things, and it really isn’t until the middle of the 20th century that we start centralizing and developing the system that we assume today has always existed. Which is: if you, quote-unquote, “do science,” you send your paper off to a scientific journal. It is subjected to peer review and then it comes out. And all of that is very new.
Russ Roberts: Well, you kind of made an unintentional leap there. You said, 'And then it comes out.' That's if it's accepted.
Adam Mastroianni: Yes, exactly.
Russ Roberts: And, for listeners who are not in the kitchen of journal submission, rejection, and acceptance: sometimes the verdict is 'revise and resubmit,' it's called–flags and questions are raised about things that might be wrong, and you have a chance to try to make the people who reviewed it happy. The people who review, by the way, are called referees in most situations, and there are usually two. So, that is the modern world.
The other thing that you haven't mentioned is that it takes a really long time. It's, again, I think, shocking for people who aren't in this world.
What happens is you submit your paper and–there's a tendency, especially when you're younger, as you are, Adam, relative to me, to sit by your inbox. In the old days it was a mailbox, but now it's an email inbox. Kind of like: 'Any day now–because I sent it, what, three hours ago–I'll be getting a rave review from my two referees, and the editor will say, "I am thrilled to publish this in its own supplemental celebratory edition of our journal because it's so spectacular and life-changing for the people in the field."' But in fact it takes a very long time.
Sometimes people are sent a paper to referee and decide they don't want to, but they don't tell the journal editor right away, because they think, 'Maybe I'll do it.' Then they eventually tell the editor, 'You know, I just don't have time,' and the editor sends it to someone else. And even when the two referees agree to review it, they don't review it quickly. Sometimes there's a sort of deadline, but it's a very frustrating experience for a young scholar. Right?
Adam Mastroianni: Yeah. My experience so far has been that if there’s only a year in between when you first submit the paper and when it comes out, you’re doing pretty good.
Russ Roberts: Shocking.
Adam Mastroianni: And, that’s assuming that you get it into the first place that you submit it, which is not the average outcome. Other places it could take years; and certainly if you are rejected from one journal or a few journals, it could take multiple years. And this is part of why I think so many people I know come to despise the things that they publish by the time that they get published.
Russ Roberts: We should add–and again, this is only for the cooks in the kitchen–that a lot of papers are rejected even if they are true, because they are not considered worthy of the journal. Journals are sorted into a sort of top tier, then a second tier, then a third tier. So, you might aim high. The referees might say, 'Oh, this paper is fine. There's nothing really objectionable in it. But the results are not that interesting. I don't think it merits publication in the Journal of Fascinating Results.' And so you're going to have to send it to the Journal of Somewhat Interesting Findings. Right? That's a common phenomenon.
Adam Mastroianni: Yes. And, the funny thing from the user standpoint of science–like, when I’m working on a project and I want to know what has been done that’s relevant to this, I truly do not care which journal it was in. And so, all of this work that was done to figure out, like, ‘Okay: should this go out to a mailing list of–‘ I don’t know how many people Nature or Science emails. Say, it’s a hundred thousand, versus it should go out to 20,000 people, or whoever. It doesn’t matter to me because now I just want to know: what did people do? And, the letterhead on the top of the paper doesn’t matter.
So, all that work when someone is actually trying to use the thing turns out to be unimportant. This is done mainly for purposes of figuring out who should have high status.
Russ Roberts: Ooh–definitely an inside-the-kitchen remark. One other thing, again, for people not in this world: at least in economics–and I don't know other fields as well, but I think it's often true elsewhere too–the person reviewing the paper, the referee, knows who wrote it. Not always; but even when you don't know, you can usually figure it out from the topic. Or you can read the bibliography and see which author got cited the most times–often a hint.
But, the person who wrote the article almost always does not know who the reviewer is. So, it's called a blind review. It's not double-blind, but it's blind from the perspective of the author. Often authors will thank, quote, "an anonymous referee" for a helpful comment.
The only other thing I would add, again, is that most of the time papers are not rejected because they're not true. They're rejected because they're not interesting, or they're not profound, or the results are not sufficiently important. Or the referees are not completely convinced–there might be things left out.
So, the revise-and-resubmit comment from a referee is: 'You know, you didn't deal with this. Deal with this and maybe we'll take it.' And that just adds another layer of delay and uncertainty about the final publication result.
Adam Mastroianni: Yeah. And this is where I think a lot of people misunderstand what the process is doing. They think what’s mainly happening when a paper is under review is that it’s being checked. And so, someone looks at the data, someone looks at the analysis.
But, most often, nobody is looking at the data. Nobody is looking at the analysis. It takes a ton of time to vet a paper to that level. You'd have to open up their data sets–which, by the way, often aren't even provided. Sometimes you have them, but a lot of times you don't. You'd have to redo all of their analyses.
It’s a big undertaking to actually check the results of a paper, which is why it’s virtually never done. Although that is, of course, maybe the single most important thing that this process could do, rather than provide some kind of aesthetic judgment.
When I encounter a paper, I'd love to know: Did anybody just rerun the code and see if there's some kind of glaring issue? Does the code actually work? Does the data actually exist? Whatever aesthetic judgment the reviewers applied–I mean, I am also an expert consumer. I can look at it, too, and go, 'Oh, I'm not completely convinced.' But maybe I'm getting ahead of myself here. Also, I don't even get to see what the reviewers said. Most places don't publish the reviews.
So, all I know is that the reviewers didn't say enough disqualifying things to prevent the paper from being published in this journal. But I don't know if they said, 'I'm really convinced by this point, but not that point,' or, 'Here's an alternative explanation that I think warrants inclusion.' I don't get to see any of that as a consumer, because generally the reviews disappear forever once the paper is published.
Russ Roberts: And, you’re talking about empirical work. There’s theoretical work as well, where there’s a mathematical proof, say, or an intellectual, analytical set of postulates and analysis. And it’s–I think–well, you claim and I’m afraid you’re right, at least often, that the referees don’t actually read the paper. They kind of eyeball it. They say–I think what we say to ourselves is, ‘Well, if this person is at such and such university, I’m sure they got the equation–I’m sure the math is right. I mean, they wouldn’t make, like, an algebraic error. So, I’m not going to literally check their equation. That would be tedious. Take hours.’
The only questions I'm generally going to answer as a referee are: Is this result interesting? Are the claims consistent with each other? Does the person deal with the previous literature on this? Is this novel?
But, the real question–which your essay addresses quite frankly–is: I mean, it's an interesting idea. It sounds plausible. Does it work?
Adam Mastroianni: Yeah. Does peer review work?
I mean, it really depends on what you hope to get out of it. My position would be, no. In part because I think what we would all like to get out of it is some kind of checking. We’d like to know if the papers that we’re reading are true or not.
The system obviously doesn’t do that.
And not only does it not do that–it comes at extreme cost. We've talked about how long it takes a paper to get through the process, but there's also the time spent by the people reviewing it, which one paper estimates at 15,000 person-years per year. Which is a lot of years, especially when these are scientists–people who are supposed to be working on the most pressing problems of humanity–and instead they're spending a lot of time glancing at papers and going, 'Eh, not interesting. This one is interesting.'
And a lot of those papers will never be cited by anybody. It’s really hard to get a precise estimate of the number of papers that are never looked at by anybody ever again. But, we know that it’s not zero. And, I think a reasonable estimate in the Social Sciences is something like 30%. And, that would probably go up if you exclude papers that are only ever cited by the people who wrote them. And so, that’s a lot of time spent on a paper that didn’t even matter in the first place.
Russ Roberts: Yeah. The number I saw recently was 80%–that basically 80% of papers are never looked at again. A bit harsh. Could be true. You'd have to referee that claim to see whether it's a true statement.
Russ Roberts: To be fair to listeners out there who are in this world: some of them are sitting and listening, thinking, 'This is the most cynical bunch of nonsense I've ever heard. I've reviewed dozens and dozens of papers in my time. I take my responsibilities extremely seriously.' You get paid, by the way, often–not always, but often–a modest amount. And sometimes–there's been a big innovation in recent years–you get paid more if you do it in a timely fashion, which is pleasant. I mean, it's nice for the submitter, the author.
But, how do you answer that? 'Come on. You're claiming people don't read the paper? You have no evidence for that. That's just an armchair cultural thesis. I'm a serious reviewer. I make sure the papers are right; I read them carefully; I vet them. And I am confident that the papers I've helped publish are true.'
Adam Mastroianni: To that reviewer, I’d say, ‘Thank you for your service. And, you are a lone hero on the battlefield.’ Because there have been studies done where they look at, well, on average what reviewers do. The British Medical Journal, when it was led by Richard Smith, did a lot of this research where they would deliberately put errors into papers–some major errors, some minor errors–send them out to the standard reviewers that the journal had, get the reviews back, and just see what percentage of these errors did they catch.
On average across the three studies that they did on this, it was about 25%.
And, these were really important and major errors. For instance: the way the supposedly randomized controlled trial was randomized wasn't really random. Which is really important. That's a very key error to find. If you're doing a randomized controlled trial, it needs to be randomized.
And for that particular error, only about half of the reviewers found it. And that's a very standard thing to look for. It should be at the front of your mind when you're looking at a paper.
And so–I've heard from them as well, people who take their job really seriously. But I think they're the minority. What's most important about the system is how it works on average, and I think on average it doesn't work very well–certainly not at catching major errors.
You can see this in another piece of evidence: when we discover that papers are fraudulent, where does that happen? You would think that if people were really vetting the papers, it would happen at the review stage. And it's hard to find the dog that didn't bark, but I've never heard a single story of a fraudulent paper being caught at the review stage. It's always caught after publication.
So, the paper comes out; and someone looks at it and goes, 'That doesn't seem right.' And, purely of their own volition–these people are the true heroes–they just decide to dig deeper and find out, 'Oh, it's all made up,' or 'the data isn't there.' Often this is someone from within the world in which the paper was published–someone in the same lab–who goes, 'I just know there's something creepy going on with these results.'
There was a big case in psychology last year, involving a paper that came out 10 years ago–a paper about signing at the top versus at the bottom. Ooh, this is a good story. The paper was about what happens if you sign your name at the top of a form where you have to attest to something–in this case, how many miles you drove a car. Obviously there's some incentive to lie here, because the fewer miles you report, the less you have to pay. The claim was that if you sign at the top, you should be more honest, and so you should report more miles than if you sign at the bottom. It's a very cutesy kind of–
Russ Roberts: Why? What’s the logic?
Adam Mastroianni: It’s because of psychology. I don’t know. This is kind of what we do. ‘Oh, you’re reminded of–you’re not anonymous,’ and–sorry, the thing you’re signing is specifically like, ‘I’m going to be honest.’ And so, if you do that at the beginning, you’re going to be more honest than if you do that at the end.
And so, they found that this was true in some real-world data. I mean, it turns out not to be real-world data, because the data was obviously made up.
That paper comes out. It’s put in PNAS [Proceedings of the National Academy of Sciences], which is a very prestigious journal.
And, ten years go by. And, someone tries to replicate the results and they can’t do it. And so, they publish their failure to replicate. That’s all great.
As part of publishing that failure to replicate, they also publish for the first time the raw data from the original study, which had never been published before.
And, someone takes a look at it and notices some weird things. For instance, it's an Excel spreadsheet, and half of the data is in a different font than the other half. You also notice that if you plot the distribution of the miles people claim to drive, it's totally uniform–which is really weird, because when people report their miles, they round: they don't report 3,657; they report 3,600 or 3,650.
But, people were just as likely in this data to report 57 as they were to report 50.
And so, if you look a little closer, you realize this data was obviously fabricated to produce the effect they were trying to show–they just added some numbers to the original data. There's a great blog post about this on Data Colada, a blog run by some psychologists who do a lot of work on replication.
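[The rounding check Adam describes is easy to sketch. This is a hypothetical illustration, not the actual Data Colada analysis: it assumes the reported mileage figures are available as a plain list of integers, and it simply compares the share of figures ending in 0 or 5 to the 20% you'd expect if last digits were uniform–honest self-reports should be far more rounded than that.]

```python
from collections import Counter

def last_digit_counts(values):
    """Count how often each final digit (0-9) appears in the reported figures."""
    return Counter(int(v) % 10 for v in values)

def round_number_share(values):
    """Fraction of figures ending in 0 or 5. Self-reported mileage is
    heavily rounded, so a share near the uniform baseline of 0.2
    (two of ten possible last digits) is a red flag."""
    counts = last_digit_counts(values)
    total = sum(counts.values())
    return (counts[0] + counts[5]) / total

# Fabricated-looking data: every last digit is equally common.
suspicious = [3650 + k for k in range(100)]
# Honest-looking data: people round to the nearest 50 or 100.
plausible = [3600, 3650, 3700, 3500, 4000, 3550, 3800, 3900, 3750, 3650]

print(round_number_share(suspicious))  # 0.2 -- the uniform baseline
print(round_number_share(plausible))   # 1.0 -- all round numbers
```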
So, all of that happened 10 years after the original paper was published and all the detective work couldn’t even have happened at the beginning because the data was never made available to anybody.
So, if we’re not catching it at the review stage, what exactly are we doing?
Russ Roberts: Now, listeners may remember that back in 2012, I interviewed Brian Nosek, who is also a psychologist and has been a very powerful voice for replication. And again, if you're not in the kitchen, you wouldn't realize this: replicating someone else's paper has been almost worthless, historically, over the last 50 years of this process. If you have suspicions about whether a result is true, you think, 'Well, I'll go find out. I'll do it again.'
Well, if you find out that it is true, nobody wants to publish it. There’s nothing new there.
You find out it’s not true: maybe it isn’t, maybe it is, but it’s not a prestigious pursuit to verify past papers.
So, what Brian and others have done in this project is try to bring resources to bear, to encourage people to do this kind of checking. And the results have been deeply disturbing–how few results replicate. Particularly in behavioral psychology, but that's just because that's where they started.
I think it'll end up coming to economics. We know it's also true in medicine–certainly true in epidemiology. And Brian and his co-authors, Jeffrey Spies and Matt Motyl, summed up an early version of your essay's thesis in one beautiful phrase: 'Published and true are not synonyms.'
Adam Mastroianni: Yes. [More to come, 21:26]