Noam Nisan points to the NSF trying out some new rules for reviewing in its upcoming SSS program.
There's a lot here to discuss. First, I'm glad to see the NSF is willing to try out some new reviewing approaches. They've been using the same approach for a long time now (1- or 2-day in-person meetings, with a reviewer panel drawn according to who is available and willing). I really haven't seen any discussion from the NSF as to why it's a good review system, and it has some major cons (as well as, admittedly, some pros). As far as I know -- and perhaps some people are more knowledgeable than I am on the topic -- it's not at all clear why it has become the stable equilibrium point as a reviewing method.
That being said, there are some clear pros and cons to this experiment. Here are some features, with initial off-the-cuff commentary.
1. No panel review. Proposals will be split into groups of 25-40, and PIs in the group will have to review other proposals (they say 7 here) in that group. [If there are multiple PIs on a proposal, one has to be the sacrificial lamb and take on the role of reviewer for the team.]
I kind of like the idea that people submitting proposals have to review. One of the big problems in the conference/journal system is that there's minimal "incentive" to review. Good citizens pay back into the system. Bad citizens don't. This method handles the problem in a natural way -- you submit, you review. There are many potential problems with this method to be sure (as we'll see in the proposed implementation below).
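To make the "you submit, you review" assignment in point 1 concrete, here is a minimal sketch of one way it could be done. The circular-shift scheme and the name `assign_reviews` are my own assumptions for illustration; the NSF document may well assign reviews differently (for example, to avoid conflicts of interest).

```python
def assign_reviews(num_proposals, reviews_per_pi=7):
    """Map each PI index to the proposals that PI reviews.

    Hypothetical scheme (not from the NSF document): PI i reviews
    proposals i+1 ... i+m (mod n). Nobody reviews their own proposal,
    and every proposal receives exactly m reviews.
    """
    if reviews_per_pi >= num_proposals:
        raise ValueError("need more proposals than reviews per PI")
    return {
        pi: [(pi + k) % num_proposals for k in range(1, reviews_per_pi + 1)]
        for pi in range(num_proposals)
    }


# A group of 30 proposals, 7 reviews per PI (the figure quoted above).
assignment = assign_reviews(30)
print(assignment[0])  # proposals assigned to PI 0
```

The point of the sketch is just that the load balances automatically: with n proposals and m reviews per PI, each proposal also gets m reviews, with no outside panel needed.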
2. A composite ranking will be determined, and then the "quality" of each PI's reviews will be judged against this composite; the ranking of that PI's own proposal may then be adjusted according to the quality of their reviews.
Ugh. Hunh? I get the motivation here. You've now forced people who may not want to do reviews into doing them. So you need an incentive to get them to do the reviews, and do them well. One incentive is that if you're late with your reviews, your own proposal will be disqualified. That seems fine to me. But this seems -- off. I should note, they have a whole subsection in the document labelled
Theoretical Basis:
The theoretical basis for the proposed review process lies in an area of
mathematics referred to as mechanism design or, alternatively, reverse
game theory. In mathematics, a game is defined as any interaction among
two or more people. The purpose of mechanism design is to enable one
to “design” the “mechanism,” namely the game, to obtain the desired
result, in this case to efficiently obtain high-quality proposal review
while providing the advantages noted above. In mechanism design, this
is done by formulating a set of incentives that drive behavior in the
desired direction. The mechanism presented here was devised by Michael
Merrifield and Donald Saari [1].
I suppose I now have to go read the Merrifield and Saari paper to see if they can convince me this is a good idea. But before reading that, there are multiple things I don't like about this.
a) Why is "reviewer quality" now going to be part of how we make decisions about what gets funded? I'm not sure to what extent, if any, I want "reviewer quality" determining who gets money to do research. Here's what the document says:
To promote diligence and honesty in the ranking process, PIs are given a
bonus for doing a good job. The bonus consists of moving their
proposals up in the ranking in accordance with the accuracy with which
their ranking agrees with the global ranking. This movement will be
sufficient to provide a strong incentive to reviewers to do a good job,
but not large enough to severely distort the ranking merely as a result
of the review process. Recognizing that, if all reviewers do an
excellent job of ranking the proposals they review, all PIs’ proposals
will be moved up equally, which means that the ranking will not be
changed, the maximum incentive bonus will be a movement of two
positions, that is, a proposal could be moved up in the ranking to a
position above the next two higher proposals.
With funding ratios at about 15% (I don't know what the latest is, but that seems in the ballpark), two places could be a big deal in the rankings.
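Here is the back-of-the-envelope arithmetic behind that worry, as a small sketch. The 15% rate is my ballpark figure from above, not an NSF number, and the helper names are made up for illustration. It also ignores the fact that other PIs earn bonuses too, so it shows the most a single proposal could gain.

```python
def funded_cutoff(group_size, funding_percent=15):
    """Number of proposals funded, assuming a flat funding percentage."""
    return group_size * funding_percent // 100


def bubble_ranks(group_size, funding_percent=15, max_bonus=2):
    """Ranks (1 = best) just below the cutoff that a full bonus could lift
    into the funded set, if nobody else moved."""
    cutoff = funded_cutoff(group_size, funding_percent)
    last = min(cutoff + max_bonus, group_size)
    return list(range(cutoff + 1, last + 1))


for n in (25, 40):
    print(n, funded_cutoff(n), bubble_ranks(n))
```

For groups of 25 and 40 this gives 3 and 6 funded proposals respectively, so a two-position move is large relative to the handful of slots actually available.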
b) Why is there the assumption that the group ranking is the "right" score -- particularly with such small samples? I should note I've been on NSF panels where I felt I knew much better than the other people in the room what were the best proposals. (Others can judge their confidence in whether I was likely to have been right or not.) One of the pluses of face-to-face meetings is that a lone dissenter has a chance to convince other reviewers that they were, well, initially wrong (and this happens non-trivially often). I'm not sure why review quality is judged by "matching the global ranking".
c) Indeed, this seems to me to create all sorts of game-theoretic problems; my goal in reviewing does not seem to be to present my actual opinion of a proposal, but to present my belief about how other reviewers will opine about the proposal. My experience suggests that this does not lead to the best reviews. The NSF document says:
Each PI will then review the assigned subset of m proposals,
providing a detailed written review and score (Poor-to-Excellent) for
each, and rank order the proposals in his/her subset, placing the
proposals in the order which he/she thinks the group as a whole will
rank them, not in the order of his/her personal preference.
But then it says:
Each individual PI’s rankings will be compared to the global ranking,
and the PI’s ranking will be adjusted in accordance with the degree to
which his/her ranking matches the global ranking. This adjustment
provides an incentive to each PI to make an honest and thorough
assessment of the proposals to which they are assigned as failure to do
so results in the PI placing himself/herself at a disadvantage compared
to others in the group.
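So I'm not clear how their incentive system -- based on the global ranking -- gives an incentive to make an honest and thorough assessment. The document seems to contradict itself here: reviewers are told to rank by what they think the group will say, not by their own preference, and yet the adjustment is supposed to reward honesty.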
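A toy sketch of the concern in b) and c): score reviewers by agreement with the composite, and a lone dissenter loses out whether or not they are right. Everything here is an assumption for illustration: I use a Borda-style average for the composite and pairwise agreement for "review quality", and I have not checked either against the NSF document or the Merrifield and Saari paper.

```python
from itertools import combinations


def composite_ranking(rankings):
    """Combine per-reviewer orderings (best first) into one global order
    using average Borda-style points (an assumed aggregation rule)."""
    totals, counts = {}, {}
    for order in rankings.values():
        m = len(order)
        for pos, proposal in enumerate(order):
            totals[proposal] = totals.get(proposal, 0) + (m - 1 - pos)
            counts[proposal] = counts.get(proposal, 0) + 1
    avg = {p: totals[p] / counts[p] for p in totals}
    # Highest average first; ties broken by name so the result is deterministic.
    return sorted(avg, key=lambda p: (-avg[p], p))


def agreement(order, global_order):
    """Fraction of proposal pairs the reviewer orders the same way as the composite."""
    rank = {p: i for i, p in enumerate(global_order)}
    pairs = list(combinations(order, 2))
    same = sum(1 for a, b in pairs if rank[a] < rank[b])
    return same / len(pairs)


def bonus(order, global_order, max_bonus=2):
    """Hypothetical bonus: positions gained, scaled by agreement and rounded."""
    return round(max_bonus * agreement(order, global_order))


# Four reviewers share a consensus; one dissenter thinks D is the best proposal.
rankings = {f"r{i}": ["A", "B", "C", "D"] for i in range(1, 5)}
rankings["dissenter"] = ["D", "A", "B", "C"]

global_order = composite_ranking(rankings)
for name in ("r1", "dissenter"):
    print(name, agreement(rankings[name], global_order), bonus(rankings[name], global_order))
```

In this toy example the dissenter agrees with the composite on only half of the pairs and earns a smaller bonus, even if D really is the strongest proposal. Under rules like these, the safe strategy is to guess the consensus, which is exactly what worries me.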
d) This methodology seems ripe for abuse via collusion -- which is of course against the rules:
The PIs are not permitted to communicate with each other regarding this
process or a proposal’s content, and they are not informed of who is
reviewing their proposals.
But offhand I see plenty of opportunities for gaming the system....
e) This scheme is complicated. You have to read the document to get all the details. If it takes what seems to be a couple of pages to explain the rules of the assignment and scoring system, maybe the system is too complicated for its own good.
That came out pretty negative. Again, I like the idea of experimenting with the review process. I like the idea that submitters review. I understand that we somehow want to incentivize good reviews, and that this is very difficult to do.
This actual implementation... well, I'd love to hear other people argue why it's a good one. And I'd certainly like to hear what people think of it after it's all done. But it looks like the wrong way to go to me. Maybe in the morning, with some time to think about it, and with some comments from people, it will look better to me. Or maybe, after others' comments, it will seem even worse.