Buzzword

The Judging Process

After Season 1, we sent a survey to Buzzword players. Many respondents commented on the answer-judging process and its pace. While we understand that judging takes some time, we believe our process is the best feasible way to evaluate answers. Here is an outline of the answer-judging process and the rationale behind it:

Before a Buzzword game opens for play, our software automatically generates likely acceptable and unacceptable answers. Judges review these generated answers, verify whether each is indeed correct or incorrect, and adjust the rulings accordingly. Judges also add likely acceptable answers that weren’t automatically generated, e.g., WW1 for World War I.

When a player types an answer to a Buzzword question, the answer is automatically compared to every correct and incorrect answer that has already been ruled on for that question (including the automatically generated answers, answers manually added by judges, and answers already given by other players). If there is an existing ruling, the same judgment (i.e., correct or incorrect) is applied to the answer given. Otherwise, that answer is tentatively marked incorrect and flagged for a judge to review.
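The lookup step described above can be sketched in a few lines of Python. This is only an illustration, not Buzzword's actual code: the names `Ruling`, `judge_submission`, `rulings`, and `review_queue` are hypothetical, and the sketch assumes answers are compared as exact strings, with no normalization of case or spacing.

```python
from enum import Enum


class Ruling(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"


def judge_submission(answer, rulings, review_queue):
    """Return the ruling to apply to a newly typed answer.

    rulings      -- dict mapping previously ruled answers to a Ruling
                    (hypothetical structure; covers generated answers,
                    judge-added answers, and earlier player answers)
    review_queue -- list of answers waiting for a judge (hypothetical)
    """
    existing = rulings.get(answer)  # exact-match lookup (an assumption)
    if existing is not None:
        # Someone has already ruled on this exact answer: reuse the ruling.
        return existing

    # No ruling yet: tentatively incorrect, and flag it once for review.
    if answer not in review_queue:
        review_queue.append(answer)
    return Ruling.INCORRECT
```

In this sketch an unruled answer is never written into `rulings`; it only enters the review queue, so a later judge decision is the first real ruling it receives.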

While a Buzzword game is open for play, judges review newly submitted answers several times per day, examining answers that have been flagged for review. If the flagged answer is correct (e.g., due to being an acceptable typo away from an expected answer), the ruling is changed and the scores of players who gave that answer are updated. If another player later gives the same answer, their answer will be automatically ruled correct too.

For example, if a question’s desired answer is Massachusetts and a player answers Masachusets, the player will initially be ruled incorrect. Soon (likely within a few hours), a judge will review this answer and mark it correct, as it is a phonetic equivalent of the expected answer. After that, all future answers of Masachusets will automatically be marked correct without further review by a judge. Likewise, if someone answers that question with Iowa and the judge marks it incorrect after review, every other answer of (exactly) Iowa will be ruled incorrect without further review.
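The Massachusetts example can be traced with a small Python sketch of how one judge ruling might carry over to earlier and later players. Everything here is an assumption made for illustration: the class `QuestionLedger`, its method names, the one-point score per correct answer, and the idea that each player answers a question once are hypothetical and are not taken from Buzzword's real implementation or scoring.

```python
from collections import defaultdict


class QuestionLedger:
    """Tracks rulings, submissions, and scores for one question (hypothetical)."""

    def __init__(self, points=1):
        self.points = points                       # assumed value of a correct answer
        self.rulings = {}                          # answer -> True (correct) / False
        self.players_by_answer = defaultdict(set)  # answer -> players who gave it
        self.scores = defaultdict(int)             # player -> score on this question

    def submit(self, player, answer):
        """Record a player's answer; unruled answers count as incorrect for now."""
        self.players_by_answer[answer].add(player)
        is_correct = self.rulings.get(answer, False)
        if is_correct:
            self.scores[player] += self.points
        return is_correct

    def needs_review(self):
        """Answers that players have given but no judge has ruled on yet."""
        return [a for a in self.players_by_answer if a not in self.rulings]

    def rule(self, answer, is_correct):
        """Apply a judge's ruling and update everyone who gave that answer."""
        was_correct = self.rulings.get(answer, False)
        self.rulings[answer] = is_correct
        if is_correct != was_correct:
            delta = self.points if is_correct else -self.points
            for player in self.players_by_answer.get(answer, ()):
                self.scores[player] += delta


ledger = QuestionLedger()
ledger.rule("Massachusetts", True)     # expected answer, ruled before play opens
ledger.submit("alice", "Masachusets")  # tentatively incorrect, awaits review
ledger.rule("Masachusets", True)       # judge accepts it; alice's score is updated
ledger.submit("bob", "Masachusets")    # ruled correct automatically
```

Storing the ruling per answer rather than per player is what lets a single judge decision cover every past and future submission of the same text.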

In practice, each Buzzword question often ends up with over 200 different answers submitted by players—some just plain wrong, some probable misspellings that are too far off to be acceptable, some misspellings that are acceptable, some alternate versions of the expected answer, etc. Because judges are dealing with so many answers, they sometimes make mistakes. We provide the protest system so players can bring possible errors to our attention.

When NAQT began developing Buzzword, we decided early on that we wanted this to be a serious competition. NAQT has always had fairly rigorous standards about what constitutes a correct answer. One of our goals in development was to craft a set of rules governing acceptable answers that allowed us to maintain these standards while acknowledging the inherent differences between oral and written answer submission. For example, the rules about typos have no direct parallel in face-to-face quiz bowl, but they are similar to the rule about accepting phonetic equivalents (e.g., a pronunciation of go-ETH-uh for Goethe).

The controlling decision for the entire process of judging is how to handle unexpected answers. Assuming we want human review (which we do), we thought of three plausible possibilities:

That said, the written format of Buzzword is new to us too, and we are continually evaluating our rules and procedures in hopes of improving the player experience. We have introduced some changes to acceptability rules for Season 2 (becoming a bit more lenient about typos).

We want to continue making Buzzword a rigorous, fun competition and are always open to feedback at buzzword@naqt.com. We hope you are enjoying the game!