Divining the jury

Juries are confused, but Australian courts don’t seem interested in understanding why

Jeremy Gans 11 June 2013 3484 words

A world without certainty: Henry Fonda (second from left, above) in the jury room in Sidney Lumet’s Twelve Angry Men.

Jury Decision-Making: The State of the Science
By Dennis J. Devine | New York University Press | $39.95

Dinner conversation at Brighton’s Old Ship Hotel was light, if not wholly off-topic. After all, jury members aren’t supposed to talk about their trial outside the jury room. When the talk turned to ghosts, psychics and the like, the bailiffs who were keeping an eye on them simply scoffed and went off to bed, unaware that four jurors had decided to get together later to have a few drinks and play with a makeshift Ouija board.

After a few turns at pushing the glass back and forth between lettered scraps of paper, during which two jurors conversed with relatives, the evening took a turn. “Who is it?” asked Ray, the jury foreman. The glass spelt out H-A-R-R-Y-F-U-L-L-E-R, one of the newlyweds who had lived in a heritage-listed house in the Sussex village of Wadhurst until their execution-style killing a year earlier. The jury had spent the afternoon pondering the guilt of the alleged killer, so it was hardly surprising when Ray asked, “Who killed you, Harry?”

How do juries reach their verdicts? Dennis J. Devine, an American psychology professor, knows the answer for one jury, at least. Somehow, he got to perform jury service despite telling the prosecutor that the only thing he believes we can know for certain is “maybe gravity.” He still wonders whether his jury reached the right verdict. As a social scientist, he knows that making any guesses about the other roughly 150,000 jury trials held every year is about as useful as holding a séance. And yet he has written a book about Jury Decision-Making subtitled “the state of the science.”

The science Devine describes is drawn from a selection of the estimated 1500 studies of jurors published over the last six decades, nearly all (like the vast majority of jury trials) in the United States. These include the first major jury study, which was conducted in the 1950s by a psychologist, a sociologist and a legal academic from the University of Chicago. The Chicago trio not only surveyed hundreds of judges and thousands of jurors but actually taped the deliberations of five juries (with the permission of the lawyers and judges in the cases but without the knowledge of the jurors themselves). The ensuing outcry from lawyers’ groups led to America-wide laws guaranteeing that what happens in the jury room must stay in the jury room. A similar law applies in England, but — after a juror went public with the Brighton Ouija board incident — a court ruled that it didn’t extend to a juror’s hotel room.

Given these legal limits, what do Devine’s juror studies actually study? Most of them look at “mock” jurors. That is, they examine how people act when asked to play the role of jurors in pretend trials. Such studies make up two-thirds of Devine’s sample and they are indeed easy to mock. For starters, most of the pretend jurors are psychology students, whose main resemblance to real jurors is that they, too, are forced to participate. Researchers lack Hollywood’s resources when it comes to simulating real trials and many therefore opt for paper summaries of the evidence. Even the best studies, which use representative participants and true-to-life scenarios, lack the two singular features of real trials: real evidence and real consequences. Only a very special mock jury would try to commune with a fictitious murder victim.

Devine argues that there’s a major upside to this lack of realism: fake trials can be used to conduct interesting experiments. For instance, a 1976 study showed three groups of captive psychology students the same fifty-minute rape trial re-enactment with just one variation: how the judge defined the iconic term “beyond reasonable doubt.” The students who were told that they had to be “sure and certain” before they convict (as English jurors are told) were much more likely to acquit than ones who were told to consider whether any doubts they had were “substantial” or “fair” (as some American jurors were once told). Especially after the students “deliberated” as six-person juries, these differences translated into diametric outcomes on a line-ball dispute. This and later studies have been carefully considered by the top courts in the United States, Canada and New Zealand when they were developing jury directions on the meaning of reasonable doubt.

Not so in Australia. Our High Court has long maintained an absolute ban on any definition of “beyond reasonable doubt.” The two reasons for its stance — that the term is “understood well enough by the average man” and that defining it risks diminishing it — are at odds with the 1976 study. The third group of psychology students — given no definition at all, just like every Australian jury today — split evenly in their individual verdicts and were much more likely to produce “hung” verdicts as groups. Given that, it’s no surprise that Australian juries frequently ask for a definition. A jury trying three men for the 2000 murder of Melbourne lawyer Keith Allan, for example, asked their judge if “beyond reasonable doubt” meant “70 per cent or 80 per cent sure?” The three accused won a new trial because the judge failed to admonish the jury for attempting to put a number on reasonableness.

When the trio were convicted again, they complained to the High Court that their new jury (which didn’t ask for a definition) may have been as confused as the earlier one. Their timing was propitious. A year earlier, a NSW government agency had published the results of its survey of more than 1000 jurors just after they had rendered verdicts in real trials. The survey asked whether they thought “proof beyond reasonable doubt” meant “sure,” “almost sure,” “very likely” or “pretty likely.” The good news is that 80 per cent picked the two most stringent definitions, suggesting that most Australian verdicts are reached on this basis. The bad news is that the High Court’s approach amounts to a roll of the dice for defendants in close cases. There’s always a chance that they may get a jury consisting largely of the 20 per cent who were evenly split between “very likely” and “pretty likely.”
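How long are the odds on that roll of the dice? A rough sketch, not a finding from the survey: assume a twelve-person jury, assume each juror is drawn independently from a population in which 20 per cent read the standard as “very likely” or “pretty likely,” and take “consisting largely” to mean more than half. None of these assumptions comes from the survey itself, and the function name and parameters are illustrative only.

```python
from math import comb


def prob_looser_majority(jury_size: int = 12, looser_share: float = 0.2) -> float:
    """Chance that more than half of a randomly drawn jury holds the looser reading.

    Assumptions (not from the survey): jurors are drawn independently, and
    each has probability `looser_share` of reading "beyond reasonable doubt"
    as "very likely" or "pretty likely".
    """
    threshold = jury_size // 2 + 1  # smallest head count that is more than half
    return sum(
        comb(jury_size, k)
        * looser_share**k
        * (1 - looser_share) ** (jury_size - k)
        for k in range(threshold, jury_size + 1)
    )


if __name__ == "__main__":
    # Binomial tail probability under the stated assumptions.
    print(f"{prob_looser_majority():.4f}")
```

On these assumptions the chance for any one trial is small, but it is not zero, and it is spread across every close case a court hears. Real juries also deliberate rather than simply count heads, so this is an illustration of the gamble and no more.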

On this matter, unfortunately, our High Court isn’t moved by the research. Justices Susan Crennan and Virginia Bell refused even to give the trio leave to appeal. Without apparent irony, they added that they saw “no reason to doubt” that the trio were fairly tried.

The Fuller case started with a phone call. “Emergency, which service?” The sounds at the other end of the line were unintelligible. The operator kept saying “pardon?” until the sounds began to resemble squeals, followed by a bang and the sound of the phone hitting the floor. The operator listened to a few more minutes of footsteps and doors opening and closing and then logged the call as “child on the line.” The truth is almost too awful to relate.

Nicola Fuller spent the last moments of her life trying to get local police to walk just 200 yards from their station to the house she shared with her new husband Harry. She couldn’t speak clearly because her jaw had been shattered by three bullets fired at close range. Her killer calmly listened in on another phone and then entered her marital bedroom to finish her off, shooting her again through a pillow. Her father, who was the first to discover the dead newlyweds days after, vowed to sue British Telecom for crushing his daughter’s final hopes of rescue.

At the murder trial, one of the jury’s tasks was to try to work out whether the footsteps on the tape of the call were of one person (Stephen Young, the insurance broker who prosecutors said was the lone killer) or two (matching Young’s claim, backed up by two unrelated teens, of seeing a pair of strangers in the house after he knocked at the door). When one juror listened to the tape, she burst into tears and fled the courtroom, prompting the judge to discharge the entire jury. The new, more stoic jury included the quartet who later tried to solve the mystery of the footsteps with a Ouija board.

One of the main jobs of criminal trial judges is to try to keep jurors sane while they perform an alien and sometimes awful task. For instance, the judge in Young’s trial had to decide not only whether letting the jury hear the emergency call was worth the emotional cost and consequent risk of an irrational verdict (apparently it was), but also whether parts of it should be edited out (the worst parts were) and whether a witness somehow claiming to be an expert in distinguishing the sounds of two shoes from four should be allowed to tell the jury what he thought (he was). A few years ago, the US Supreme Court was asked to decide whether the music of Enya, playing in the background of a movie about the victim’s life shown to a jury deliberating whether to impose the death penalty, was soothing (assisting the jury in calmly assessing the images) or stirring (which might make them too emotional). It ducked the question.

Trial judges receive no training for their informal role of court psychologist and have no time to read up on the research (such as a recent Australian study that Devine describes which suggests that the link between gruesome photos and convictions has more to do with the fact that they are photos than their gruesomeness). For time-poor judges, the closing chapter of Devine’s book may serve as a useful primer. He draws together a range of studies to flesh out the current “model of choice” among psychologists about jury thinking. At its core is the creation of stories. Jurors come to the trial with stories already in mind based on their own experiences, gradually develop a more detailed scenario over the course of the trial and then compete to inject the main elements of that account into the discussion with other jurors. Importantly, the model’s implications — in particular, that jurors aren’t much concerned with the sources of the information they receive, almost never separate out different pieces of evidence and don’t wait out the trial before making big decisions — sit very poorly with the methods Australian judges try to use to keep jurors rational.

The news is much grimmer if you look beyond mock-jury studies to the wider field of psychology. Consider a German study that Devine’s book doesn’t cover, but which is described in Daniel Kahneman’s recent book Thinking, Fast and Slow. The researchers’ goal was to test whether the well-known phenomenon of “priming” — where people who are asked to estimate things (the population of Kenya, for example) change their guesses if a particular number (“Three million?”) is casually mentioned first — may affect legal thinking.

In this experiment, German judges were asked to perform the standard task of sentencing a thief. The researchers’ trick was to insert a pretend phone call from a journalist into the mock scenario. All the judges insisted that they totally ignored the journalist’s query about whether a certain number of years was likely. And yet the researchers found that the judges’ sentences differed depending on what number the journalist mentioned. The same happened again when the researchers told the judges that the suggestion was randomly generated. Even more astonishing, the effect was repeated when the judges were simply asked to roll some (loaded) dice and add up the numbers before issuing their sentence. When questioned about their thinking, the judges primed with a higher number emphasised the bad features of the theft and the others emphasised the sad features of the thief’s life, each oblivious to the profound impact of meaningless numbers on their professional judgement.

One Australian judge who pays attention to social science research on how jurors think is Peter McClellan, a senior judge of the NSW Supreme Court. In a 2010 murder appeal, McClellan made a surprising ruling about how jurors should be informed of the chance that two people might have the same (incriminating) DNA profile. He ruled that it is better to express that chance as the odds that someone would have the same profile (for example, “one in 1600”) rather than (as many state witnesses prefer) as the percentage of people who wouldn’t have that profile (for example, “99.9375 per cent”). McClellan’s judgement detailed mock-jury research by Northwestern University’s Jay Koehler, who found that jurors are more likely to convict under the second formulation than the first, even though the numbers are mathematically identical. As Devine explains, Koehler’s findings are consistent with the central role of stories in jury decision-making: using a percentage makes jurors focus on the fact that most people have quite different DNA profiles (which suits the prosecution case), while using the odds makes jurors think about the few people who might have the same profile (which suits the defence case).
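For readers who want to check that the two formulations really are the same number, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def exclusion_percentage(one_in: int) -> Fraction:
    """Percentage of people who would NOT share the profile.

    `one_in` is the N in a "one in N" match frequency, such as the
    "one in 1600" example quoted above.
    """
    return (1 - Fraction(1, one_in)) * 100


if __name__ == "__main__":
    pct = exclusion_percentage(1600)
    print(float(pct))  # 99.9375
```

The conversion is a single subtraction, which is the point: the information given to the jury is identical, and only the framing differs.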

Alas, last year, the High Court gave McClellan a dressing down. His sin was reading articles about “not the law but psychology.” Chief Justice Robert French, together with the same two judges who were unmoved by the research on “beyond reasonable doubt,” held that judges shouldn’t rely on non-law studies themselves, but instead should wait for the parties to call the academics as expert witnesses. But Koehler and Devine shouldn’t start planning their down-under holidays quite yet. As the High Court well knows, no Australian criminal defendants have the spare cash to fly in an overseas academic psychologist to read out a journal article to a judge.

Having dispatched Koehler’s studies without needing to read them, the nation’s top judges went on to reject McClellan’s preference for frequencies over percentages. After all, they explained, the expressions are mathematically identical. So, how could a rational jury be affected by a meaningless difference between two numbers?

When word of the English jurors’ séance got out, the prosecution argued that it was just a drunken game. But the Court of Appeal, noting that Harry Fuller’s alleged words prompted two of the jurors to tears, overturned the verdict as a material irregularity. An op-ed in the London Times queried the court’s view of rationality: “suppose the jurors in the hotel had sought advice from their god through prayer?”

As it happens, courts routinely make decisions based on their own peculiar faith. The English appeal judges made an order barring the press from reporting the actual words Fuller spoke from beyond the grave until after the retrial. As if anyone would have any trouble working out which mystery man Fuller had fingered for his own murder the night before the jury unanimously returned a guilty verdict against Stephen Young.

Today, a trial judge would probably be relieved if a jury spent its downtime playing with a glass and some scraps of paper. All Harry Fuller told the Brighton jury was the name of his killer. He was understandably stumped by their detailed follow-up queries about the events following his death. After a few vague guesses, he directed them to the police. Two decades on, a juror seeking extra information will doubtless go to a different, much more forthcoming medium.

Devine’s jury studies largely pre-date the era of the internet. Many quaintly use made-up “slanted” newspaper articles to test how jury deliberations are affected by non-trial information. Their findings are mostly predictable — articles that tilt against the accused (as most do) lead to more guilty verdicts in close cases — but occasionally surprising. For example, exposure to media accounts dampened some jurors’ pre-existing biases in rape cases (such as an inclination to sympathise with either the defendant or the complainant). The most interesting results confirm Devine’s story model and trial judges’ worst fears: jurors make no apparent effort to ignore outside information and sometimes come to believe that what they read in the newspaper was actually evidence in the trial itself. Noting studies showing that judicial directions make little difference on these matters, Devine concludes that the only way to counter media coverage is to change a trial’s venue.

Only an American would fail to even mention the preferred solution in Australia, England and most everywhere else: censoring the media. But it is only our past that is a foreign country. Technology has given all Australians what our constitution won’t: a robust freedom to deliberate in public about questions before our criminal courts. Victoria’s courts conclusively demonstrated as much in 2008 with their clumsy ban on the broadcast of Underbelly, proving that judges are as powerless as TV networks when it comes to stopping Victorians from watching shows at the same time they are shown elsewhere.

That same year, Victoria’s parliament unwittingly staged a real-life jury study of its own. Partway through a long-running Melbourne terrorism trial, printouts from Wikipedia about the meaning of some key legal terms were discovered in the jury-room rubbish bin. The defendants drew on that discovery to press the same point repeatedly found in Devine’s studies: that judicial directions telling jurors to ignore the outside world (including non-Victorian media reports about the terrorism trial available online) were ineffective. Perhaps in response, parliament enacted legislation partway through the terrorism jury’s deliberations that criminalised such research by jurors. Although the jury had earlier been chastised for researching legal terms and was specifically told of the new offence provisions, a forbidden dictionary was later found in the jury room and the jury freely admitted to using it to research the same terms.

As the courts have slowly lost control, Australians have become accustomed to a bizarre public ritual in high-profile cases. The first step involves the courts, prosecutors and defence lawyers uniting in public outrage at the presence of information all over the internet that local media are forbidden from publishing, with potentially dire implications for the fairness of any future trial. The most obvious effect of these statements is to publicise the existence of such sites. The ritual’s second step occurs at the trial itself and in subsequent appeals, where courts and prosecutors suddenly part company from defence lawyers. It nearly always turns out that a fair trial can be had after all, despite the illicit information at the jury’s fingertips. Australian courts continue to insist that judicial directions (and, where they apply, criminal offences) permit them to assume that jurors won’t google.

Even when a court discovers that jurors have succumbed to temptation, it will almost always find (again contrary to virtually every one of Devine’s studies) that the verdict wasn’t affected. A disturbing example is the trial of Kathleen Folbigg, who remains just about the only mother who still stands convicted of killing her own children after each died of apparent SIDS. When it was belatedly discovered that a juror had uncovered and shared internet accounts of the young Folbigg’s having witnessed her father shoot her mother, Peter McClellan and Virginia Bell (who would later chastise McClellan for reading jury research) were in agreement: they did “not believe that there was any likelihood” that the jury would have seen any link at all between Folbigg’s previous victimhood and her alleged later crimes.

Devine begins his book by observing that “like all human institutions, the jury system is not perfect.” At the end, he predicts both “more high-quality research on juries” and “more application of what we have learned in the courtroom” and concludes: “Much more will be learned in the next fifty years about a unique societal institution that may yet represent the best way to make important legal decisions in a world without certainty.”

Others are more pessimistic about the benefits of understanding jurors. The Times op-ed about the Ouija board case observed that many think that if the veil of secrecy around the jury room were abandoned then “trial by jury would eventually go the same way.” Maybe that is why Australia’s courts steadfastly refuse to consider a vast body of research into an institution whose care is one of their key tasks.

But it is the “trial,” not the “jury,” whose frailties stand most exposed by Devine’s studies, especially the judiciary’s prized role as the gatekeeper of criminal evidence. Having destroyed the courts’ monopoly on providing information to the nation’s juries – just as it has broken the traditional media’s monopoly on the news – the internet may eventually force a choice between having serious criminal trials decided by judges on admissible evidence or by juries on all the evidence. If Australian judges stick to their traditional rules of thumb, folk psychology and finger-crossing about what happens in the jury room, then it is not at all clear whether it is jurors or trial judges who will eventually go the same way as the broadsheets. •