(By Walter Frick)
One thing is for certain: society’s reliance on human judgment, unaided by computers, is going to decrease. This point is made forcefully in a recent post by MIT’s Andrew McAfee, and in Average Is Over, the latest book by economist Tyler Cowen.
But machine intelligence will not simply displace human intelligence; it is also poised to complement it. In many areas, the combination of human and machine intelligence will outperform either on its own.
How will this collaboration work? Cowen and McAfee both address this, differing on one key dimension: order.
Cowen’s thesis is that one’s ability to augment machine intelligence will define one’s value in the labor market. “Are your skills a complement to the skills of the computer,” he asks, “or is the computer doing better without you?” His key metaphor throughout the book is that of a freestyle chess player. In freestyle chess, human and computer teams play together, and are able to outperform either on their own (at least for now). The key challenge for the freestyle player is not to be a master of chess, but to understand the strengths and weaknesses of chess programs so as to know when to trust their recommendations and when to override them. Here’s how Cowen describes his own amateur experience playing freestyle:
I’ve spent many hours playing a form of Freestyle at home… My procedure is simple. I play Shredder [a chess program for iPad] against itself, but every now and then I overrule the decisions of the program. In essence it’s “me plus Shredder” against Shredder. The human-computer team usually wins. At four or five crucial points during the game, I override the strategic judgment of the program and come up with a better move, or at least what I think is a better move. Then I let the superior execution of the program take over. This works maybe four times out of five.
In this model of human-machine collaboration, the computer handles the bulk of the decision-making, with the human adding a layer of judgment on top. It is echoed in Nate Silver’s book The Signal and the Noise, which uses the rather remarkable example of meteorologists:
The programs that meteorologists use to forecast the weather are quite good, but they are not perfect. Instead, the forecasts you actually see reflect a combination of computer and human judgment… Some of the forecasters [at the National Weather Service] were drawing on these [computer-generated] maps with what appeared to be a light pen, painstakingly adjusting the contours of temperature gradients produced by the computer models… The forecasters know the flaws in the computer models… The unique resource that these forecasters were contributing was their eyesight… According to the agency’s statistics, humans improve the accuracy of precipitation forecasts by about 25 percent over the computer guidance alone, and temperature forecasts by about 10 percent.
But McAfee’s idea of collaboration looks quite different. Here he is responding to experts who claim to consider algorithmic output before making a decision, like the freestyle chess player:
The research is clear: When experts apply their judgment to the output of a data-driven algorithm or mathematical model (in other words, when they second-guess it), they generally do worse than the algorithm alone would. As sociologist Chris Snijders puts it, “What you usually see is [that] the judgment of the aided experts is somewhere in between the model and the unaided expert. So the experts get better if you give them the model. But still the model by itself performs better.”
Things get a lot better when we flip this sequence around and have the expert provide input to the model, instead of vice versa. When experts’ subjective opinions are quantified and added to an algorithm, its quality usually goes up. So pathologists’ estimates of how advanced a cancer is could be included as an input to the image-analysis software, the forecasts of legal scholars about how the Supremes will vote on an upcoming case will improve the model’s predictive ability, and so on. As Ian Ayres puts it in his great book Supercrunchers, “Instead of having the statistics as a servant to expert choice, the expert becomes a servant of the statistical machine.”
This idea isn’t new — it fits within a long literature about information aggregation that doubles as the intellectual justification for crowdsourcing. As law professor Cass Sunstein summarized in his 2006 book Infotopia:
Suppose that we want to answer a disputed question of fact… A great deal of evidence suggests that under certain conditions, a promising way to answer such questions is this: Ask a large number of people and take the average of the answer… When the relevant conditions are met, the average answer, which we might describe as the group’s “statistical answer,” is often accurate, where accuracy is measured by reference to objectively demonstrable facts.
What is averaging, after all, but an extremely simple algorithm?
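To make that concrete, here is a minimal Python sketch of averaging as an aggregation algorithm. The guesses and the "true" value are made-up numbers chosen purely for illustration, and the name `crowd_estimate` is hypothetical; nothing here comes from Sunstein's book or from any real survey.

```python
from statistics import mean


def crowd_estimate(guesses):
    """Return the group's "statistical answer": the arithmetic mean of the guesses."""
    if not guesses:
        raise ValueError("need at least one guess")
    return mean(guesses)


# Hypothetical guesses for a quantity whose true value we assume to be 100.
# These numbers are invented for illustration; they are not real data.
guesses = [80, 95, 110, 125, 90, 106]
true_value = 100

estimate = crowd_estimate(guesses)

# Error of the averaged answer versus the average error of the individuals.
crowd_error = abs(estimate - true_value)
mean_individual_error = mean(abs(g - true_value) for g in guesses)

print(f"crowd estimate: {estimate}")
print(f"crowd error: {crowd_error}")
print(f"mean individual error: {mean_individual_error:.2f}")
```

In this toy case the averaged answer misses by 1 while the typical individual misses by more than 12, because the overestimates and underestimates largely cancel. The average can never do worse than the typical individual by this measure, but how much better it does depends on the "relevant conditions" Sunstein mentions: if everyone errs in the same direction, averaging removes very little of the error.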
These two models for human-machine collaboration raise a number of questions, including which will be more effective, and whether human judgment will be valued more in the market when the human is feeding the algorithm or when the human is overriding it.
Some of the most interesting research on these questions is being done by Philip Tetlock of Wharton, who is running a series of geopolitical forecasting tournaments and measuring how well teams of experts do at prediction, compared to algorithms based on those same experts’ analysis. Here’s how he described this work at Edge.org in 2012:
In our tournament, we’ve skimmed off the very best forecasters in the first year, the top two percent. We call them “super forecasters.” They’re working together in five teams of 12 each and they’re doing very impressive work. We’re experimentally manipulating their access to the algorithms as well. They get to see what the algorithms look like, as well as their own predictions. The question is–do they do better when they know what the algorithms are or do they do worse?
Ultimately, the debate over which model of collaboration is more effective is likely to become as circular as the one over the chicken and the egg: algorithms will feed humans, who will feed algorithms, which will feed humans, and so on. Decision-making is, after all, a multi-layered process.
The controversy will be over who gets the final say, the human or the algorithm. Philosophically, this may seem a moot point for now, so long as humans control computers. Even if we “let” the algorithm have the final say, we’re the ones making that decision. Still, this final layer of decision-making will be contentious in practice. Imagine the case of a medical diagnosis. A doctor consults an algorithm that uses your health records to remind her of the most statistically likely ailments. She then questions you and submits her observations and her best diagnosis to yet another algorithm, which issues your “final” diagnosis. If the doctor has a nagging suspicion that the algorithm’s diagnosis is wrong (one that she can’t explain except with reference to her intuition and experience), whose diagnosis do you act on?
Different answers to this dilemma will be appropriate for different sectors, based on both efficacy and what we’re able to stomach. Determining which approach works when will require plenty of research and a good bit of trial and error.
Nonetheless, combined human and machine intelligence, in one form or another, will likely define much of our work lives. Unless, of course, we put it off long enough that computers start making decisions without our help at all.
“Opinion pieces of this sort published on RISE Networks are those of the original authors and do not in any way represent the thoughts, beliefs and ideas of RISE Networks.”