Summary: The Signal and the Noise

By exploring prediction methods in fields ranging from earthquakes to chess, The Signal and the Noise by Nate Silver offers insights into the art of prediction. As the title suggests, a central theme is separating the signal (the underlying truth) from the noise (the plethora of available data that forms meaningless patterns that can be mistaken for signals). The book makes a case for using Bayesian logic, that is, thinking probabilistically, to make better predictions. The premise is that information gathered from past events can be used to predict future events.

Rather than read the physical copy, I listened to it at 2.5x the recorded speed. It was an interesting method to try.

Bayes’ theorem is an equation that takes multiple factors, expressed as probabilities, into account. It uses both general probabilities (how often something is true overall) and conditional probabilities (how often something is true given that some condition holds). For example, I eat hummus for lunch 50% of the time. (Probability.) If it’s raining, I only eat hummus 10% of the time. (Conditional probability.) The result of the equation is the likelihood of the event occurring. If you used the equation to evaluate whether I am likely to have hummus for lunch next Tuesday (using more information than I provided here), you might determine that there’s a 20% chance that I’ll have hummus, which does not rule out a hummus lunch but indicates that it is less likely than my having something else. For more information on Bayes’ theorem, check here.
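To make the hummus example concrete, here is a minimal Python sketch. The 50% and 10% figures come from the example above; everything else is an assumption made up for illustration: a 20% overall chance of rain, and an 80% chance of rain in next Tuesday's forecast (chosen so that the result lands on the 20% figure mentioned above). The helper name `bayes_posterior` is hypothetical too. The first calculation applies Bayes’ theorem itself; the Tuesday forecast then combines the conditional probabilities using the law of total probability.

```python
def bayes_posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < evidence <= 1:
        raise ValueError("evidence must be a probability greater than 0")
    return likelihood * prior / evidence


# Figures taken from the example in the text.
P_HUMMUS = 0.50             # P(hummus): overall chance of a hummus lunch
P_HUMMUS_GIVEN_RAIN = 0.10  # P(hummus | rain)

# Assumptions, invented purely for illustration.
P_RAIN = 0.20               # P(rain): overall chance of a rainy day
P_RAIN_TUESDAY = 0.80       # forecast chance of rain next Tuesday

# Bayes' theorem: if I am eating hummus, how likely is it that it is raining?
p_rain_given_hummus = bayes_posterior(
    prior=P_RAIN, likelihood=P_HUMMUS_GIVEN_RAIN, evidence=P_HUMMUS
)

# Law of total probability, rearranged to recover P(hummus | no rain):
# P(hummus) = P(hummus|rain) * P(rain) + P(hummus|dry) * P(dry)
p_hummus_given_dry = (P_HUMMUS - P_HUMMUS_GIVEN_RAIN * P_RAIN) / (1 - P_RAIN)

# Forecast for Tuesday: weight each scenario by how likely it is.
p_hummus_tuesday = (
    P_HUMMUS_GIVEN_RAIN * P_RAIN_TUESDAY
    + p_hummus_given_dry * (1 - P_RAIN_TUESDAY)
)

print(f"P(rain | hummus)    = {p_rain_given_hummus:.0%}")  # 4%
print(f"P(hummus | no rain) = {p_hummus_given_dry:.0%}")   # 60%
print(f"P(hummus on Tuesday) = {p_hummus_tuesday:.0%}")    # 20%
```

Under these made-up numbers, a rainy forecast pulls the chance of a hummus lunch down from the usual 50% to about 20%. Different assumptions about the weather would give a different answer, which is the point: the prediction updates as the evidence changes.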

Silver also addresses the relative value of humans and computers in prediction. Despite a computer’s ability to sort and process massive amounts of data, humans sometimes have an edge, at least for now. In baseball, for example, the book examines predictions of how players will perform down the line. Computer programs produced a list of players they predicted would do well, and human scouts made their own predictions. When the predictions were examined years later, the human scouts were more accurate. They were able to take in less quantifiable information that the computer did not consider, such as personality.

Qualitative data plays a role in certain types of predictions, although it is difficult, perhaps impossible, to remove personal bias from it, which may lead to error. Personal bias may lead us to overemphasize certain information while disregarding other data. The predictions that most obviously benefit from qualitative data involve human behavior. For example, predictions about poker, chess, basketball, and politics fit this category.

I was attracted to Silver’s assertion that there are uncertainties in prediction, and there always will be. There is no way to have access to all the data relevant to a prediction. Nor is it possible to analyze all of the available data correctly and without bias. In part, this is because it is difficult to discern which data is relevant. Otherwise stated: it is difficult to tell what is noise and what is signal. Silver asserts that it is important to represent uncertainties accurately, even when doing so makes the prediction less useful. For example, rather than stating that population growth will be P% in 30 years, it would be better to state that “pending X and Y conditions, if Z holds steady, population growth is projected to be between P% and Q% in 30 years.”

Silver also cites a Donald Rumsfeld quote: “…there are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don’t know. But there are also unknown unknowns – there are things we do not know we don’t know.” “Unknown unknowns” present the greatest chance of introducing danger or inaccuracy into predictions. In representing uncertainty accurately, it is important to take this into consideration. To combat unknown unknowns, Silver suggests absorbing as much data as possible by reading avidly. The more prediction-makers know, and the more they know that they don’t know, the more accurate they will be. The book asserts that most predictions go wrong due to human error, and that the more data prediction-makers collect, the more human error is reduced.

When considering more data, though, there is potential to get caught up in “noise.” Rather than accept “more data” as a net positive at face value, I believe data should meet certain criteria before it is given equal consideration. Silver does not address this extensively in his book. The basic premise of the book is that there is so much data out there that it is easy to get stuck in the weeds, so if we are supposed to absorb everything we can without extensively filtering it, we are likely to become overwhelmed and confused. While I believe Silver understands this, he does not address it directly, and the basic idea of “get as much data as possible” is expressed throughout the book.

The concepts above were the ones that most interested me, although other gems are strewn throughout the book. Silver touches on betting strategy (always go for it if the likelihood is greater than projected), poker strategy (learn the behavior of your opponents), and the relationship between extreme and non-extreme instances of events. After listening to the book, I feel much better informed about prediction strategy and plan to incorporate representation of uncertainty into my predictions from here on out.
