The Numerati and The Web Genome Project: Kindred Concepts
Tuesday, March 31st, 2009
I’m about halfway through Stephen Baker’s book The Numerati, and I get more excited with every page. It’s as if Baker had written a treatise on The Web Genome Project and what we’re all about, including making the case for a prediction model that doesn’t rely on historical data.
From the introduction:
The exploding world of data, as we’ll see, is a giant laboratory of human behavior. It’s a test bed for the social sciences, for economic behavior and psychology. Researchers at companies such as Microsoft and Yahoo are busy hiring scientists from fields as diverse as medicine and linguistics to help them grapple with the bits of our lives that are pouring in. These streams of digital data don’t recognize ancient boundaries. They’re defined by algorithms, not disciplines. They can easily cross-fertilize. This means that psychologists, economists, biologists, and computer scientists can collaborate as never before, all of them sifting for answers through countless details of our lives. Jack Einhorn, the chief scientist at a New York media start-up called Inform Technologies, predicts that the great discoveries of the twenty-first century will come from finding patterns in vast archives of data. “The next Jonas Salk will be a mathematician,” he says, “not a doctor.”
Baker goes on to explore the many ways in which people are being modeled and mapped, and in which mathematics is being used to predict human behavior. So far, though, the scenarios all fall under what we’d now think of as ‘traditional’ behavioral modeling: look at what you’ve done, and use it to predict what you’ll do. In some cases, the connection may be correlative rather than causal (the example he gives is that romantic-movie watchers are more inclined to click on ads for car rentals), but the net result is the same.
…math-based predictions rely on patterns of past behavior. Let’s say I fly to Taiwan tomorrow and purchase 200 Michelin tires with my credit card. Within minutes, MasterCard will be calling my house in New Jersey, asking if that’s really me on an Asian spree. My buying patterns and those of card thieves are etched into their system.
These models obviously have their place, but they also have limitations. I don’t know anyone who complains when a credit card company monitors their account activity and raises a flag when something looks abnormal. On the other hand, I don’t know anyone who would be happy if their credit card activity were given to other companies in order to better target sales offers.
Once our data is out in the world, its uses and movements become largely disconnected from us and our ability to grant permission. Like mortgage derivatives, the data takes on a life of its own, independent of the individual who generated it, and there’s something about that that just doesn’t sit well with most of us.
So, yes, mathematical modeling is where we are and where we’re going. Mathematical models that predict behavior without tracking individual histories? Even better.
Have you read The Numerati? What did you think of it? And do you see the connection with the Web Genome Project as well?










