Archive for March, 2009

The Numerati and The Web Genome Project: Kindred Concepts

Tuesday, March 31st, 2009

I’m about halfway through Stephen Baker’s book The Numerati, and I get more excited with every page. It’s as if Baker had written a treatise on The Web Genome Project and what we’re all about — including making the case for a prediction model that doesn’t rely on historical data.

From the introduction:

The exploding world of data, as we’ll see, is a giant laboratory of human behavior. It’s a test bed for the social sciences, for economic behavior and psychology. Researchers at companies such as Microsoft and Yahoo are busy hiring scientists from fields as diverse as medicine and linguistics to help them grapple with the bits of our lives that are pouring in. These streams of digital data don’t recognize ancient boundaries. They’re defined by algorithms, not disciplines. They can easily cross-fertilize. This means that psychologists, economists, biologists, and computer scientists can collaborate as never before, all of them sifting for answers through countless details of our lives. Jack Einhorn, the chief scientist at a New York media start-up called Inform Technologies, predicts that the great discoveries of the twenty-first century will come from finding patterns in vast archives of data. “The next Jonas Salk will be a mathematician,” he says, “not a doctor.”

Baker goes on to explore the many ways in which people are being modeled and mapped, and in which mathematics is being used to predict human behavior. So far, though, the scenarios all fall under what we’d now think of as ‘traditional’ behavioral modeling: look at what you’ve done, and use it to predict what you’ll do. In some cases, the connection may be correlative rather than causative (the example he gives is that romantic-movie watchers are more inclined to click on ads for car rentals), but the net result is the same.

…math-based predictions rely on patterns of past behavior. Let’s say I fly to Taiwan tomorrow and purchase 200 Michelin tires with my credit card. Within minutes, MasterCard will be calling my house in New Jersey, asking if that’s really me on an Asian spree. My buying patterns and those of card thieves are etched into their system.

These models obviously have their place, but they also have limitations. I don’t know anyone who complains when a credit card company monitors their account activity and raises a flag when something looks abnormal. On the other hand, I don’t know anyone who would be happy if their credit card activity were given to other companies in order to better target sales offers.

Once our data is out in the world, its uses and movements become largely disconnected from us and our ability to grant permission. Like mortgage derivatives, the data takes on a life of its own, independent of the individual who generated it, and there’s something about that that just doesn’t sit well with most of us.

So, yes, mathematical modeling is where we are and where we’re going. Mathematical models that predict behavior without tracking individual histories? Even better.

Have you read The Numerati? What did you think of it? And do you see the connection with the Web Genome Project as well?

A talk by Hal Varian, Google’s Chief Economist

Friday, March 20th, 2009

Professor Hal Varian, Chief Economist at Google

I had the privilege this week of attending a lecture by Professor Hal Varian, Chief Economist for Google. Varian discussed the advent of computer-mediated transactions and how they transform our business practices.

He raised three kinds of interesting points: historical (in a pre-literate and pre-numerate era, how could people shipping barrels of olive oil have any confidence that the amount of oil that left was the same amount that arrived?), logistical (computer-mediated transactions enable increasingly complex contractual arrangements), and conceptual (behavioral targeting, etc.).

This last, conceptual, is a big thing for Google these days, since they’ve been in the behavioral targeting business for all of two weeks. It’s also where Varian started to get into Web Genome Project territory. I found one thing he said particularly interesting:

In general, people have no problems with the intended use of data (more relevant content, etc.). What people are worried about is the unintended use of data (AOL’s massive data spill, etc.). The problem, therefore, is not so much a privacy problem, but rather a security problem.

That’s a pretty interesting comment, and it certainly rings true to me. “I don’t want Google knowing all this stuff about me,” people say. “Who knows what they’re going to do with it? What if somebody unscrupulous gets their hands on it?”

The core proposition of the Web Genome Project is personalisation with privacy. In light of Varian’s comments, however, it’s worth revisiting that proposition, because in fact it’s much stronger than that. The WGP model means that no clickstream or historical data is ever collected in the first place. If a thief were to break in, the vault would be empty; there’s just nothing there. So the model actually eliminates the entire question of privacy. It doesn’t much matter whether I can keep your data private if I don’t have any data on you to begin with.

Gratifying stuff from someone who’s earned his stripes. What are your thoughts about privacy vs. security?

Official Launch of the Web Genome Project

Tuesday, March 10th, 2009

Today is our official launch of the Web Genome Project, complete with press release distribution. I’ve included the release below; if you know anyone who might be interested in what we’re doing, by all means feel free to pass it along. Thanks for visiting!

Web Genome Project Launches Movement to Map the Internet

VortexDNA, Christchurch, NZ, March 11, 2009. The Web Genome Project (WGP), designed to revolutionize the way we understand and interact with the Internet, launched today with an interactive search engine at www.webgenomeproject.org.

The WGP allows each individual a totally private way to find personally relevant content on the Web.

Each page on the Web has a distinct personality and flavor — as does each person who surfs the Web. The WGP dynamically and continuously calculates a numerical profile — a ‘genome’ — for web pages, based on the aggregate genomes of their visitors.

Visitors to webgenomeproject.org can use the tool to compare search results to a ‘filter genome’. They can adjust the filter to see how different genomes affect the order of search results, and they can also create their own genomes.

The Web Genome Project has been well received in the search industry. Charles Knight, Editor of the popular blog AltSearchEngines, said, “I downloaded the extension and gave it a spin… the WGP was spot on – and then some!” Mark Cramer, the CEO of SurfCanyon, shared Knight’s sentiments, saying, “I like it… I can see this becoming viral.”

As genomes get generated for more and more pages, they create a virtual topography of the Web. Individuals can use this topography to find sites that share their genomes.

Anyone can contribute to this virtual topography by installing the WGP extension, completing a short survey to create an initial genome, and then using the Web the way they normally do.

Individual genomes are based on a predictive algorithm from VortexDNA. They’re not personally identifying in any way, are not unique to the user, and don’t contain any demographic or historical information.

“There are more than 108 million websites on the World Wide Web,” says Branton Kenton-Dau, VortexDNA’s CEO. “The WGP is an attempt to make sense of it all, so everyone can enjoy the Internet more without being followed around online or having their clickstreams tracked.”

The WGP’s stated goal is to generate genomes for ten million web pages. So far more than half a million pages have associated genomes.

- END -

ABOUT THE WEB GENOME PROJECT
The Web Genome Project is a global movement to map the Web and make sense of its billions of pages. Its aim is to give us the ability to tune into the content we’re most interested in at any given time.

ABOUT VortexDNA
VortexDNA offers a unique system for profiling users without retaining personal information, and the ability to map and codify that profile. Its predictive modeling algorithm has applications for online services, insurance, and health care.

An all-organic nation: Don’t be afraid to dream big

Monday, March 9th, 2009

I’ve got to share with you a presentation from the CEO of VortexDNA, Branton Kenton-Dau. He made the presentation at a neat NZ conference called The Big Think: 7 people, 7 ideas, 7 minutes each. Here’s Branton’s:

Every good presentation has at least one ‘moment’: the audience draws in its collective breath, the crowd is in alignment, a million hairs on a hundred necks stand up. It seems pretty clear that the ‘moment’ in Branton’s presentation comes when he proposes that New Zealand, as a nation, become 100% organic.

Why such a reaction? Two reasons:

  1. The totality of the idea is clearly conveyed in only two words: “100% organic”.
  2. The idea taps into everything New Zealanders already hold dear: nuclear-free and GE-free. Cheeky upstarts and fearless innovators. Clean and green. Who better to be the first organic country?

So hats off to you, Branton. And a challenge to all of you out there: what grand vision are you contributing to? What grand vision are you inspiring? How are you helping create the world of your dreams?

Welcome to the Web Genome Project

Monday, March 2nd, 2009

Welcome to the Web Genome Project: a global movement to map the Web. We’re delighted you’re here.

There are billions of pages online, and they are all fighting for our attention. The WGP takes all that noise and makes sense of it, giving us the ability to tune into the part we’re most interested in at any given time.

The WGP is a community project: it is by us and for us. Your participation (yes, you!) is what will make it great, and all of our participation is what will make it truly meaningful.

We invite you to join the movement.

Before you do anything else, give it a try: go to the WGP home page, run a search, and play with the genome sliders to see how the results change.

Isn’t that cool?

There are a few things you probably want to know about the WGP:

How does it work?

The WGP works by generating a numeric profile (we call these profiles ‘genomes’) for each page and each person on the Web. These genomes are not unique, and they don’t tell us anything specific about what they’re attached to, but they do allow us to do a bit of online ‘matchmaking’: matching people with pages.
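
To make the ‘matchmaking’ idea a little more concrete, here is a minimal sketch in Python. It is not the VortexDNA algorithm, which this post doesn’t describe. It simply assumes that a genome is a short list of numbers and that a page matches a person better when the two lists point in a similar direction. The function names, the example URLs and the choice of cosine similarity are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Similarity of two genomes (assumed here to be equal-length number lists)."""
    if len(a) != len(b):
        raise ValueError("genomes must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # An all-zero genome has no direction, so treat it as matching nothing.
    return dot / norm if norm else 0.0


def rerank(results: Dict[str, List[float]], filter_genome: List[float]) -> List[str]:
    """Order result URLs from best to worst match with the filter genome."""
    return sorted(
        results,
        key=lambda url: cosine_similarity(results[url], filter_genome),
        reverse=True,
    )


# Hypothetical search results, each with a made-up two-number page genome.
results = {
    "example.org/a": [0.9, 0.1],
    "example.org/b": [0.1, 0.9],
    "example.org/c": [0.5, 0.5],
}

first_order = rerank(results, [1.0, 0.0])
second_order = rerank(results, [0.0, 1.0])
print(first_order)
print(second_order)
```

Changing the filter genome changes the order of the same three results, which is roughly what moving the sliders on the home page does from a visitor’s point of view.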

How does a page get a genome?

The genome of a page is the aggregate genome of everyone who’s visited it. We don’t have any information about the individuals who have been there, but the behavior of the system gives us some pretty amazing insights.
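
One way to picture an aggregate that keeps no record of individuals is a running average. The sketch below is an assumption for illustration, not the actual WGP calculation: the `PageGenome` class and its `absorb` method are hypothetical names, and the averaging rule is a stand-in. The point it shows is that a page can fold each visitor’s genome into a single set of numbers and then forget the visitor entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageGenome:
    """Hypothetical page record: an aggregate vector and a visit count, nothing else."""
    values: List[float]
    visits: int = 0

    def absorb(self, visitor_genome: List[float]) -> None:
        """Fold one visitor's genome into the running mean, then discard it."""
        if len(visitor_genome) != len(self.values):
            raise ValueError("genomes must have the same length")
        self.visits += 1
        # Incremental mean: no list of past visitors is needed.
        self.values = [
            old + (new - old) / self.visits
            for old, new in zip(self.values, visitor_genome)
        ]


page = PageGenome(values=[0.0, 0.0, 0.0])
page.absorb([0.2, 0.8, 0.4])
page.absorb([0.6, 0.4, 0.0])
print(page.visits, page.values)
```

After two visits the page holds only the average of the two genomes and the number 2. Neither visitor’s own genome can be read back out of it once more visits have been mixed in.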

How do I get a genome?

The way you get a genome is similar to the way a page gets a genome. First, you download the MyWebDNA browser extension, and answer a short survey to generate your initial genome. Then, as you browse the Web, the genomes of all the pages you visit get aggregated into your genome. Again, we have no information about which pages you’ve been to, but the aggregate number allows us to help you find new sites you might like.
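
The same idea can be sketched from the visitor’s side. Again, this is a guess at the general shape rather than the real MyWebDNA logic: the `blend` function and its `weight` parameter are hypothetical, chosen only to show how a personal genome could drift toward the pages someone visits without any list of those pages being kept.

```python
from typing import List


def blend(user_genome: List[float], page_genome: List[float],
          weight: float = 0.1) -> List[float]:
    """Nudge a user genome toward a page genome; `weight` is an assumed tuning knob."""
    if len(user_genome) != len(page_genome):
        raise ValueError("genomes must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [(1 - weight) * u + weight * p
            for u, p in zip(user_genome, page_genome)]


# Start from a made-up survey genome, then visit two pages with the same genome.
genome = [0.5, 0.5]
for page_genome in ([1.0, 0.0], [1.0, 0.0]):
    genome = blend(genome, page_genome, weight=0.5)
    # Only the updated numbers are kept; the page itself is not recorded.

print(genome)
```

Only the current genome survives each step, so the browsing history that produced it is not stored anywhere in this sketch.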

The Web Genome Project team will be providing more updates soon, but, for now, look around. Have a play with the search function to see how different genomes generate different search results. Then come back here and let us know what you think.

We look forward to hearing from you, and we look forward to mapping the Web with you.