The essential weakness at the core of Google search

note-HXA7241-20100122T2023

Harrison Ainsworth

Suppose you write an article for your web site, and it is actually quite good. It won't be read, because it won't rank highly and so won't be noticeable in searches. But it won't rank highly, because no-one has read it. There is a circular dependency. The only way a new item can find its level is by a means outside the system.
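The circular dependency can be made concrete with a toy model. This is only a sketch under invented assumptions: the page names, the quality values, and the rule that each read adds endorsement in proportion to quality are all hypothetical, and nothing here describes how a real search engine computes rankings.

```python
# Hypothetical toy model: page names, quality values, and the scoring rule
# are illustrative assumptions, not how any real search engine works.

def simulate_ranking(pages, visible, rounds):
    """Run a read-then-rank feedback loop and return the final scores.

    pages:   dict of name -> (quality, initial_score)
    visible: how many top-ranked pages readers actually look at
    rounds:  number of read/re-rank cycles
    """
    quality = {name: q for name, (q, _) in pages.items()}
    score = {name: s for name, (_, s) in pages.items()}
    for _ in range(rounds):
        # Rank purely by accumulated score (the "public opinion" so far).
        ranking = sorted(score, key=score.get, reverse=True)
        # Only the visible pages are read; each read adds endorsement
        # in proportion to the page's quality.
        for name in ranking[:visible]:
            score[name] += quality[name]
    return score


pages = {
    "old_a": (0.3, 50.0),
    "old_b": (0.2, 30.0),
    "new_good": (0.9, 0.0),  # best quality, but no readers yet
}
final = simulate_ranking(pages, visible=2, rounds=100)
print(final)
```

In this model the new page never enters the visible set, so its score never moves, however good it is. Only a change from outside the loop (here, raising `visible` to 3 so that everything gets read) lets it overtake the established pages.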

In a fundamental sense Google is not search: it is a summary of public opinion. PageRank is the public's rating of web pages. When you search, you effectively ask the public's opinion. But you are the public, so you are trying to answer your own question. The public goes to find information, yet that information must come from itself. That public opinion is composed of many different contributions does not resolve this, because its general value depends on what that multitude has in common.

Google search does not provide the information it appears to; it only shares what already exists.

If we widen the consideration and say there is no other system overall than public opinion (and it is the best anyway), the system can never really work properly. It tells us only what we already know (or what we think we know).

The same essential problem is shared by user-moderated comment systems such as news.ycombinator.com and reddit.com. Insofar as comment scores are used to filter one's reading, there is an inherent contradiction in the system:

  • the reader wants some filtering, so they can just read the good stuff
  • the filtering is done by the readers, which requires they read more than just the good stuff

So the filtering can never be entirely effective: if you showed everyone only the good stuff, the filtering would not get done at all. The work has to be done by someone. And this work is input; the system provides no significant information itself.

This whole information strategy is an automated system of ‘social-proof’. SEOs and marketers therefore have a view that is actually as correct as the content-focused one: the system really is substantially defined and driven simply by who is at the top, as much as by who makes good content.

One treatment – not cure – could be to add randomness. Then all content gets some unbiased exposure and evaluation, and because the selection is random it is impervious to manipulation. The randomness must be sufficient to weaken the social-proof effect: people must feel they cannot completely trust the rankings.
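One way the randomness treatment might look in practice is sketched below. It is an assumption-laden illustration, not a description of any existing system: the function name `rank_with_randomness`, the `epsilon` parameter, and the example scores are all hypothetical. Each display slot is filled by a uniformly random remaining item with probability `epsilon`, and otherwise by the highest-scored remaining item.

```python
import random

# Hypothetical sketch: the function name, epsilon parameter, and example
# scores are illustrative assumptions, not any real ranking system.

def rank_with_randomness(scores, epsilon, rng):
    """Return item names in display order.

    scores:  dict of name -> score (higher is better)
    epsilon: probability that a slot is filled by a random remaining item
    rng:     a random.Random instance, passed in for reproducibility
    """
    # Start from the ordinary score ordering, best first.
    remaining = sorted(scores, key=scores.get, reverse=True)
    ordered = []
    while remaining:
        if rng.random() < epsilon:
            pick = rng.choice(remaining)  # random exposure, ignores score
        else:
            pick = remaining[0]           # ordinary score-based choice
        remaining.remove(pick)
        ordered.append(pick)
    return ordered


rng = random.Random(0)
scores = {"established": 100, "popular": 40, "new_article": 0}
trials = 10000
top_count = sum(
    rank_with_randomness(scores, 0.2, rng)[0] == "new_article"
    for _ in range(trials)
)
print(top_count / trials)  # fraction of trials with the new article on top
```

With `epsilon` at 0 this reduces to the plain score ordering, and at 1 it is a uniform shuffle. The value in between sets how far readers can trust the rankings, which is the trade-off the treatment depends on.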

Background: