The Indexable Web is more than 11.5 billion pages (2009). And we need to add one more, because I'm about to post this.
One interesting thing in this report is the estimate that Google, MSN, Yahoo and Teoma only agree on a little more than a quarter of those pages, which means that there are a lot of pages you wouldn't find at all unless you checked all four.
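To make that "agree on about a quarter" idea concrete, here is a toy sketch in Python. The engine names `A` through `D`, the page numbers, and the helper `shared_fraction` are all invented for illustration; this is not how the report arrived at its estimate, just the set arithmetic behind the claim.

```python
def shared_fraction(indexes):
    """Fraction of all indexed pages that appear in every engine's index."""
    sets = list(indexes.values())
    union = set().union(*sets)         # pages found by at least one engine
    common = set.intersection(*sets)   # pages found by every engine
    return len(common) / len(union) if union else 0.0


def missed_by(indexes):
    """For each engine, count the pages it lacks that some other engine has."""
    union = set().union(*indexes.values())
    return {name: len(union - pages) for name, pages in indexes.items()}


# Made-up data: each number stands for one web page.
toy_indexes = {
    "A": {1, 2, 3, 4, 5, 6},
    "B": {1, 2, 4, 5, 7},
    "C": {1, 2, 3, 7, 8},
    "D": {1, 2, 6, 8},
}

print(shared_fraction(toy_indexes))  # 0.25
print(missed_by(toy_indexes))        # {'A': 2, 'B': 3, 'C': 3, 'D': 4}
```

In this toy data even the engine with the biggest index misses two of the eight pages, which is the point: no single engine sees everything.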
As you'd expect, Google is thought to have the most coverage, and poor MSN is a distant third behind Yahoo. Microsoft is probably trying to run its crawling and indexing on its own operating systems, and maybe they just can't keep up. As noted at Nielsen//NetRatings, Microsoft search is losing ground with users. Fewer pages and less accurate results would be good reasons for that, though I still think part of it is that a lot of people just don't trust Microsoft not to skew the results in its own favor.
Eleven billion plus pages is a lot of HTML, isn't it? Much of that is unimportant, transitory, or otherwise quite ignorable, but no matter what you discount, there's a lot of real content left.
I haven't read much of it - have you?
© 2009-11-07 Tony Lawrence