APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

A search engine list protocol

© November 2005 Anthony Lawrence


No matter what you do for search engine optimization, if the search engines never come a-crawling, it can't help you. Ideally, of course, all of your pages would be referenced in at least one index page, and the search engines would find that and work their way through your site from that. That's a clumsy method though, so both Google and Yahoo have ways for you to tell them about your pages more directly.

For Yahoo, it's just a simple list of files. For this site, a partial look at that is:

 .. etc.

You can leave it as a plain text file or compress it with gzip. Submit it to Yahoo at https://submit.search.yahoo.com/free/request.
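As a rough illustration of the plain-list approach, here is a minimal Python sketch that builds a one-URL-per-line list and optionally gzips it. The function names and the example.com URLs are made up for illustration, and the exact layout Yahoo expects is an assumption based on the description above.

```python
import gzip


def build_url_list(urls):
    """Return the URLs as text, one per line, skipping blank entries."""
    cleaned = (url.strip() for url in urls)
    return "".join(url + "\n" for url in cleaned if url)


def gzip_url_list(text):
    """Return the gzip-compressed bytes of the URL list text."""
    return gzip.compress(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical pages; substitute your own site's URLs.
    pages = [
        "https://example.com/",
        "https://example.com/articles/first.html",
    ]
    listing = build_url_list(pages)
    print(listing, end="")
    print(len(gzip_url_list(listing)), "bytes when gzipped")
```

Keeping the list builder separate from the compression step means the same text can be saved either way, which matches the "plain or gzip" choice described above.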

Google has its more complex Sitemap file, which carries much more information about each page; here's a small section of the one for this site:

 <?xml version="1.0" encoding="UTF-8"?>
 <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
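For comparison, a small Python sketch of how a file in this style might be generated. It assumes the 0.84 schema's `url`, `loc`, `lastmod`, `changefreq` and `priority` elements and the `http://` form of the namespace; the function name, the dictionary layout and the example.com URLs are invented for illustration and are not how this site's file is actually produced.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from Google's 0.84 sitemap schema.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(entries):
    """Build sitemap XML text from a list of dicts with a 'loc' key."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        # The remaining fields are optional hints for the crawler.
        for field in ("lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages; substitute your own site's URLs and dates.
    print(build_sitemap([
        {"loc": "https://example.com/", "lastmod": "2005-11-01",
         "changefreq": "daily", "priority": 0.8},
        {"loc": "https://example.com/articles/first.html"},
    ]))
```

Only `loc` is treated as required here; the other fields are written when present, which is where the extra information over a plain URL list comes from.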

Google provides a page where you can resubmit site maps and where they show you any errors such as pages you listed that don't actually exist. The same page also shows any errors from ordinary site crawling. Yahoo has nothing like that.

Google automatically returns and picks up updated sitemaps; you don't seem to need to tell them you've updated, though I suppose it can't hurt. Yahoo, on the other hand, doesn't seem to do this without being told. In my opinion, that's just one of the reasons why Google is a better search engine than Yahoo.

It would be nice if we had an open protocol that all the search engines could use. I'd say Google's sitemap format is a good starting point, though I think it could use some extensions such as webmaster-suggested keywords. Nothing stops other search engines from looking for sitemaps, but there's nothing to tell them where to look, so we also need something in the headers of our pages to help them find one. We do that for RSS files now; all of my pages include something like this:

 <link rel="alternate" type="application/rss+xml" href="https://aplawrence.com/aplawrence.rss"
 title="RSS feed for aplawrence.com"/>

The same thing could be done for sitemaps, which would let all search engines get concise and (hopefully) accurate information about your pages.
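To make the idea concrete, here is a Python sketch of how a crawler might discover a sitemap from a page header. The `rel="sitemap"` value is purely hypothetical (no such convention is defined here), and the class name, function name and example.com URLs are likewise made up for illustration.

```python
from html.parser import HTMLParser


class SitemapLinkFinder(HTMLParser):
    """Collect href values from <link> tags whose rel includes 'sitemap'."""

    def __init__(self):
        super().__init__()
        self.sitemaps = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here by default.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        # "sitemap" is a hypothetical rel value, by analogy with the RSS link.
        if "sitemap" in rel_values and attr.get("href"):
            self.sitemaps.append(attr["href"])


def find_sitemaps(html):
    """Return the sitemap URLs advertised in the given HTML text."""
    finder = SitemapLinkFinder()
    finder.feed(html)
    return finder.sitemaps


if __name__ == "__main__":
    page = """<html><head>
    <link rel="alternate" type="application/rss+xml"
     href="https://example.com/feed.rss" title="RSS feed"/>
    <link rel="sitemap" type="application/xml"
     href="https://example.com/sitemap.xml"/>
    </head><body></body></html>"""
    print(find_sitemaps(page))
```

A crawler that fetched any one page could then find the sitemap without being told about it separately, which is the point of putting the pointer in the page header.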

Microsoft Bing also accepts sitemaps.

See Getting your pages indexed and Google Sitemaps also.
