Internet Marketing - Matt Bailey [168]
Just Tell Me What I Need to Know!
The best way to explain the robots.txt file is that it is a “welcome mat” for the search engines (Figure 14-15). It’s not so much that the file is necessary for search engine success, but it’s one of those hundreds of small things that you need to consider, much like everything in SEO. If you have it, it will help your search engine success in a very small way. If you don’t have it, it won’t harm you; it’s simply a technical issue. The downside of not providing a robots.txt file is that you are relinquishing some control to the search engines and leaving some issues to chance. I don’t like that.
If you do have it, you may see a slight improvement in search engine rankings. Most sites experience minor improvements, though the evidence is mostly anecdotal. You won’t get penalized for not having this file, but with it you are able to manage the site more closely and may get some reward (even if it is only a slight boost).
The search engines request the robots.txt file before or during every spidering session. Some request it before every session, and some request it prior to crawling groups of pages. Either way, search engines request this file multiple times in a session and multiple times a day. If the file does not exist, each request shows up as a “page not found” error in your web server log files. So, if the search engines keep requesting it, it must be necessary to their purposes. That’s why I believe it is important to have—it is a way of maintaining more control over your website. You can see whether you have a robots.txt file, and what is in it, by typing www.yourdomain.com/robots.txt into your browser (where yourdomain is the domain of your website). I like to explain it as a “welcome mat” because some people have a welcome mat at the entrance of their house and some people don’t. Either way, it doesn’t prevent people from coming into the house. It’s the same for the robots.txt file; it simply tells search engines that they are welcome to visit the site.
If you want to get fancy, though, you can tell the search engine where not to go on your site. Typically, these are files that are not important to the search engines or files that you don’t want showing up in the search results. It’s kind of like that closet where you store all your junk when you quickly clean the house. When people come over, you don’t want them to use that closet. It’s not vital for them to know what is in there, because it’s stuff you want stored out of sight.
For a website, some people “disallow” printer-friendly pages, images, duplicated directories, and admin pages that they do not want to show up in the search results. Now, I am not saying to use this as a way of protecting information that you don’t want people to see. If that is the case, then you need to put that content behind a password. The robots.txt file is not for hiding information from people. It is simply there to tell the search engines not to index the content.
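To make this concrete, here is a sketch of a robots.txt that keeps a few directories out of the index. The paths (/print/, /images/, /admin/) are hypothetical examples, not a recommendation for your site:

```
User-agent: *
Disallow: /print/
Disallow: /images/
Disallow: /admin/
```

Each Disallow line names a path prefix that compliant crawlers will skip; anything not listed remains crawlable by every search engine (the * wildcard means the rules apply to all of them).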
Knowing this is really what’s important from a marketing standpoint; the technical standpoint is a little more difficult, because it gets into server commands, which most people don’t understand. Frankly, I’m surprised how many times I run into problems with the robots.txt as the culprit. This little file has been the cause of a lot of problems for some very large websites.
However, as promised, here is what you need to do if you just want this done, without all of the technical background.
Open a text editor, such as Notepad or an equivalent program, and type in the following:
User-agent: *
Disallow:
Now, save the file with the name robots.txt and upload it to the root directory of your website, so that it is reachable at www.yourdomain.com/robots.txt.
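If you want to confirm how a compliant crawler would read this two-line file, Python’s standard-library robots.txt parser can check it. This is just a quick sanity test; the user agent name and URL below are made-up examples:

```python
from urllib import robotparser

# The permissive robots.txt from above: all agents, nothing disallowed.
lines = ["User-agent: *", "Disallow:"]

rp = robotparser.RobotFileParser()
rp.parse(lines)

# An empty Disallow value means every URL is fetchable by every crawler.
print(rp.can_fetch("Googlebot", "https://www.example.com/any/page.html"))  # True
```

The same approach works for checking a live site: call rp.set_url() with your robots.txt address and rp.read() before testing URLs with can_fetch().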