What Is “robots.txt”?
It is a plain-text file placed in the root directory of a website that gives a crawler/spider (Bot) instructions about which files and/or directories it should not crawl (and so, in most cases, not index). It can address a specific Bot or all Bots. Some people prefer to disallow Google’s image Bot (to reduce bandwidth consumption).
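To make this concrete, here is a small sketch that feeds a made-up robots.txt to Python's standard-library `urllib.robotparser`, which applies the same matching rules a well-behaved Bot follows. The file shuts Google's image Bot (`Googlebot-Image`) out of the whole site and keeps every other Bot out of a hypothetical `/private/` directory; the domain `example.com` and all paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one record for a specific Bot, one for all Bots.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Google's image Bot is kept off the whole site (saves bandwidth).
print(parser.can_fetch("Googlebot-Image", "https://example.com/photos/cat.jpg"))  # False
# Any other Bot may crawl the site...
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))  # True
# ...except the /private/ directory.
print(parser.can_fetch("Googlebot", "https://example.com/private/notes.html"))  # False
```

The record naming a specific Bot takes priority for that Bot; the `*` record is the fallback for everyone else.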
Recently, on a forum I frequent, there has been some confusion (on the part of one or two individuals) about the purpose of robots.txt.
The Three Myths
#1: "use robots.txt in order to hopefully rank well in the search engines."
#2: "The 404 is the robots hitting your site asking for your robots.txt file since they don’t find one they probably end up leaving."
#3: "Install one and all pages are usually indexed quickly and fully."
Dispelling The Three Myths
#1: robots.txt has nothing to do with getting a good rank. Hard work on good content that people will naturally link to will help you rank well, but that’s just one of many ways. Search Google, Yahoo! or MSN for "SEO"/"Search Engine Optimisation" (also try the American spelling, "Optimization") and "how to get a good rank" for more information.
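#2: robots.txt is a voluntary convention, not a requirement. It doesn’t matter whether it exists or not; if it did, many major sites indexed by the "Big 3" wouldn’t be indexed at all. When a Bot asks for robots.txt and gets a 404, it simply treats the whole site as having no restrictions and carries on crawling rather than leaving. Your site will be indexed whether or not the file exists; if it does exist, it just needs to be correct so that it doesn’t block pages by mistake. Some bots might even ignore the file altogether.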
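The point about the 404 can be sketched with Python's `urllib.robotparser`. A missing file means there are no rules, which this sketch models by parsing an empty file (Python's own `RobotFileParser.read()` likewise treats a 404 as "allow everything"). The helper `policy_from_response` and the sample URL are hypothetical, and only the 200 and 404 cases are modelled. For contrast, a file that exists but is wrong (a stray `Disallow: /`) blocks the whole site, which is why correctness matters far more than existence.

```python
from urllib.robotparser import RobotFileParser


def policy_from_response(status: int, body: str) -> RobotFileParser:
    """Hypothetical helper: turn the HTTP response a Bot received for
    /robots.txt into a crawl policy. Only 200 and 404 are modelled."""
    parser = RobotFileParser()
    if status == 404:
        # No robots.txt at all: no rules, so nothing is disallowed.
        parser.parse([])
    elif status == 200:
        parser.parse(body.splitlines())
    else:
        raise ValueError(f"status {status} is not handled in this sketch")
    return parser


page = "https://example.com/articles/robots-myths.html"

# Missing file: the Bot keeps crawling as normal.
missing = policy_from_response(404, "")
print(missing.can_fetch("Googlebot", page))  # True

# Present but wrong: "Disallow: /" was meant to be "Disallow: /private/".
broken = policy_from_response(200, "User-agent: *\nDisallow: /\n")
print(broken.can_fetch("Googlebot", page))  # False
```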
#3: Having a robots.txt file won’t get pages indexed any faster or more fully; it makes no difference whether the file exists or not. On the other hand, the Google Sitemaps service might help in getting indexed more quickly and more fully. I say "might" because a sitemap only provides additional information about a site. A quote from About Google Sitemaps:
"Google still searches and indexes your sites the same way it has done in the past whether or not you use this program. A Sitemap simply gives Google additional information that we may not otherwise discover."
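To show what that "additional information" looks like, here is a minimal sketch that builds a sitemap in the XML format of the sitemaps.org protocol (which Google supports), using only Python's standard library. The URLs and dates are placeholders; `loc` is the only required child of each `url` entry, and `lastmod` is one of the optional hints a sitemap can carry.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Return sitemap XML for a list of (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # required
        ET.SubElement(url, "lastmod").text = lastmod  # optional, YYYY-MM-DD
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap([
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/articles/robots-myths.html", "2024-01-20"),
])
print(xml_text)
```

As the quote says, this only hints at pages; the crawler still decides what to fetch and index.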
Finally, for those of you who have written a robots.txt file for your site, I would suggest running it through a robots.txt validator every time you make changes, to catch typing mistakes and syntax errors. There are many validators out there; search for "robots.txt validator" and use the one you prefer.
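A quick local check can catch the most common typing mistakes before you reach for a full validator. The sketch below is a hypothetical, deliberately small linter, not a replacement for a real validator: it flags lines without a `field: value` shape, field names it doesn’t recognise, and `Allow`/`Disallow` rules that appear before any `User-agent` line. The set of known fields is an assumption covering the commonly used directives.

```python
# Assumed set of common directives; real validators may accept more.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable problems found in a robots.txt file."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, sep, _value = line.partition(":")
        field = field.strip().lower()
        if not sep:
            problems.append(f"line {number}: missing ':' in {raw!r}")
        elif field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field {field!r}")
        elif field == "user-agent":
            seen_user_agent = True
        elif field in ("allow", "disallow") and not seen_user_agent:
            problems.append(f"line {number}: {field} before any User-agent")
    return problems


sample = """\
User-agnet: *
Disallow /private/
"""
for problem in lint_robots_txt(sample):
    print(problem)
# line 1: unknown field 'user-agnet'
# line 2: missing ':' in 'Disallow /private/'
```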
Published in:
Internet