Robots Files: A Primer

Robots Files

Robots files (robots.txt) are plain text documents that reside in the root of a domain. These text files give bots direction on what to crawl and what not to crawl. The file can address a specific bot or all bots, and its “do not crawl” directives can apply at the URL or folder level, up to and including the entire site.
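As a minimal sketch, a robots file that targets all bots and one specific bot might look like the following. The folder, file, and bot names are purely illustrative.

    User-agent: *
    Disallow: /private/
    Disallow: /drafts/old-page.html

    User-agent: ExampleBot
    Disallow: /

Here the first group blocks all bots from one folder and one specific URL, while the second group blocks a single named bot from the entire site.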

Additionally, a robots file can be used to indicate the location of the sitemap file.
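In the robots file, that reference is a single line. The sketch below assumes the sitemap lives at /sitemap.xml on the example domain used later in this article.

    Sitemap: https://www.example.com/sitemap.xml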

A robots file is not a required element, but it is useful as a single resource for disallowing pages and folders from being crawled. A related mechanism is the on-page robots tag, which can also call out specific bots or all bots and give them a noindex directive. The difference is that a bot has to visit the page before it sees the noindex rule, whereas with a robots file, compliant bots will not attempt to visit the disallowed URL at all.
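For comparison, the on-page robots tag is an HTML meta tag placed in a page's head section. A minimal sketch, using Googlebot as the example of a specific bot:

    <!-- applies to all bots -->
    <meta name="robots" content="noindex">

    <!-- applies only to Google's crawler -->
    <meta name="googlebot" content="noindex">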

It should be noted that an improperly set up robots file can lead to the entire site dropping out of the index. An incorrect rule can accidentally disallow entire folders, including the root of the domain itself, thus rendering the entire site non-indexable by bots.
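As a cautionary sketch, these two lines are all it takes to block the entire site for every compliant bot:

    User-agent: *
    Disallow: /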

Furthermore, bots do not have to follow a robots.txt file; it is a voluntary convention. Unscrupulous bots can ignore the file and crawl and index the site whenever they like.

Note: The robots file must be located at the root of the domain (www.example.com/robots.txt) for bots to find and read it.

Find more information about robots.txt files in Google's documentation.