Google’s John Mueller talked about robots.txt in a recent webmaster hangout. Here are two things that you should know about the robots.txt file on your website.
1. Your subdomains need their own robots.txt files
If you want to block directories and files on a subdomain, you have to add a robots.txt file to that subdomain. The robots.txt file of the main domain will not be used:
“Robots.txt is per hostname and protocol. If we’re looking at a web page that’s hosted on say www.example.com and that includes content from a different subdomain, or from a different domain, then we would use the primary robots.txt file on www.example.com for that page. […]
We check for that subdomain, for that hostname, whether we’re allowed to crawl it, so blocking something on the www version would not block it from being crawled from a different hostname or different subdomain.”
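The per-hostname behavior can be sketched with Python's standard-library `urllib.robotparser`. The hostnames and rules below are hypothetical; the point is that a parsed robots.txt only answers for the host it was fetched from:

```python
from urllib import robotparser

# Hypothetical rules served at https://blog.example.com/robots.txt.
# These rules apply only to blog.example.com; www.example.com would
# need its own robots.txt file with its own rules.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Blocked by the Disallow rule on this hostname:
blocked = rp.can_fetch("*", "https://blog.example.com/private/page.html")
# Not covered by any rule, so crawling is allowed:
allowed = rp.can_fetch("*", "https://blog.example.com/public/page.html")
```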
2. Your robots.txt file should not have a 503 HTTP status code
If your robots.txt file temporarily returns a 503 HTTP status code, Google will stop crawling pages on your website.
“503 is a temporary error that basically tells us we should try again later […] By default, when we see a server error we say we don’t know what the robots.txt file is so therefore we will not crawl anything from this hostname.”
If the robots.txt file returns a 503 HTTP status code for a long time, Google treats it as a permanent error and will resume crawling and indexing your pages.
“Sometimes we see that these server errors are more like a permanent thing […] If we see the 500 or 503 error we stop crawling completely and then after a certain period of time, I don’t know, maybe a couple of months or so, […] we think ‘well, this is a permanent error so […] we try to see what we crawl. […]
If there’s some technical reason that you really need to stop crawling of your website then you can return 503 for your robots.txt file and [Google] will stop crawling as soon as we reprocess that, which is usually within a day or so.”
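The behavior described in these quotes can be summarized as a small decision function. This is a sketch of the logic as explained in the hangout, not Google's actual implementation, and the return strings are made up for illustration:

```python
def crawl_decision(robots_status: int) -> str:
    """Sketch of how a crawler might react to the HTTP status
    of a robots.txt fetch, per the behavior described above."""
    if 200 <= robots_status < 300:
        # robots.txt was fetched: crawl according to its rules.
        return "crawl per robots.txt rules"
    if robots_status == 404:
        # No robots.txt at all is treated as "everything allowed".
        return "crawl everything (no robots.txt)"
    if robots_status >= 500:
        # Server error: the rules are unknown, so crawling of this
        # hostname is paused until the file can be fetched again.
        return "pause crawling (server error)"
    return "pause crawling (unclear status)"
```

This also shows why returning 503 for robots.txt is a quick way to halt crawling site-wide: a single server-error response on that one file pauses the whole hostname.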
You can view the full webmaster hangout here:
How to check your robots.txt file
Use the website audit tool in SEOprofiler to check the robots.txt file of your website. The website audit tool also checks all pages of your website and shows you what you can do to ensure that all pages get listed correctly: