Although there is much more to learn about robots.txt files beyond reviewing Google's own documentation, they are essentially one of the most important tools we have to allow or limit the crawling of our pages.
In other words, it is a file that can prohibit Googlebot, or any other search engine's spider (identified in robots.txt by its User-agent), from crawling some pages, or even all of them.
The robots.txt file is a simple text file that we must upload to the root of our domain. For example:
yourdomain/robots.txt
In it we establish a series of crawling prohibition (Disallow) or permission (Allow) rules, either for each individual User-agent or for all of them with a *; a sample file appears after the following list of typical scenarios:
We are faced with a page containing sensitive information that we have not necessarily blocked behind a login.
We have areas of our site with duplicate content that we are not interested in redirecting, but that we are not interested in having detected either.
We have just launched a site and we want the bot to focus on certain pages; later we will unblock the ones we are interested in entering its crawl queue.
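To make this concrete, here is a minimal sketch of a robots.txt file covering scenarios like the ones above; the paths and directory names are hypothetical examples, not values from any real site:

# Rules for all crawlers
User-agent: *
# Keep a sensitive area out of the crawl (hypothetical path)
Disallow: /private-area/
# Keep a duplicate-content section from being crawled (hypothetical path)
Disallow: /print-versions/
# Re-allow one subfolder inside the blocked section (hypothetical path)
Allow: /print-versions/featured/

# Rules only for Googlebot, e.g. while the site is newly launched
User-agent: Googlebot
Disallow: /beta/

Each group starts with a User-agent line (or * for all crawlers), followed by the Disallow and Allow rules that apply to it.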
What problems could robots.txt present?
As with everything, the use of robots.txt can also be risky if applied incorrectly:
We must be careful not to use the “Disallow: /” rule, which blocks crawling of our entire site; for that reason we classify it as a dangerous practice.
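For reference, this is what the dangerous rule looks like compared with one that allows full crawling; both are generic sketches:

# Blocks crawling of the entire site for all bots – dangerous
User-agent: *
Disallow: /

# Allows crawling of the entire site for all bots
User-agent: *
Disallow:

An empty Disallow value means nothing is blocked, while the single slash blocks every URL on the domain.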
It is not a good deindexing tool: it only blocks spiders from crawling. This means that even if a URL has been blocked by robots.txt, it may still appear in the SERPs days later.
It does not prevent your URL from being indexed if it has external links, so it may still appear in the SERPs with a description such as: “There is no information available about this page.”
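If the real goal is deindexing rather than blocking crawling, the usual signal is a noindex directive, which the crawler must be allowed to fetch in order to see it; a minimal sketch, either in the page's HTML or as an HTTP response header:

<meta name="robots" content="noindex">

X-Robots-Tag: noindex

Note that noindex only works if the URL is not blocked in robots.txt, since a blocked spider never reads the directive.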