What is robots.txt?
The robots.txt file is a plain-text file at the root of a domain (e.g., example.com/robots.txt) that tells crawlers which URLs they may and may not crawl. It follows the Robots Exclusion Protocol, standardized as RFC 9309.
How robots.txt works
robots.txt controls crawl access, not indexation. Disallowing a URL prevents Google from crawling it, but if other pages link to that URL, Google may still index it based on those external signals. It just will not have read the page content.
Common legitimate uses of robots.txt include blocking /admin/, /staging/, and /api/ paths, duplicate parameter-based URLs, and internal search results. Be careful not to block too much: a robots.txt that disallows CSS or JavaScript files can prevent Google from rendering your pages correctly.
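The rules described above can be checked before deployment. The following is a minimal sketch using Python's standard-library urllib.robotparser; the robots.txt content, the /search path, the asset path, and the helper name is_allowed are illustrative assumptions, not taken from any real site. Note that this parser does plain prefix matching, so it does not evaluate the wildcard patterns that Google supports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /api/
Disallow: /search
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may crawl url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/admin/login", "/search?q=shoes", "/blog/post", "/assets/site.css"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Running a check like this against the CSS and JavaScript paths your pages load is one way to catch an overly broad Disallow rule early.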
Example
Grow With Gradient's robots.txt intentionally allows AI training bots (it has no Disallow rules for GPTBot, ClaudeBot, etc.) as part of an answer engine optimization (AEO) visibility strategy. Many sites choose to block these crawlers instead.
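To illustrate what "no Disallow for GPTBot, ClaudeBot" means in practice, here is a small sketch with Python's urllib.robotparser. The robots.txt below is a hypothetical stand-in, not the site's actual file, and crawlers_allowed is a helper name invented for this example. Because no group names an AI crawler, each one falls back to the wildcard group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: there is no group for GPTBot or ClaudeBot,
# so both are governed by the "User-agent: *" group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

AI_CRAWLERS = ["GPTBot", "ClaudeBot"]


def crawlers_allowed(robots_txt, agents, url):
    """Map each user agent to whether it may crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    print(crawlers_allowed(ROBOTS_TXT, AI_CRAWLERS, "https://example.com/blog/"))
```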
Frequently asked questions
Does blocking a URL in robots.txt remove it from Google?
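No. It blocks crawling, not indexing. A blocked URL with external links can still be indexed with no snippet. To remove a page from results, allow crawling and apply a noindex directive (a robots meta tag or an X-Robots-Tag response header), because Google has to crawl the page to see the directive.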
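A noindex directive can be delivered either as a robots meta tag in the HTML or as an X-Robots-Tag response header. The sketch below, using only Python's standard library, checks a page for either form. The function name has_noindex and the sample inputs are assumptions made for illustration, and the check is deliberately simplified: it ignores crawler-specific header prefixes, for example.

```python
from html.parser import HTMLParser
from typing import Mapping, Optional


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, headers: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the HTML or the response headers carry noindex."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    header_tokens = set()
    for key, value in (headers or {}).items():
        if key.lower() == "x-robots-tag":
            header_tokens.update(t.strip().lower() for t in value.split(","))
    return "noindex" in parser.directives or "noindex" in header_tokens


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```

A check like this only tells you the directive is present. The crawler still has to be allowed to fetch the URL in robots.txt, otherwise it never sees the directive.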
Should I block AI crawlers in robots.txt?
It is a strategic choice. Blocking GPTBot or ClaudeBot keeps those crawlers from collecting your content for training, but it can also reduce your presence in AI-generated answers. Brands competing on AI visibility increasingly allow them deliberately.
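If you do decide to block AI crawlers, the rule set is short. The sketch below generates a robots.txt that disallows a given list of user agents and leaves everything else open, then verifies the result with Python's urllib.robotparser. The helper names build_robots_txt and can_crawl are hypothetical, and robots.txt only restrains crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents):
    """Build a robots.txt that blocks the listed agents site-wide."""
    lines = [f"User-agent: {agent}" for agent in blocked_agents]
    if lines:
        lines.append("Disallow: /")
        lines.append("")  # blank line ends the group
    # An empty Disallow value means "nothing is disallowed".
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


def can_crawl(robots_txt, agent, url):
    """Return True if agent may crawl url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots_txt = build_robots_txt(["GPTBot", "ClaudeBot"])
    print(robots_txt)
    for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
        print(agent, can_crawl(robots_txt, agent, "https://example.com/"))
```

Passing an empty list produces a file that allows every crawler, which corresponds to the deliberate-allow strategy described above.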