What is Robots.txt?
Definition
Robots.txt is a plain-text file in a website's root directory that tells search engine crawlers and other bots which URLs they may and may not access.
Why robots.txt matters
Robots.txt matters because it gives website owners control over how compliant crawlers behave. You can keep bots out of private areas, admin pages, duplicate content, or resource-intensive sections that don't need to be crawled.
Proper robots.txt configuration helps manage crawl budget. By blocking unimportant pages, you ensure search engines spend their limited crawl resources on your valuable content rather than utility pages.
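Robots.txt also helps keep crawlers out of development environments, staging sites, or internal tools that shouldn't appear in search results. Because a disallowed URL can still be indexed if other sites link to it, treat this as one safeguard among several and pair it with authentication or noindex.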
Key concepts and types
- User-agent directive
Specifies which crawlers the rules apply to, from all bots to specific ones like Googlebot.
- Disallow directive
Blocks crawlers from accessing specific URLs or directories.
- Allow directive
Explicitly permits access to URLs within a disallowed directory.
- Sitemap declaration
Includes the sitemap URL to help crawlers discover content.
- Crawl-delay directive
Requests that crawlers wait between requests (not honored by all bots).
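To see how these directives fit together, here is a minimal sketch that parses a small robots.txt with Python's standard-library `urllib.robotparser`. The file contents, the `example.com` domain, the paths, and the `allowed` helper are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com; every path here is illustrative.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="*"):
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(allowed("/admin/settings"))     # False: matches Disallow: /admin/
print(allowed("/admin/public/help"))  # True: the Allow rule matches first
print(allowed("/blog/post"))          # True: no rule matches
print(parser.crawl_delay("*"))        # 5
print(parser.site_maps())             # ['https://example.com/sitemap.xml']
```

The Allow line is placed before the Disallow line because Python's parser applies the first rule that matches. Some crawlers, including Googlebot, pick the most specific matching rule instead, so ordering matters less for them. `site_maps()` requires Python 3.8 or later.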
Common misconceptions
- ✕ Robots.txt prevents pages from appearing in search results
- ✕ Robots.txt provides security for private content
- ✕ All bots respect robots.txt directives
- ✕ Blocking a page removes it from the index
- ✕ Robots.txt is required for every website
FAQs
Does robots.txt hide pages from search engines?
No. Robots.txt blocks crawling, not indexing. If other sites link to a disallowed page, it can still appear in search results (with limited information). To prevent indexing, use a noindex meta tag and leave the page crawlable, because a crawler that is blocked from the page never sees the tag.
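The interaction between crawling and noindex can be checked mechanically. The sketch below, using only the Python standard library, reports whether a page's noindex tag is actually reachable by a crawler. The names `RobotsMetaFinder` and `noindex_is_visible`, along with the sample rules, URL, and HTML, are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            for token in (attr.get("content") or "").split(","):
                self.directives.add(token.strip().lower())


def noindex_is_visible(robots_txt, url, html, agent="*"):
    """True only if the page carries noindex AND the agent may crawl it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives and rules.can_fetch(agent, url)


PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
URL = "https://example.com/private/page"

# Disallowed in robots.txt: the crawler never fetches the page or sees the tag.
print(noindex_is_visible("User-agent: *\nDisallow: /private/", URL, PAGE))  # False
# Crawlable: the noindex tag can take effect.
print(noindex_is_visible("User-agent: *\nDisallow: /tmp/", URL, PAGE))      # True
```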
Is robots.txt secure for private content?
No. Robots.txt is publicly readable and only requests compliance; it doesn't enforce access control. Malicious bots can simply ignore it. Use authentication for truly private content.
Where should robots.txt be placed?
At the root of your domain: example.com/robots.txt. Crawlers only look for the file at the root, so a robots.txt placed in a subdirectory is ignored. Each subdomain needs its own file. The file must be accessible and properly formatted.
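Because the file lives at the root of each host, its location can be derived from any page URL on that host. A minimal sketch using Python's `urllib.parse` follows; the helper name `robots_txt_url` and the sample URLs are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?id=1"))
# https://example.com/robots.txt
print(robots_txt_url("https://shop.example.com/cart/"))
# https://shop.example.com/robots.txt
```

The second call shows that a subdomain resolves to its own file instead of the one on the main domain.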