What is a Robots.txt?

A file that tells search engines which pages they can or cannot crawl.

Understanding Robots.txt

Robots.txt is a text file placed in your website's root directory that instructs search engine crawlers which pages or sections they should or shouldn't access. It uses the Robots Exclusion Protocol to manage crawler behavior and protect resources from being crawled.

Common robots.txt uses include blocking admin areas, preventing crawling of duplicate or thin content, protecting staging environments, managing crawl budget by blocking unimportant pages, and blocking parameter-based duplicates. Important: robots.txt blocks crawling, not indexing. Blocked pages can still appear in search results if other sites link to them. For true removal from search, use a noindex meta tag instead, and note that noindex only works if crawlers can fetch the page, so don't block that page in robots.txt at the same time. Robots.txt errors can accidentally block important content, so test changes carefully, for example with the robots.txt report in Google Search Console.
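A minimal robots.txt covering several of the uses above might look like the following. The paths and sitemap URL are illustrative, and note that wildcard patterns like `*` in paths are supported by major crawlers such as Googlebot but are not guaranteed for every bot:

```
# Applies to all crawlers
User-agent: *
Disallow: /admin/           # block admin area
Disallow: /staging/         # protect staging environment
Disallow: /*?sort=          # block parameter-based duplicates (wildcard support varies)

Sitemap: https://www.example.com/sitemap.xml
```

The file must live at the site root (e.g. `https://www.example.com/robots.txt`) to be honored.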

Frequently Asked Questions

What is a Robots.txt?

A file that tells search engines which pages they can or cannot crawl.

Robots.txt is a text file placed in your website's root directory that instructs search engine crawlers which pages or sections they should or shouldn't access. It uses the Robots Exclusion Protocol to manage crawler behavior and protect resources from being crawled.

Why is Robots.txt important?

Robots.txt is a powerful tool for managing how search engines interact with your site, but also a dangerous one if misconfigured. A single-character mistake, such as `Disallow: /` under `User-agent: *`, can block your entire site from being crawled. Proper robots.txt management improves crawl efficiency by directing bots away from low-value pages and toward important content, which is especially critical for large sites with limited crawl budget.
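One way to sanity-check rules before deploying is Python's standard-library `urllib.robotparser`, which evaluates a robots.txt file the way a well-behaved crawler would. The domain and paths below are hypothetical:

```python
from urllib import robotparser

# Hypothetical robots.txt content for illustration
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Check whether a generic crawler may fetch specific URLs
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))       # True
```

Running a few representative URLs through a check like this catches the classic failure mode (an overly broad `Disallow` blocking pages you meant to keep crawlable) before the file goes live.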
