Robots.txt

A text file used to instruct web crawlers which pages or sections of a website they may or may not crawl.


Definition

Robots.txt is a plain text file placed in the root directory of a website that communicates rules to search engine crawlers and bots about which parts of the site may be crawled. Based on the Robots Exclusion Protocol (formalized in RFC 9309), it serves as a guideline for compliant crawlers, helping site owners control how bots interact with their content.

Through robots.txt, webmasters can create directives for specific user agents (crawlers), block access to certain directories or files, specify crawl delays to manage server load, and declare XML sitemap locations. While most reputable crawlers follow these rules, robots.txt is not enforceable; malicious bots can simply ignore the file. Nor does it guarantee exclusion from search results: a disallowed URL can still be indexed if other sites link to it, so pages that must stay out of results need a noindex directive or access controls instead.
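
For illustration, a minimal file that blocks every compliant crawler from a single directory might look like the sketch below; the path is a placeholder rather than a recommendation:

  User-agent: *
  Disallow: /private/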

For AI-powered search and GEO optimization, proper robots.txt management is especially important: it steers compliant AI crawlers toward relevant, high-value content while excluding duplicate, private, or low-quality sections that could weaken perceived authority. Correctly configured rules help AI systems and search engines interpret and prioritize authoritative content for indexing and citation.
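
As a sketch of this kind of management, a site might admit an AI crawler to its public articles while keeping it out of internal search results. GPTBot (OpenAI) and CCBot (Common Crawl) are real AI crawler user agents, but current token names should be confirmed in each vendor's documentation, and the paths here are placeholders:

  # Let OpenAI's crawler read the blog but not internal search pages
  User-agent: GPTBot
  Allow: /blog/
  Disallow: /search/

  # Keep the Common Crawl bot out entirely
  User-agent: CCBot
  Disallow: /

Note that once a crawler matches its own user-agent group, it ignores the generic * group, so bot-specific groups must repeat any general restrictions that should still apply to that bot.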

Common robots.txt directives include (combined in the sketch that follows this list):

  • User-agent (specifying which crawler the rules apply to)
  • Disallow / Allow (blocking or granting access to specific paths)
  • Sitemap (pointing to the location of XML sitemaps)
  • Crawl-delay (throttling how often bots request pages)
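
Taken together, a file using all four directives might read like the sketch below; the paths, the delay value, and the ExampleBot token are illustrative only:

  # Rules for all crawlers
  User-agent: *
  Disallow: /admin/
  Allow: /admin/help/

  # Throttle a hypothetical crawler; not every bot (Googlebot included) honors Crawl-delay
  User-agent: ExampleBot
  Crawl-delay: 10

  # Sitemap declarations apply to the whole file
  Sitemap: https://www.example.com/sitemap.xml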

Best practices include keeping the file simple, avoiding the accidental blocking of CSS and JavaScript needed for rendering, testing rules before deployment, and reviewing the file regularly as site structures evolve. The file must be accessible at the root of the host it governs (for example, example.com/robots.txt) and be properly formatted; each subdomain needs its own file.
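
For instance, when a broad rule would otherwise catch stylesheets or scripts, narrower Allow rules can keep those assets crawlable; the directory names here are hypothetical:

  User-agent: *
  Disallow: /internal/
  Allow: /internal/css/
  Allow: /internal/js/

Most major crawlers resolve conflicts by applying the most specific (longest) matching rule, so the Allow lines take precedence for those subdirectories.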

Examples of Robots.txt

1 An e-commerce store preventing bots from crawling checkout, user account, and duplicate filtered product pages (a sample file for this scenario follows the list).

2 A corporate website disallowing crawlers from admin and staging directories while allowing access to public-facing service pages.

3 A news publisher blocking print-friendly versions of articles while ensuring canonical pages remain crawlable.

4 A blog restricting crawlers from tag and archive pages to avoid wasting crawl budget on duplicate content.
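
A sketch of the e-commerce case in example 1 might look like the following; the paths and query parameter name are hypothetical, and the wildcard syntax is supported by major crawlers such as Googlebot and Bingbot:

  User-agent: *
  Disallow: /checkout/
  Disallow: /account/
  # Block filtered variants of product listing URLs
  Disallow: /*?filter=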

Frequently Asked Questions about Robots.txt

What should be blocked in robots.txt?

Block low-value or private sections such as admin pages, duplicate filtered URLs, or staging areas. Avoid blocking essential assets like CSS and JavaScript needed for page rendering.
