Create custom robots.txt files for your website. Control how search engines crawl your site with easy-to-use presets and custom rules.
Pattern quick reference:
- * matches all user-agents
- / matches the entire site
- /folder/ matches a specific folder
- *.pdf matches all PDF files
- $ matches the end of a URL

Upload the finished robots.txt to your website's root directory so it is reachable at yourdomain.com/robots.txt.

The robots.txt file is a simple text file placed in your website's root directory that communicates with web crawlers and bots. Following the Robots Exclusion Protocol (REP), first introduced in 1994, it tells search engines which pages they can and cannot access. While it's a powerful tool for managing crawler behavior, it's important to understand both its capabilities and limitations.
When a search engine bot visits your site, the first thing it does is look for yourdomain.com/robots.txt. Based on the instructions it finds, the bot decides which pages to crawl or skip. This happens before any actual crawling takes place, making robots.txt the gatekeeper of your website's crawlability.
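You can verify this behavior against your own rules before deploying them. The snippet below is a minimal sketch using Python's standard-library urllib.robotparser; the domain and paths are placeholders, not values from this guide.

```python
# Minimal sketch: ask the same question a crawler asks before fetching a page.
# Uses only Python's standard library; the domain and paths are placeholders.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://yourdomain.com/robots.txt")
parser.read()  # download and parse the live robots.txt

print(parser.can_fetch("Googlebot", "https://yourdomain.com/blog/post"))  # allowed?
print(parser.can_fetch("Googlebot", "https://yourdomain.com/admin/"))     # blocked?
```

If can_fetch returns False for a URL you expect to rank, your rules are blocking it and should be adjusted before you publish the file.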
A well-configured robots.txt lets you:
- Specify which bots can crawl which parts of your site
- Direct crawlers to important pages and skip low-value content
- Keep admin areas, staging, and private sections away from crawlers
- Point crawlers to your XML sitemap for better indexing
The robots.txt file uses a simple syntax with specific directives. Understanding each directive is essential for proper configuration:
**User-agent**: Specifies which crawler the following rules apply to.
# Apply to all crawlers
User-agent: *
# Apply only to Googlebot
User-agent: Googlebot
# Apply only to Bing
User-agent: Bingbot
Common user agents: Googlebot, Bingbot, Slurp (Yahoo), DuckDuckBot, Baiduspider, Yandex, Facebot, Twitterbot
**Disallow**: Tells crawlers NOT to access specific paths.
# Block a specific page
Disallow: /private-page.html
# Block an entire directory
Disallow: /admin/
# Block all pages with query strings
Disallow: /*?
# Block everything (entire site)
Disallow: /
Note: An empty Disallow: means nothing is blocked.
**Allow**: Permits access to specific paths, overriding Disallow rules. Useful for exceptions.
# Block /private/ but allow one page
User-agent: *
Disallow: /private/
Allow: /private/public-page.html
# Block all PDFs except one
Disallow: /*.pdf$
Allow: /docs/whitepaper.pdf
Note: Allow is supported by Google and Bing but not by all crawlers.
**Sitemap**: Points crawlers to your XML sitemap(s) for better discovery of pages.
# Single sitemap
Sitemap: https://example.com/sitemap.xml
# Multiple sitemaps
Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-products.xml
Best practice: Always use absolute URLs with https://
| Directive | Purpose | Example |
|---|---|---|
| Crawl-delay | Sets seconds between requests (Bing, Yandex; ignored by Google) | Crawl-delay: 10 |
| * (wildcard) | Matches any sequence of characters in paths | Disallow: /category/*/page |
| $ (end match) | Pattern must match the end of the URL | Disallow: /*.pdf$ |
| # (comment) | Adds notes or explanations (ignored by bots) | # This blocks admin section |
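
# Example: a typical WordPress robots.txt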
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /trackback/
Disallow: /feed/
Disallow: /?s=
Disallow: /*?*
Sitemap: https://example.com/sitemap_index.xml
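
# Example: a typical e-commerce store robots.txt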
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /my-account
Disallow: /wishlist
Disallow: /orders/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Allow: /
Sitemap: https://example.com/sitemap.xml
Never use robots.txt as a security measure! The file is publicly accessible at yoursite.com/robots.txt. Blocking a path actually reveals that something exists there. For private content, use proper authentication or keep it off the server entirely.
| Method | What It Does | Best For |
|---|---|---|
| Robots.txt | Blocks crawling (bot never visits page) | Saving crawl budget, blocking entire sections |
| Meta Robots Tag | Blocks indexing (page crawled but not indexed) | Removing specific pages from search results |
| X-Robots-Tag Header | Same as meta robots but via HTTP header | PDFs, images, and non-HTML files |
| Canonical Tag | Indicates preferred URL version | Duplicate content, URL parameters |
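To make the distinction concrete, here is a minimal sketch (standard-library Python, with a placeholder port and file body) of serving a PDF with an X-Robots-Tag: noindex response header: the URL can still be crawled, but crawlers that honor the header keep it out of the index. In practice you would set this header in your web server or CMS configuration rather than in application code; the sketch only shows where the header lives.

```python
# Minimal sketch: send an X-Robots-Tag header with a non-HTML response
# so the file stays out of the search index even though it can be crawled.
# Standard library only; the port and PDF body are placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoIndexPdfHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        # HTTP-header equivalent of <meta name="robots" content="noindex">
        self.send_header("X-Robots-Tag", "noindex")
        self.end_headers()
        self.wfile.write(b"%PDF-1.4\n% placeholder body")

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), NoIndexPdfHandler).serve_forever()
```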
Common mistakes to avoid:
- Disallow: / blocks everything (your entire site)!
- /admin and /admin/ behave differently: Disallow: /admin matches any path that starts with /admin (including /administrator and /admin-login), while Disallow: /admin/ matches only URLs inside the /admin/ directory.

With AI systems like ChatGPT and Claude, many website owners want to control whether AI companies can use their content for training:
| Company | User Agent | Purpose |
|---|---|---|
| OpenAI | GPTBot | Crawls for ChatGPT training data |
| OpenAI | ChatGPT-User | Real-time browsing for ChatGPT users |
| Google | Google-Extended | Crawls for Gemini training |
| Anthropic | anthropic-ai | Crawls for Claude training data |
| Common Crawl | CCBot | Open dataset used by many AI companies |
# Block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
Use our full suite of tools to analyze and optimize your website: