Robots.txt Generator

Create custom robots.txt files for your website. Control how search engines crawl your site with easy-to-use presets and custom rules.

Configuration

Enter your XML sitemap URL
Time between crawler requests (optional, not supported by all bots)
Quick Syntax Guide
  • * matches all user-agents
  • / matches the entire site
  • /folder/ matches a specific folder
  • *.pdf matches all PDF files
  • $ matches end of URL
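For illustration, a hypothetical rule set that combines these patterns might look like this (the paths are placeholders):

User-agent: *         # rules apply to all crawlers
Disallow: /private/   # block one folder
Disallow: /*.pdf$     # block every URL that ends in .pdf
Allow: /              # everything else remains crawlable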

Generated Robots.txt

# robots.txt generated by WebINC.co
# https://webinc.co/robots-generator.php
# Click "Generate Robots.txt" to create your file
How to Use
  1. Configure your robots.txt rules above
  2. Click "Generate Robots.txt" to create the file
  3. Copy the content or download the file
  4. Upload robots.txt to your website's root directory
  5. Test at yourdomain.com/robots.txt

What is Robots.txt?

The robots.txt file is a simple text file placed in your website's root directory that communicates with web crawlers and bots. Following the Robots Exclusion Protocol (REP), first introduced in 1994, it tells search engines which pages they can and cannot access. While it's a powerful tool for managing crawler behavior, it's important to understand both its capabilities and limitations.

When a search engine bot visits your site, the first thing it does is look for yourdomain.com/robots.txt. Based on the instructions it finds, the bot decides which pages to crawl or skip. This happens before any actual crawling takes place, making robots.txt the gatekeeper of your website's crawlability.
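The two simplest possible files illustrate this decision: the first allows all crawling, the second blocks the entire site.

# Allow everything (an empty Disallow blocks nothing)
User-agent: *
Disallow:

# Block everything
User-agent: *
Disallow: /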

Control Access

Specify which bots can crawl which parts of your site

Save Crawl Budget

Direct crawlers to important pages, skip low-value content

Hide Sections

Keep admin areas, staging, and private sections from crawlers

Link Sitemaps

Point crawlers to your XML sitemap for better indexing
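A single short file can cover all four of these jobs at once. As a sketch (the paths and sitemap URL are placeholders):

# Control access: special rules for one bot
User-agent: Googlebot
Disallow: /drafts/

# Save crawl budget and hide sections: defaults for everyone else
User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /search?

# Link sitemaps: point crawlers to the XML sitemap
Sitemap: https://example.com/sitemap.xml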

Robots.txt Syntax and Directives

The robots.txt file uses a simple syntax with specific directives. Understanding each directive is essential for proper configuration:

User-agent

Specifies which crawler the following rules apply to.

# Apply to all crawlers
User-agent: *

# Apply only to Googlebot
User-agent: Googlebot

# Apply only to Bing
User-agent: Bingbot

Common user agents: Googlebot, Bingbot, Slurp (Yahoo), DuckDuckBot, Baiduspider, Yandex, Facebot, Twitterbot

Disallow

Tells crawlers NOT to access specific paths.

# Block a specific page
Disallow: /private-page.html

# Block an entire directory
Disallow: /admin/

# Block all URLs that contain a query string
Disallow: /*?

# Block everything (entire site)
Disallow: /

Note: An empty Disallow: means nothing is blocked.

Allow

Permits access to specific paths, overriding Disallow rules. Useful for exceptions.

# Block /private/ but allow one page
User-agent: *
Disallow: /private/
Allow: /private/public-page.html

# Block all PDFs except one
Disallow: /*.pdf$
Allow: /docs/whitepaper.pdf

Note: Allow is supported by Google and Bing but not all crawlers.
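When Allow and Disallow both match a URL, Google and Bing generally apply the most specific (longest) matching path, so the order of the lines does not matter for them. A small sketch with placeholder paths:

User-agent: *
Disallow: /docs/
Allow: /docs/public/

# /docs/public/guide.html stays crawlable for Google and Bing because
# the Allow path (/docs/public/) is a longer, more specific match than /docs/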

Sitemap

Points crawlers to your XML sitemap(s) for better discovery of pages.

# Single sitemap
Sitemap: https://example.com/sitemap.xml

# Multiple sitemaps
Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-products.xml

Best practice: Always use absolute URLs with https://

Additional Directives

  • Crawl-delay: sets the number of seconds between requests (respected by Bing and Yandex, ignored by Google). Example: Crawl-delay: 10
  • * (wildcard): matches any sequence of characters in a path. Example: Disallow: /category/*/page
  • $ (end match): the pattern must match the end of the URL. Example: Disallow: /*.pdf$
  • # (comment): adds notes or explanations that bots ignore. Example: # This blocks admin section
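A hypothetical group combining these directives might look like the following; note that Google ignores Crawl-delay, so it only affects bots such as Bingbot and Yandex:

# Rules for Bingbot only
User-agent: Bingbot
Crawl-delay: 10        # wait 10 seconds between requests
Disallow: /search/     # block the internal search results section
Disallow: /*.pdf$      # block URLs ending in .pdf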

Robots.txt Examples by Platform

WordPress
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /trackback/
Disallow: /feed/
Disallow: /?s=
Disallow: /*?*

Sitemap: https://example.com/sitemap_index.xml
E-commerce
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /my-account
Disallow: /wishlist
Disallow: /orders/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Allow: /

Sitemap: https://example.com/sitemap.xml

What Robots.txt Can and Cannot Do

What Robots.txt CAN Do
  • Block well-behaved crawlers from accessing specific paths
  • Reduce server load by limiting crawler activity
  • Optimize crawl budget by directing bots to important pages
  • Prevent crawling of duplicate content (search pages, filters)
  • Hide development/staging sections from search engines
  • Point crawlers to your sitemap for better discovery
  • Apply different rules to different bots
What Robots.txt CANNOT Do
  • Guarantee pages won't be indexed – other sites may link to them
  • Hide content from users – robots.txt is publicly accessible
  • Stop malicious bots – bad actors ignore robots.txt
  • Protect sensitive data – use authentication instead
  • Remove already-indexed pages – use noindex meta tag
  • Control search result appearance – use meta tags
  • Block all web scrapers – only well-behaved bots respect it
Important Security Warning

Never use robots.txt as a security measure! The file is publicly accessible at yoursite.com/robots.txt. Blocking a path actually reveals that something exists there. For private content, use proper authentication or keep it off the server entirely.
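To make the risk concrete, a hypothetical file like the one below advertises exactly where the sensitive material lives (the paths are placeholders):

# BAD: anyone who opens this file now knows these paths exist
Disallow: /internal-reports/
Disallow: /customer-exports/

# BETTER: keep such paths out of robots.txt entirely and protect them
# with server-side authentication instead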

Robots.txt vs. Other Methods

  • Robots.txt: blocks crawling, so the bot never visits the page. Best for saving crawl budget and blocking entire sections.
  • Meta robots tag: blocks indexing; the page is crawled but not indexed. Best for removing specific pages from search results.
  • X-Robots-Tag header: same as the meta robots tag, but sent as an HTTP header. Best for PDFs, images, and other non-HTML files.
  • Canonical tag: indicates the preferred URL version. Best for duplicate content and URL parameters.
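For comparison, the other methods look like this: the meta robots and canonical tags go in the page's HTML head, while X-Robots-Tag is sent as an HTTP response header (the URL is a placeholder):

<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://example.com/preferred-page/">

X-Robots-Tag: noindex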

Common Mistakes and Best Practices

Common Mistakes
  • Blocking CSS/JS: Google needs these to render pages. Don't block theme files.
  • Blocking entire site: Disallow: / blocks everything!
  • Using for security: Anyone can read your robots.txt
  • Forgetting trailing slashes: /admin and /admin/ behave differently (see the example after this list)
  • Wrong file location: Must be in root directory only
  • Expecting de-indexing: Blocked pages can still appear if linked
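The trailing-slash pitfall in practice: robots.txt rules are prefix matches, so the shorter form catches more than intended.

Disallow: /admin    # also blocks /administrator, /admin-login, /admin.html (prefix match)
Disallow: /admin/   # blocks only URLs inside the /admin/ directory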
Best Practices
  • Test before deploying: Use Google Search Console's tester
  • Keep it simple: Complex rules are hard to maintain
  • Include sitemap: Always add your sitemap URL
  • Use comments: Document why each rule exists
  • Monitor errors: Check Search Console regularly
  • Review regularly: Update as your site changes

AI Crawlers and Robots.txt

With the rise of AI systems like ChatGPT and Claude, many website owners want to control whether AI companies can use their content for training. The most commonly blocked AI crawler user agents are:

  • GPTBot (OpenAI): crawls pages to gather ChatGPT training data
  • ChatGPT-User (OpenAI): fetches pages in real time when ChatGPT users browse
  • Google-Extended (Google): a control token honored by Googlebot that determines whether content may be used for Gemini training
  • anthropic-ai (Anthropic): crawls for Claude training data
  • CCBot (Common Crawl): builds an open dataset used by many AI companies

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

Testing Your Robots.txt

Google Search Console

Test if URLs are blocked and validate changes before deploying.

Bing Webmaster Tools

Validate your file and see what Bingbot can access.

Manual Check

Visit yourdomain.com/robots.txt to see what crawlers see.
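You can also fetch the file from the command line to see exactly what a crawler receives (replace the domain with your own):

curl -s https://yourdomain.com/robots.txt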

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a text file placed in your website's root directory that tells search engine crawlers which pages or sections to crawl or not crawl. It follows the Robots Exclusion Protocol and helps manage crawler access to your site.

Where should the robots.txt file be placed?

The robots.txt file must be placed in your website's root directory and be accessible at yourdomain.com/robots.txt. Search engines look for it at this exact location. The file name must be lowercase.

What is the difference between Disallow and Allow?

Disallow tells crawlers not to access specified paths, while Allow permits access to specific paths within a disallowed directory. Allow is useful for making exceptions. For example, you can disallow /admin/ but allow /admin/public/.

Does robots.txt hide pages from search results?

No, robots.txt only suggests what crawlers should not access - it doesn't hide content. Pages blocked by robots.txt can still appear in search results if linked from other sites. To truly hide pages, use the noindex meta tag or password protection.

More SEO & Webmaster Tools

Use our full suite of tools to analyze and optimize your website:

  • Meta Analyzer
  • DNS Lookup
  • Redirect Checker
  • Speed Test
