Robots.txt Generator

From ToolzPedia, the free tools encyclopedia

A robots.txt file lives at the root of your website and tells search-engine crawlers which paths they may and may not fetch. It is a critical SEO file: a misconfigured robots.txt can accidentally hide your entire site from Google or, conversely, leave admin sections that should not be indexed open to crawling.

The ToolzPedia Robots.txt Generator produces a properly formatted robots.txt file from a checklist: choose which crawlers to allow or block, which paths to disallow (admin areas, search results, duplicate-content sections), the location of your sitemap, and any crawl-delay rules. The output is a standard plain-text file, ready to upload to your site's root.

How to use Robots.txt Generator

Follow these steps to use the tool:

  1. Pick your default rule

    Allow all crawlers (most sites), or block all crawlers (staging sites).

  2. Add specific bot rules if needed

    Block aggressive scrapers, allow specific search engines.

  3. List paths to disallow

    Common: /admin/, /wp-admin/, /search/, /tag/, and tracking-parameter URLs such as /*?utm_

  4. Add your sitemap URL

    Full URL with protocol: https://yoursite.com/sitemap.xml

  5. Generate and download

    Save the output as <code>robots.txt</code> and upload it to your site's root.
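The steps above can be sketched in a few lines of Python. This is only an illustration of how such a checklist might map onto the file format, not the generator's actual code: the function name build_robots_txt, its parameters, and the bot name ExampleScraper are all hypothetical.

```python
def build_robots_txt(default_allow=True, bot_rules=None,
                     disallow_paths=(), sitemap_url=None):
    """Compose robots.txt text from checklist-style choices (hypothetical helper)."""
    lines = []
    # Step 2: one group per named bot; False blocks that bot from the whole site.
    for bot, allowed in (bot_rules or {}).items():
        lines += [f"User-agent: {bot}",
                  "Allow: /" if allowed else "Disallow: /",
                  ""]
    # Steps 1 and 3: the default group that applies to every other crawler.
    lines.append("User-agent: *")
    if not default_allow:
        lines.append("Disallow: /")        # block everything (staging sites)
    elif disallow_paths:
        lines += [f"Disallow: {path}" for path in disallow_paths]
    else:
        lines.append("Disallow:")          # an empty value blocks nothing
    lines.append("")
    # Step 4: the sitemap location, given as a full URL with protocol.
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip("\n") + "\n"


text = build_robots_txt(
    bot_rules={"ExampleScraper": False},   # hypothetical bot name
    disallow_paths=["/admin/", "/search/"],
    sitemap_url="https://yoursite.com/sitemap.xml",
)
print(text)
```

Step 5 then amounts to saving the returned text as robots.txt and uploading it to the site root.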

Details

⚠️ Important

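robots.txt is a suggestion, not a security boundary. To truly hide pages, use authentication. To keep pages out of search results, use a noindex meta tag, which only works if crawlers are allowed to fetch the page and see it. A URL that is blocked here can still appear in search results if other pages link to it.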

Frequently asked questions

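What does a robots.txt file do?
It tells search-engine crawlers which paths on your site they may or may not fetch.

Where should robots.txt be placed?
At the exact root of your domain: https://yoursite.com/robots.txt. Subdirectory paths do not work.

Does disallowing a page remove it from Google's index?
No. To remove indexed pages, use the noindex meta tag on the page itself, keep the page crawlable so the tag can be read, then ask Google to recrawl.

Does robots.txt protect private content?
No, it is advisory. Bad bots ignore it. For real security, use authentication.

Can I block AI crawlers?
Yes, name them in the User-agent directive. Common ones: GPTBot, ClaudeBot, CCBot.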
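
As a rough check of per-bot rules like these, Python's standard urllib.robotparser module can evaluate a robots.txt against a given user agent. The file contents below are an illustrative assumption, and the result only shows what a compliant crawler would do, since nothing forces a bot to obey.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block three AI crawlers entirely, and keep
# every other crawler out of /admin/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if a compliant crawler named `agent` may fetch `path`."""
    return parser.can_fetch(agent, "https://yoursite.com" + path)


print(allowed("GPTBot", "/blog/post"))       # False
print(allowed("Googlebot", "/blog/post"))    # True
print(allowed("Googlebot", "/admin/login"))  # False
```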

Use cases

Setting up a new website

Generate a sensible default robots.txt with sitemap location and standard exclusions.

Blocking specific bots

Excluding aggressive scrapers or AI training bots that hammer your site.

Hiding admin and staging areas

Keeping search engines from crawling /admin, /wp-admin, /staging, etc. Anything that must stay private still needs authentication.

Pointing to your sitemap

The Sitemap directive in robots.txt is one of the most reliable ways to ensure crawlers find your sitemap.xml.

Crawl-budget management for large sites

Disallowing low-value URL parameters (sort, filter combinations) so Google focuses crawl budget on real content.

How it works

A robots.txt file is plain text with a specific syntax: User-agent directives target specific crawlers (or all crawlers with *), and Allow / Disallow directives specify which paths each crawler can or cannot fetch. The Sitemap directive is global and points to your XML sitemap.

The generator presents the syntax as a UI: pick the crawlers (Googlebot, Bingbot, etc.), check which paths to disallow, enter your sitemap URL, and the tool composes a syntactically correct robots.txt for you to paste into a file at /robots.txt.
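
To make the structure concrete, here is a minimal sketch of reading a robots.txt back into its parts: groups of User-agent lines with their Allow / Disallow rules, plus the global Sitemap entries. The function name parse_groups and the sample file are assumptions for illustration, and the sketch ignores other directives such as Crawl-delay.

```python
def parse_groups(text):
    """Split robots.txt text into per-crawler groups and global sitemap URLs."""
    groups, sitemaps = [], []
    current, in_agents = None, False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments and whitespace
        if ":" not in line:
            continue                           # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not in_agents:                  # a new group starts here
                current = {"agents": [], "rules": []}
                groups.append(current)
                in_agents = True
            current["agents"].append(value)
        elif field in ("allow", "disallow") and current is not None:
            current["rules"].append((field, value))
            in_agents = False                  # next User-agent opens a new group
        elif field == "sitemap":
            sitemaps.append(value)             # global, not tied to any group
    return groups, sitemaps


SAMPLE = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow: /search/

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
"""

groups, sitemaps = parse_groups(SAMPLE)
for group in groups:
    print(group["agents"], group["rules"])
print(sitemaps)
```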

Tips and best practices

  • robots.txt must live at the exact root: <code>https://yoursite.com/robots.txt</code>. Subdirectories do not work.
  • After uploading, check the robots.txt report in Google Search Console to verify that the file was fetched and parses correctly.
  • Disallowing a URL in robots.txt does not remove it from the index if it was already indexed. Use the <code>noindex</code> meta tag for that, and leave the page crawlable so the tag can be read.
  • Be careful with wildcards. <code>Disallow: /*?</code> blocks every URL that contains a query string, which is sometimes desirable and sometimes catastrophic.
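
To see how far a wildcard rule reaches before uploading it, the matching can be approximated in Python. This is a simplified sketch of the * and $ special characters as major crawlers interpret them. It tests one pattern in isolation and does not model precedence between competing Allow and Disallow rules. The helper names are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end
    of the URL. Without '$', the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern, url_path):
    """Return True if a Disallow rule with this pattern covers url_path."""
    return pattern_to_regex(pattern).match(url_path) is not None


print(is_blocked("/*?", "/products?sort=price"))  # True
print(is_blocked("/*?", "/products"))             # False
print(is_blocked("/*?", "/?page=2"))              # True
```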

Common mistakes

Accidentally blocking the entire site

<code>User-agent: *</code> followed by <code>Disallow: /</code> blocks everything. This is a common copy-paste mistake from staging sites.
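
A quick pre-deployment check can catch this mistake. The sketch below uses Python's standard urllib.robotparser to test whether a crawler would be refused the site root. The helper name blocks_everything and the two sample files are assumptions, and testing only the root path is a heuristic rather than a full audit.

```python
from urllib.robotparser import RobotFileParser


def blocks_everything(robots_txt, agent="Googlebot"):
    """Return True if `agent` is not even allowed to fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


STAGING = "User-agent: *\nDisallow: /\n"           # blocks the whole site
PRODUCTION = "User-agent: *\nDisallow: /admin/\n"  # blocks one section only

print(blocks_everything(STAGING))     # True
print(blocks_everything(PRODUCTION))  # False
```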

Trusting robots.txt for security

robots.txt is advisory; bad actors ignore it. Use authentication for anything actually private.

Forgetting the sitemap directive

The Sitemap directive is one of the most reliable ways to ensure crawlers find your sitemap, so include it whenever the site has one.
