Question 1

What is a robots.txt file?

Accepted Answer

robots.txt is a plain-text file at the root of your website (yoursite.com/robots.txt) that tells web crawlers which URLs they may or may not fetch. Well-behaved crawlers download it before requesting other pages on that host. It's an honor-system protocol: well-behaved bots respect it, while malicious scrapers can simply ignore it.
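To make the honor-system point concrete, here is a minimal sketch of the check a polite crawler performs, using Python's standard-library `urllib.robotparser`. The robots.txt rules, the example.com URLs, and the `ExampleBot` user agent are hypothetical stand-ins.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com, inlined so the sketch runs offline.
# A real crawler would download https://example.com/robots.txt instead,
# e.g. with RobotFileParser.set_url(...) followed by .read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a polite crawler would before crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

rules = build_parser(ROBOTS_TXT)
for url in ("https://example.com/blog/post", "https://example.com/admin/login"):
    # The crawler asks before each request; nothing forces it to obey the answer.
    print(url, "allowed" if rules.can_fetch("ExampleBot", url) else "skipped")
```

Run as-is, it reports the blog post as allowed and the admin page as skipped. One design note: Python's parser applies the first matching rule in file order, while Google picks the most specific (longest) matching path, so listing the narrower `Disallow` before the broad `Allow` keeps both interpretations in agreement.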

Question 2

Why does robots.txt matter for AEO?

Accepted Answer

AI assistants like ChatGPT, Claude, and Perplexity send their own crawlers (such as OAI-SearchBot, Claude-SearchBot, and PerplexityBot) to fetch and index pages, while Microsoft Copilot relies largely on Bing's index, which Bingbot builds. If you block these search crawlers in robots.txt, they can't fetch your pages for their answers, so you are unlikely to be cited. If you allow them, your site becomes eligible to be cited. Most sites should explicitly allow AI search bots and decide separately whether to allow training-only crawlers such as GPTBot and ClaudeBot.

Question 3

What's the difference between AI search bots and AI training bots?

Accepted Answer

AI search bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot) index pages for AI search results, and user-triggered fetchers such as ChatGPT-User retrieve pages when someone asks a question. Together they power citations and live answers, so blocking them can keep your pages out of those assistants' answers. AI training bots (GPTBot, ClaudeBot, Google-Extended, Bytespider, CCBot, and the older anthropic-ai token) collect content to train future foundation models; blocking them opts you out of future training collection without affecting today's citations. Many SEOs allow the search bots and block the trainers.
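A sketch of the "allow the search bots, block the trainers" setup, assuming you generate robots.txt from two Python lists and then sanity-check the result with `urllib.robotparser`. The lists are illustrative (they include ClaudeBot, Anthropic's training crawler) and should be checked against each vendor's current crawler documentation, since tokens get added and retired.

```python
from urllib.robotparser import RobotFileParser

# Illustrative token lists; confirm each against the vendor's crawler docs.
SEARCH_BOTS = ["OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "PerplexityBot"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended",
                 "CCBot", "Bytespider"]

def build_robots_txt(allow: list[str], block: list[str]) -> str:
    """Return robots.txt text with one group per policy.

    Several User-agent lines may share one rule set; a blank line ends a group.
    """
    lines = [f"User-agent: {bot}" for bot in allow] + ["Allow: /", ""]
    lines += [f"User-agent: {bot}" for bot in block] + ["Disallow: /", ""]
    # Catch-all group: everyone else (Googlebot, Bingbot, ...) may crawl.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"

robots_txt = build_robots_txt(SEARCH_BOTS, TRAINING_BOTS)

# Sanity-check the generated file with the standard-library parser.
checker = RobotFileParser()
checker.parse(robots_txt.splitlines())
for agent in ("OAI-SearchBot", "PerplexityBot", "GPTBot", "CCBot", "Googlebot"):
    print(f"{agent:15} {checker.can_fetch(agent, 'https://example.com/page')}")
```

Grouping several `User-agent` lines over one rule keeps the file short, and the catch-all `User-agent: *` group at the end leaves ordinary search crawlers such as Googlebot unaffected.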

Question 4

Will blocking GPTBot stop ChatGPT from mentioning my site?

Accepted Answer

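No, not on its own. GPTBot is OpenAI's training crawler; blocking it stops it from collecting new training data from your site. But ChatGPT also uses OAI-SearchBot to index pages for ChatGPT search and ChatGPT-User to fetch pages on a user's behalf, so if you only block GPTBot, your site can still be cited in real-time answers. Blocking all three opts you out of OpenAI's documented crawlers, but it won't erase content already collected, and ChatGPT can still mention your site based on what other pages say about it.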
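The difference between blocking GPTBot alone and blocking all three OpenAI agents can be checked with `urllib.robotparser`. This is a sketch: the two robots.txt variants and the example.com URL are hypothetical, and the check only shows what each file asks each agent to do. It says nothing about data OpenAI already holds or about mentions drawn from other sites, and OpenAI's current crawler documentation is the authority on how each agent treats robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical variant 1: only the training crawler is blocked.
GPTBOT_ONLY = """\
User-agent: GPTBot
Disallow: /
"""

# Hypothetical variant 2: all three OpenAI agents share one Disallow group.
ALL_OPENAI = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Disallow: /
"""

OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

def blocked_agents(robots_txt: str, url: str = "https://example.com/") -> set[str]:
    """Return the OpenAI agents that these rules tell to stay away from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent for agent in OPENAI_AGENTS if not parser.can_fetch(agent, url)}

print(sorted(blocked_agents(GPTBOT_ONLY)))  # search and user fetchers still allowed
print(sorted(blocked_agents(ALL_OPENAI)))   # all three asked to stay out
```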

Question 5

What about Google-Extended?

Accepted Answer

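Google-Extended is a standalone product token, not a separate crawler: Google's existing crawlers fetch your pages, and your Google-Extended rules in robots.txt control whether that content can be used to train Gemini models (Bard has since been renamed Gemini) and to ground answers in Gemini Apps and Vertex AI. It does NOT affect Googlebot or your classic Google Search rankings; because AI Overviews are part of Google Search, they follow Googlebot's rules rather than Google-Extended's. You can block Google-Extended without losing organic traffic, a common AEO-friendly setup.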
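A quick check, using hypothetical rules and URLs, that a Google-Extended group leaves Googlebot alone. Since Google-Extended is a token that Google's crawlers consult rather than a user agent that appears in your logs, evaluating it with `urllib.robotparser` only tells you what your rules say for that token, not how Google applies them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: opt out of Gemini training, keep Search crawling open.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches only the catch-all group, so Search crawling is unaffected.
googlebot_ok = parser.can_fetch("Googlebot", "https://example.com/guide")
# The Google-Extended token matches its own group, which disallows everything.
extended_ok = parser.can_fetch("Google-Extended", "https://example.com/guide")
print("Googlebot:", googlebot_ok, "| Google-Extended:", extended_ok)
```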

Question 6

Should I block Common Crawl (CCBot)?

Accepted Answer

Common Crawl is a non-profit that publishes an open web archive used to train many LLMs (a filtered version of it was the largest source of GPT-3's training data). Blocking CCBot keeps your pages out of future Common Crawl snapshots, but pages already archived remain, and it doesn't stop individual labs that crawl directly with their own bots. Treat it as a signal of intent more than a hard guarantee.

Question 7

Where do I put the robots.txt file?

Accepted Answer

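At the root of each host you want to cover (https://yoursite.com/robots.txt; a subdomain such as blog.yoursite.com needs its own file), served as text/plain. For framework-built sites, drop it into the public/ folder (Next.js, Astro) or the static/ folder (Hugo). WordPress serves a virtual robots.txt by default; upload a physical file via SFTP to the webroot to replace it. After deploying, verify with `curl -I https://yoursite.com/robots.txt`; you should see HTTP 200 and a text/plain Content-Type.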
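If `curl` isn't handy, the check can be scripted. This sketch uses only Python's standard library; the helper names `robots_url` and `check_robots_txt` are hypothetical, and yoursite.com is the placeholder from the answer. `robots_url` also shows why placement matters: crawlers look for the file at the root of the exact scheme and host they are crawling.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

def robots_url(page_url: str) -> str:
    """robots.txt lives at the root of each scheme + host (+ port) combination."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def check_robots_txt(page_url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Rough Python equivalent of `curl -I`: send a HEAD request and report
    the HTTP status code and Content-Type of the site's robots.txt."""
    request = Request(robots_url(page_url), method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get_content_type()
    except HTTPError as err:  # urlopen raises on 4xx/5xx instead of returning
        return err.code, err.headers.get_content_type()
```

For example, `check_robots_txt("https://yoursite.com/blog/")` should return `(200, 'text/plain')` once the file is deployed. Unlike `curl -I` without `-L`, `urlopen` follows redirects, so a redirected robots.txt reports the final response.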

Question 8

Will robots.txt protect private content?

Accepted Answer

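No. robots.txt is advisory: it tells well-behaved crawlers what to skip, but anyone (and any bot that ignores robots.txt) can still fetch the URL directly, and because the file itself is public, listing a path in it can even advertise that path. For real privacy, use authentication or IP allowlists. A noindex meta tag or X-Robots-Tag header keeps a public page out of search results but does not restrict access, and crawlers only see it if robots.txt lets them fetch the page. robots.txt is for telling Google and AI bots which public pages to skip, not for hiding secrets.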
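Authentication and noindex solve different problems, which a small request-handling sketch makes visible. The path prefixes and the `response_policy` helper are hypothetical; a real site would enforce this in its web server or framework rather than in a standalone function.

```python
# Hypothetical path prefixes for an example site.
PRIVATE_PREFIXES = ("/account/", "/invoices/")
NOINDEX_PREFIXES = ("/thank-you", "/internal-search")

def response_policy(path: str, authenticated: bool) -> tuple[int, dict[str, str]]:
    """Decide the status code and headers for a request.

    - Private pages get real access control: anonymous requests receive 401,
      whatever robots.txt says about the path.
    - Public pages you only want kept out of search results are served
      normally with an X-Robots-Tag: noindex header. Crawlers see that header
      only if robots.txt allows them to fetch the page.
    """
    if path.startswith(PRIVATE_PREFIXES) and not authenticated:
        return 401, {"WWW-Authenticate": 'Basic realm="members"'}
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if path.startswith(NOINDEX_PREFIXES):
        headers["X-Robots-Tag"] = "noindex"
    return 200, headers
```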

Free Robots.txt Generator

An AI crawler robots.txt, not just a Googlebot one
