
The robots.txt check every B2B company should run today

Seven AI crawlers can be blocked by a single line in your robots.txt. Most companies don't know which ones they're blocking.

Go to your domain's robots.txt right now. The URL is yourdomain.com/robots.txt. What you find there determines whether AI systems can see you at all.

Seven crawlers index the web for the AI tools your buyers use. Block any of them and you are invisible to that system — not ranked low, not deprioritised. Gone.

The seven are:

  • ClaudeBot — Anthropic (powers Claude)
  • GPTBot — OpenAI (powers ChatGPT)
  • PerplexityBot — Perplexity
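  • Google-Extended — Google (a robots.txt control token rather than a separate crawler; it governs whether your content is used for Gemini and AI training)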
  • Applebot-Extended — Apple Intelligence
  • CCBot — Common Crawl (training data source for many LLMs)
  • cohere-ai — Cohere

Most companies that block these didn't mean to. The problem is broad wildcard rules. A rule like Disallow: / under User-agent: * blocks every crawler that doesn't have its own User-agent group, including all seven above. It's a common setup left behind by SEO consultants who were focused on duplicate content and crawl budget. Nobody was thinking about AI in 2019.

The rule that causes the most silent damage looks like this:

User-agent: *
Disallow: /

Even a well-intentioned file can go wrong. Some companies block specific bots by name for legitimate reasons and then assume everything else is fine. It isn't. Any of the seven that has no User-agent group of its own inherits whatever the wildcard says.
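You don't have to read the file by eye to check this. The sketch below runs a robots.txt through Python's standard-library urllib.robotparser and lists which of the seven would be refused. The helper name blocked_crawlers, the bot name ExampleScraper, and both sample files are illustrative assumptions. Python's parser is also only an approximation of how each real crawler reads the file, so treat the result as a first check rather than proof.

```python
from urllib.robotparser import RobotFileParser

# The seven user-agent tokens listed above.
AI_CRAWLERS = [
    "ClaudeBot",
    "GPTBot",
    "PerplexityBot",
    "Google-Extended",
    "Applebot-Extended",
    "CCBot",
    "cohere-ai",
]


def blocked_crawlers(robots_txt: str, url: str = "/") -> list[str]:
    """Return the AI crawlers that may not fetch `url` under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]


# Sample 1: the bare wildcard block.
WILDCARD_ONLY = """\
User-agent: *
Disallow: /
"""

# Sample 2: one bot blocked by name (hypothetical), plus the same wildcard.
# None of the seven has its own group, so all of them fall through to "*".
NAMED_BLOCK_PLUS_WILDCARD = """\
User-agent: ExampleScraper
Disallow: /

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    for label, text in [
        ("wildcard only", WILDCARD_ONLY),
        ("named block + wildcard", NAMED_BLOCK_PLUS_WILDCARD),
    ]:
        print(label, "->", blocked_crawlers(text))
```

To test your own site, paste the contents of yourdomain.com/robots.txt into a string and pass it to blocked_crawlers. An empty list means none of the seven is shut out of the homepage.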

The fix is straightforward. Add explicit allow rules for each crawler:

User-agent: ClaudeBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: cohere-ai
Allow: /

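Named user-agent groups take precedence over the wildcard. Under the Robots Exclusion Protocol (RFC 9309), a crawler follows the most specific User-agent group that matches it and falls back to the wildcard group only when none does, so position in the file does not change the outcome for a compliant crawler. Placing the named groups above the wildcard is still a sensible convention: it keeps the file readable and is the safer choice if a parser is not fully compliant.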
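Here is a quick check of that precedence rule, again as a sketch with Python's standard-library urllib.robotparser. The helper can_crawl, the bot name SomeOtherBot, and the two sample files are hypothetical. In this parser a named group wins over the wildcard wherever it sits in the file; other parsers may behave differently.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, bot: str, url: str = "/") -> bool:
    """Report whether `bot` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


# Named group placed above the wildcard.
NAMED_FIRST = """\
User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /
"""

# Same rules, wildcard placed first.
WILDCARD_FIRST = """\
User-agent: *
Disallow: /

User-agent: ClaudeBot
Allow: /
"""

if __name__ == "__main__":
    for label, text in [("named first", NAMED_FIRST), ("wildcard first", WILDCARD_FIRST)]:
        # ClaudeBot has its own group; SomeOtherBot (hypothetical) does not.
        print(label, "ClaudeBot:", can_crawl(text, "ClaudeBot"))
        print(label, "SomeOtherBot:", can_crawl(text, "SomeOtherBot"))
```

In both layouts the named crawler is allowed and the unnamed one falls back to the wildcard block. That is why each of the seven needs its own group.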

This is not an optimisation. It is the floor. 89% of B2B SaaS companies fail baseline AI visibility requirements. The most common reason is access, not content. The AI never got in to read anything.

Unblocking these crawlers takes five minutes. It does not guarantee AI will cite you. But blocking them guarantees it won't.
