Crawlability - Evergreen Ai

Crawlability works instantly: no account connection and no waiting. Just select a domain and see results in seconds.

What is Crawlability?

Every time an AI model like ChatGPT, Claude, or Gemini learns from the web or answers a question using live content, it relies on bots: automated programs that visit websites and read their content. Your website tells these bots what they’re allowed to read using a file called robots.txt. Crawlability shows you whether the world’s major AI bots can actually access your website - and flags anything that might be silently blocking your content from AI systems.

Think of it as a visibility audit for the AI era. If a bot can’t read your site, your content can’t appear in AI answers.

Getting Started

Select a domain

Choose one of your tracked domains from the project selector at the top of the page. Ai Brand Monitor instantly reads that domain’s robots.txt file and checks it against 48+ AI bots from 20+ vendors.

Read the table

The Crawlability table shows every bot, its status, and why it has that status. Green means go. Red means blocked. Yellow means partially restricted.

Investigate with the URL Tester

Want to know what happens for a specific domain? Use the URL Tester tab to enter any domain and see exactly which bots are allowed or blocked, without needing an Ai Brand Monitor account.

The Crawlability Table

The table is your control center. Each row represents one AI bot and tells you everything you need to know at a glance.

Bot

The user-agent name the bot uses to identify itself when it visits your site. For example: GPTBot, ClaudeBot, Googlebot.

Bot Type

What the bot is actually doing. Is it collecting training data, powering search, or responding to a live user question?

Platform

The company behind the bot - OpenAI, Anthropic, Google, Meta, and so on - along with the specific AI product it feeds into.

Status

Whether the bot is Allowed, Blocked, or Partially restricted by your current robots.txt settings.

Understanding Bot Status

Allowed - the bot can access your content

Your robots.txt permits this bot to crawl your website freely. The bot can read your pages and use that content in its AI systems - whether for training or for live answers.

What this means for you: Your content is eligible to appear in responses from this AI platform. If you want visibility in ChatGPT, Claude, or Perplexity answers, you need the relevant bots to be Allowed.

Partial - the bot can access some content

Your robots.txt has a general rule (using a wildcard *) that restricts paths on your site, but there’s no specific rule for this individual bot. The bot inherits those general restrictions.

What this means for you: Some of your content may be visible to this AI, and some may not. Check the Reason column to understand exactly what rule is applying.

Blocked - the bot is explicitly denied

Your robots.txt has a specific rule that explicitly disallows this bot. It cannot crawl your site at all.

What this means for you: This AI platform cannot use your content as a source. If this is intentional, great. If not, this is the first thing to investigate.
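
The three statuses can be sketched in a few lines of Python. This is only an illustration of one plausible reading of the definitions above, not Ai Brand Monitor's actual logic: the helper names `parse_groups` and `status_for` are hypothetical, and treating any applicable `Disallow: /` (with no `Allow` rule) as Blocked is an assumption.

```python
def parse_groups(robots_txt: str) -> dict:
    """Map each lowercased user-agent name to its (directive, path) rules."""
    groups = {}
    agents = []            # user-agents of the group currently being read
    reading_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if reading_rules:                    # a new group starts here
                agents, reading_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    return groups


def status_for(robots_txt: str, bot: str) -> str:
    """Assumed mapping of rules to Allowed / Partial / Blocked."""
    groups = parse_groups(robots_txt)
    # A bot-specific group takes precedence over the wildcard group.
    rules = groups.get(bot.lower(), groups.get("*", []))
    disallowed = [path for field, path in rules if field == "disallow" and path]
    allowed = [path for field, path in rules if field == "allow" and path]
    if not disallowed:
        return "Allowed"
    if "/" in disallowed and not allowed:
        return "Blocked"
    return "Partial"


SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

for bot in ("GPTBot", "ClaudeBot"):
    print(bot, "->", status_for(SAMPLE, bot))
```

With this sample file, GPTBot has its own group that disallows everything, while ClaudeBot falls back to the wildcard group and is only restricted on /admin/.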

The Reason Column

The Reason column explains how a bot’s status was determined:

Explicitly mentioned: your robots.txt has a rule written specifically for this bot’s user-agent identifier.
Following global rules: there’s no specific rule for this bot, so it’s following your site’s general * (wildcard) rules.

Many sites accidentally block AI bots because of broad Disallow: / rules originally written to block older search engine crawlers. The Reason column helps you spot this quickly.
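
As a rough sketch of how the two reasons could be told apart, the snippet below scans a robots.txt file for `User-agent` lines and checks whether a bot is named in any of them. The function names `mentioned_agents` and `reason_for` are hypothetical, and exact case-insensitive matching of the bot name is an assumption; the product may match differently.

```python
def mentioned_agents(robots_txt: str) -> set:
    """Collect every user-agent name that appears in the file (lowercased)."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def reason_for(robots_txt: str, bot: str) -> str:
    """Return the Reason label for a bot under the assumptions above."""
    if bot.lower() in mentioned_agents(robots_txt):
        return "Explicitly mentioned"
    return "Following global rules"


# A legacy-style file: one search crawler is allowed, everything else is blocked.
LEGACY_RULES = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

for bot in ("Googlebot", "GPTBot", "ClaudeBot"):
    print(bot, "->", reason_for(LEGACY_RULES, bot))
```

In this example the AI bots are never named, so they fall under the wildcard group and inherit its `Disallow: /` rule, which is the accidental-block pattern described above.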

Filtering and Searching

When you’re managing 48+ bots, filters save time.

Available filters: Search, Status Filter, Bot Type, Platform, and Individual Bots.

Type any bot name or platform name into the search bar to instantly narrow the table. Useful when you’re looking for a specific bot like GPTBot or a specific vendor like Anthropic.

URL Tester

The URL Tester lets you analyze any domain on the web - not just your tracked projects. Enter a competitor’s domain, a partner’s site, or any URL you’re curious about.

How to use it:

Click the URL Tester tab at the top of the page.
Type any domain (e.g. apple.com or yoursite.com) into the search bar.
Click Analyse - results appear instantly.

The results are split into two sections:

Explicitly Mentioned - bots that have a dedicated rule in that site’s robots.txt.
Following Global Rules - bots that are subject to the site’s general wildcard restrictions.

Use the URL Tester to benchmark competitors. If a rival is blocking all AI training bots while you allow them, or vice versa, that’s a strategic difference worth knowing about.

Bot Types Explained

Not all AI bots have the same goal. Understanding the difference helps you make informed decisions about what to allow and what to block.

Training Bots

These bots collect content to train AI models. If you block them, your content won’t be part of the dataset used to teach the model. Examples: GPTBot, Google-Extended, anthropic-ai, CCBot.

Search Bots

These bots index your content for real-time retrieval - so when a user asks a question, the AI can pull in live information from your site. Examples: OAI-SearchBot, PerplexityBot, Applebot.

User Query Bots

These bots visit your site in real time when a user provides a specific link during a conversation. Think of it as the AI “clicking” a link on the user’s behalf. Examples: ChatGPT-User, Claude-User, Perplexity-User.

Other

Utility crawlers, specialized scrapers, and bots with purposes that don’t fit neatly into the above categories. Examples: ClaudeBot, YouBot, omgilibot.

How to Interpret Your Results

Many bots showing Partial - is that a problem?

Partial usually means your site has broad robots.txt rules (like Disallow: /api/ or Disallow: /admin/) that all bots inherit. This is normal and often intentional. What matters is whether the paths being restricted contain content you want AI to access.

If the restricted paths are just backend routes or staging areas, you’re fine. If they include your blog, product pages, or main content, consider adding explicit Allow rules for the AI bots you care about.
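
One way to check whether a Partial status affects content you care about is to test a handful of important paths against your rules. The sketch below uses Python's standard `urllib.robotparser`; the helper `blocked_paths`, the sample rules, and the path list are all illustrative assumptions. Note that this parser applies simple prefix matching, so individual crawlers may interpret more complex rules differently.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, bot: str, paths: list) -> list:
    """Return the paths from `paths` that `bot` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path for path in paths
        if not parser.can_fetch(bot, "https://yoursite.com" + path)
    ]


# Hypothetical wildcard rules that every unnamed bot inherits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
Disallow: /admin/
Disallow: /blog/drafts/
"""

# Hypothetical pages you want AI systems to be able to read.
IMPORTANT_PATHS = ["/", "/blog/", "/blog/drafts/launch-post", "/products/widget"]

print(blocked_paths(ROBOTS_TXT, "PerplexityBot", IMPORTANT_PATHS))
```

If the returned list contains only backend or draft paths, the Partial status is probably harmless. If it contains pages you want cited, those are the rules to revisit.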

I didn't mean to block that bot - how do I fix it?

Review your robots.txt file and look for broad Disallow rules under the User-agent: * section. You can add a specific User-agent block for the bot you want to allow, with an Allow: / rule. This overrides the wildcard restriction for that specific bot.

Example:

User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /private/
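
To confirm the example above behaves as intended before deploying it, you could run it through Python's standard `urllib.robotparser`. This is a minimal check under the assumption that the crawler resolves rules the way this parser does; `SomeOtherBot` and the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot has its own group, so the wildcard restriction does not apply to it.
gptbot_private = parser.can_fetch("GPTBot", "https://yoursite.com/private/report.html")

# A bot without its own group falls back to the wildcard rules.
other_private = parser.can_fetch("SomeOtherBot", "https://yoursite.com/private/report.html")
other_blog = parser.can_fetch("SomeOtherBot", "https://yoursite.com/blog/")

print("GPTBot       /private/ :", gptbot_private)
print("SomeOtherBot /private/ :", other_private)
print("SomeOtherBot /blog/    :", other_blog)
```

Here GPTBot can reach /private/ because its dedicated group replaces the wildcard group entirely, while the unnamed bot is blocked from /private/ but can still read /blog/.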

Should I block training bots?

This is a business decision. Some companies block training bots to prevent their proprietary content from being used in AI model datasets without compensation or credit. Others allow them to maximize AI visibility and reach.

There’s no universal right answer. Ai Brand Monitor shows you your current state - the decision is yours.

What's the robots.txt file?

robots.txt is a plain-text file located at the root of your website (e.g. yoursite.com/robots.txt). It’s an industry-standard way for websites to tell web crawlers - both traditional search engines and AI bots - which parts of the site they may or may not access.

Bots are expected to respect these instructions, though compliance varies. All major AI platforms claim to honor robots.txt directives.
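
Because the file always lives at the root of the site, its location can be derived from any domain or page URL. The sketch below shows one way to do that with Python's standard library; `robots_url` and `load_robots` are hypothetical helper names, and defaulting to HTTPS when no scheme is given is an assumption.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url(domain_or_url: str) -> str:
    """Return the root-level robots.txt URL for a domain or page URL."""
    candidate = domain_or_url.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate   # assume HTTPS when no scheme is given
    parts = urlsplit(candidate)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def load_robots(domain_or_url: str) -> RobotFileParser:
    """Download and parse the file. This performs a network request."""
    parser = RobotFileParser(robots_url(domain_or_url))
    parser.read()
    return parser


print(robots_url("yoursite.com"))
print(robots_url("https://yoursite.com/blog/post?id=7"))
```

Both calls print the same root-level address, since the path and query string of a page URL play no part in locating robots.txt.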

Reloading and Viewing Your robots.txt

Reload robots.txt

Click Reload robots.txt in the top-right corner to fetch the latest version of your site’s file. Useful after you’ve made changes and want to see them reflected immediately.

View robots.txt

Click View robots.txt to open your current file with full syntax highlighting. Quickly audit the exact rules in place without leaving Ai Brand Monitor.

Common Scenarios

Scenario	What to look for	What to do
You want ChatGPT to cite your content	Check if `OAI-SearchBot` and `ChatGPT-User` are Allowed	If Blocked, add an explicit `Allow` rule for these bots
A competitor is invisible in AI answers	Run their domain in the URL Tester	Look for Blocked status on Search and User Query bots
You updated robots.txt but nothing changed	Click Reload robots.txt	Verify the new rules are being parsed correctly
You want to block AI from using your content for training	Look for Training bots with Allowed status	Add explicit `Disallow` rules for those bots
You’re seeing unexpected Partial results	Check the Reason column	Identify which wildcard rule is being inherited
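
For the training-opt-out scenario in the table above, the explicit `Disallow` rules can be generated rather than typed by hand. This is a small sketch, not a recommendation of which bots to block: the helper `disallow_block` is hypothetical and the bot list is just a sample taken from the Training category.

```python
from urllib.robotparser import RobotFileParser


def disallow_block(bots: list) -> str:
    """Build one 'User-agent / Disallow: /' group per bot."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(groups) + "\n"


# Sample of training bots; choose the ones that match your own policy.
TRAINING_BOTS = ["GPTBot", "Google-Extended", "CCBot"]

rules = disallow_block(TRAINING_BOTS)
print(rules)

# Sanity check with Python's standard parser: the training bot is blocked,
# while a search bot that is not named in the file remains allowed.
parser = RobotFileParser()
parser.parse(rules.splitlines())
training_allowed = parser.can_fetch("GPTBot", "https://yoursite.com/blog/")
search_allowed = parser.can_fetch("OAI-SearchBot", "https://yoursite.com/blog/")
print("GPTBot allowed:", training_allowed)
print("OAI-SearchBot allowed:", search_allowed)
```

Append the generated groups to your existing robots.txt, then use Reload robots.txt to confirm that the affected bots now show as Blocked.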

All Supported Bots

Ai Brand Monitor tracks 48+ bots across every major AI platform. The list is continuously updated as new AI systems are launched.

Bots are categorized based on publicly available information and each vendor’s stated purpose. As real-world behavior becomes clearer, categories may be refined to ensure accuracy.

Bot	Type	Platform	Purpose
`GPTBot`	Training	OpenAI / ChatGPT	Primary crawler for OpenAI’s foundational model training.
`anthropic-ai`	Training	Anthropic / Claude	General data collection for Claude model training.
`Google-Extended`	Training	Google / Gemini	Opt-out token for Gemini training and AI product improvement.
`Meta-ExternalAgent`	Training	Meta	High-velocity training crawler for Llama models.
`Applebot-Extended`	Training	Apple	Used for training Apple’s generative AI features.
`Amazonbot`	Training	Amazon	General training for Amazon Titan and Olympus models.
`Bytespider`	Training	ByteDance	Training for TikTok and ByteDance AI products.
`CCBot`	Training	Common Crawl	Massive open-source web archive used by many AI labs.
`Ai2Bot`	Training	Allen Institute	General-purpose crawler for Allen Institute AI research.
`Ai2Bot-Dolma`	Training	Allen Institute	Specifically builds the Dolma open dataset.
`cohere-ai`	Training	Cohere	Training for enterprise-grade language models.
`cohere-training-data-crawler`	Training	Cohere	Specialized crawler for raw Cohere training data.
`DeepSeekBot`	Training	DeepSeek	Training for the DeepSeek model series.
`PanguBot`	Training	Huawei / PanGu	Training for Huawei’s PanGu AI models.
`Webzio-Extended`	Training	Webz.io	Large-scale data scraping for AI providers.
`Diffbot`	Training	Diffbot	Structured data extraction as a service.
`FacebookBot`	Training	Meta	Web crawler for Meta AI training data collection.
`omgili`	Training	Webz.io	Forum and discussion crawler for structured dataset building.
`Timpibot`	Training	Timpi	Decentralized search engine training crawler.
`GrokBot`	Training	xAI / Grok	Real-time web search and training for Grok models.
`OAI-SearchBot`	Search	OpenAI / ChatGPT	Real-time retriever powering ChatGPT search answers.
`Claude-SearchBot`	Search	Anthropic / Claude	Anthropic’s bot for Claude’s search features.
`PerplexityBot`	Search	Perplexity	Fact-checking and retrieval for Perplexity answers.
`Amzn-SearchBot`	Search	Amazon	Search bot for Amazon’s AI shopping features.
`AzureAI-SearchBot`	Search	Microsoft	Web retrieval for Azure AI and Copilot services.
`Google-CloudVertexBot`	Search	Google	Crawling for Google Cloud Vertex AI services.
`meta-webindexer`	Search	Meta	Search indexing for Meta’s AI assistants.
`Applebot`	Search	Apple	Powers Spotlight, Siri, and Safari search functionality.
`Grok-DeepSearch`	Search	xAI / Grok	Real-time web search for Grok’s deep research feature.
`xAI-Grok`	Search	xAI / Grok	General-purpose web search bot for xAI and Grok.
`ChatGPT-User`	User Query	OpenAI / ChatGPT	Visits links directly provided by a user in ChatGPT.
`Claude-User`	User Query	Anthropic / Claude	Triggered when a Claude user prompts with a specific link.
`Perplexity-User`	User Query	Perplexity	Used during a user’s Deep Research session.
`Manus-User`	User Query	Meta	Action Agent: navigates and interacts with sites on user request.
`GoogleAgent-Mariner`	User Query	Google	Used by Google agents for web navigation (Project Mariner).
`NovaAct`	User Query	Amazon	Agent for automated web-based workflows on Amazon.
`Gemini-Deep-Research`	User Query	Google / Gemini	High-intensity agent for user-requested Gemini research.
`MistralAI-User`	User Query	Mistral	On-demand browser triggered by Mistral users.
`DuckAssistBot`	User Query	DuckDuckGo	Summarizes pages for DuckDuckGo’s AI responses.
`quillbot.com`	User Query	QuillBot	Fetches content to power QuillBot’s AI writing tools.
`meta-externalfetcher`	User Query	Meta	Used for real-time link expansion on Meta platforms.
`Claude-Code`	User Query	Anthropic / Claude	Fetches web resources during Claude Code sessions.
`ClaudeBot`	Other	Anthropic / Claude	Official training bot for Anthropic models.
`Claude-Web`	Other	Anthropic / Claude	Legacy bot for web browsing during Claude interactions.
`Google-Agent`	Other	Google	Used by Google agents to navigate the web and perform actions.
`YouBot`	Other	You.com	Fetches pages to power You.com’s AI search results.
`omgilibot`	Other	Webz.io	Forum-specific crawler variant. Commercial data product.
`MyCentralAIScraperBot`	Other	Unknown	Centralized AI data collection tool.
