AEO Guide: How to Track AI Crawlers on Your Website

Tool · 9 min read

A guide to tracking AI crawler bots on your website and getting automated weekly insights via Slack.


n8n File Link

What This Does


This system:
1. Detects AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) visiting your site
2. Logs their activity to Supabase automatically
3. Analyzes the data weekly using AI and sends insights to Slack


Section 1: Setting Up the Bot Tracker

What You'll Need
- A website (Next.js, Astro, SvelteKit, or any Node.js framework)
- Supabase account (free tier works)


Step 1: Create Supabase Table

1. Go to your Supabase project → SQL Editor
2. Run this SQL:

CREATE TABLE ai_crawler_logs (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    bot_name VARCHAR(100) NOT NULL,
    user_agent TEXT NOT NULL,
    path TEXT NOT NULL,
    method VARCHAR(10) NOT NULL DEFAULT 'GET',
    referer TEXT,
    query_string TEXT,
    ip_address VARCHAR(45),
    host VARCHAR(255),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() NOT NULL
);

-- Indexes for fast queries
CREATE INDEX idx_crawler_logs_bot_name ON ai_crawler_logs(bot_name);
CREATE INDEX idx_crawler_logs_created_at ON ai_crawler_logs(created_at DESC);

-- Allow public inserts (for middleware logging)
ALTER TABLE ai_crawler_logs ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Allow public inserts" ON ai_crawler_logs FOR INSERT TO public WITH CHECK (true);
CREATE POLICY "Allow public reads" ON ai_crawler_logs FOR SELECT TO public USING (true);
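
For reference, here is one row of this table expressed as a TypeScript type. This is just a sketch mirroring the schema above; the name CrawlerLog is our own and is not required by any later step.

// Shape of one row in ai_crawler_logs (mirrors the SQL schema)
export interface CrawlerLog {
    id: string;                  // UUID from gen_random_uuid()
    bot_name: string;
    user_agent: string;
    path: string;
    method: string;              // defaults to 'GET'
    referer: string | null;
    query_string: string | null;
    ip_address: string | null;
    host: string | null;
    created_at: string;          // ISO timestamp set by NOW()
}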


Step 2: Install Dependencies

npm install @supabase/supabase-js


Step 3: Add Environment Variables

Add to your .env file:

SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-anon-key-here

Where to find these:
- Supabase Dashboard → Settings → API
- Copy "Project URL" → SUPABASE_URL
- Copy "anon public" key → SUPABASE_ANON_KEY


Step 4: Create Detection File

Create lib/crawler-detection.ts:

export const AI_CRAWLER_PATTERNS = [
    { name: 'GPTBot', pattern: /GPTBot/i },
    { name: 'ChatGPT-User', pattern: /ChatGPT-User/i },
    { name: 'ClaudeBot', pattern: /ClaudeBot/i },
    { name: 'Claude-Web', pattern: /Claude-Web/i },
    { name: 'PerplexityBot', pattern: /PerplexityBot/i },
    { name: 'Google-Extended', pattern: /Google-Extended/i },
    { name: 'Bingbot', pattern: /Bingbot/i },
    { name: 'Meta-ExternalAgent', pattern: /Meta-ExternalAgent/i },
    { name: 'Bytespider', pattern: /Bytespider/i },
    { name: 'Applebot-Extended', pattern: /Applebot-Extended/i },
];

export function detectAICrawler(userAgent: string) {
    if (!userAgent) return { isBot: false, botName: null };

    for (const bot of AI_CRAWLER_PATTERNS) {
        if (bot.pattern.test(userAgent)) {
            return { isBot: true, botName: bot.name };
        }
    }

    return { isBot: false, botName: null };
}
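
A quick sanity check (both user-agent strings below are illustrative):

// Example usage:
detectAICrawler('Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot');
// => { isBot: true, botName: 'GPTBot' }

detectAICrawler('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)');
// => { isBot: false, botName: null }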


Step 5: Create Logger

Create lib/crawler-logger.ts:

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
    process.env.SUPABASE_URL!,
    process.env.SUPABASE_ANON_KEY!
);

export async function logCrawlerActivity(data: {
    bot_name: string;
    user_agent: string;
    path: string;
    method: string;
    referer?: string | null;
    query_string?: string | null;
    ip_address?: string | null;
    host?: string | null;
}) {
    try {
        // supabase-js reports failures via the returned error
        // field rather than throwing, so check it explicitly
        const { error } = await supabase.from('ai_crawler_logs').insert(data);
        if (error) {
            console.error('[Crawler Logger] Error:', error.message);
        }
    } catch (err) {
        console.error('[Crawler Logger] Unexpected error:', err);
    }
}


Step 6: Add Middleware

For Next.js, create middleware.ts in the project root:

import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';
import { detectAICrawler } from './lib/crawler-detection';
import { logCrawlerActivity } from './lib/crawler-logger';

export function middleware(request: NextRequest) {
    const userAgent = request.headers.get('user-agent') || '';
    const detection = detectAICrawler(userAgent);

    if (detection.isBot && detection.botName) {
        const url = request.nextUrl;
        const ipAddress =
            request.headers.get('x-forwarded-for')?.split(',')[0].trim() || null;

        // Fire and forget: never block the response on logging
        logCrawlerActivity({
            bot_name: detection.botName,
            user_agent: userAgent,
            path: url.pathname,
            method: request.method,
            referer: request.headers.get('referer'),
            query_string: url.search || null,
            ip_address: ipAddress,
            host: url.hostname,
        }).catch(() => {});
    }

    return NextResponse.next();
}
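
Optionally, keep the middleware off static assets using Next.js's standard config export. The matcher below is a minimal sketch; adjust the pattern to your routes:

// Skip Next.js internals and the favicon so the middleware
// only runs for page and API requests
export const config = {
    matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};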

Step 7: Test It

1. Test locally:
    curl -H "User-Agent: GPTBot/1.0" http://localhost:3000/
2. Check Supabase → Table Editor → ai_crawler_logs to see the log entry

Bot tracker is now set up! Crawler activity will be logged automatically.
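
If you'd rather verify from code than the dashboard, a small read-back script like this works too. It's a sketch using the same env vars as the logger; run it as an ES module so top-level await is available:

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
    process.env.SUPABASE_URL!,
    process.env.SUPABASE_ANON_KEY!
);

// Fetch the five most recent crawler log entries
const { data, error } = await supabase
    .from('ai_crawler_logs')
    .select('bot_name, path, created_at')
    .order('created_at', { ascending: false })
    .limit(5);

console.log(error ?? data);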

Section 2: Setting Up the n8n Workflow


What You'll Need
- n8n instance (self-hosted or n8n.cloud)
- OpenAI API key (GPT-4 access)
- Google account (for Google Docs)
- Slack workspace


Step 1: Create Google Docs Knowledge Base

1. Create a new Google Doc
2. Copy the document ID from the URL
3. Add this content structure:


AEO Knowledge Base

Bot Types
- Training Bots: GPTBot, ClaudeBot, Meta-ExternalAgent
- User Bots: ChatGPT-User, PerplexityBot, OAI-SearchBot

Benchmarks
- Healthy 404 rate: <5%
- Critical 404 rate: >10%
- Homepage traffic ratio: <40% is healthy
- User bot ratio: >5% indicates citation potential

Warning Signs
- High 404 rates indicate broken links
- Homepage monopoly (>40%) suggests poor internal linking
- Training-only bots means content isn't being cited

Success Signals
- New bot types appearing
- User bots increasing over time
- Path diversity growing
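
Two of these benchmarks are easy to compute directly from the logged data. Below is a hypothetical TypeScript sketch; the helper name benchmarkRatios and the USER_BOTS list are our own, derived from the bot types listed above:

// Compute homepage and user-bot ratios from fetched log rows
const USER_BOTS = ['ChatGPT-User', 'PerplexityBot', 'OAI-SearchBot'];

function benchmarkRatios(logs: { bot_name: string; path: string }[]) {
    if (logs.length === 0) return { homepageRatio: 0, userBotRatio: 0 };
    const homepageHits = logs.filter((l) => l.path === '/').length;
    const userBotHits = logs.filter((l) => USER_BOTS.includes(l.bot_name)).length;
    return {
        homepageRatio: homepageHits / logs.length, // healthy if < 0.40
        userBotRatio: userBotHits / logs.length,   // > 0.05 suggests citation potential
    };
}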


Step 2: Import n8n Workflow

1. In n8n, go to Workflows → Import from File
2. Upload your workflow JSON file (AI-bot_Crawler_Insights.json)
3. The workflow will appear in your workflow list


Step 3: Configure Credentials

Update each credential in n8n:

1. Supabase
   - Project URL: https://your-project.supabase.co
   - Service Role Key: Found in Supabase → Settings → API → Service Role Key
2. OpenAI
   - API Key: Your OpenAI API key
   - Model: gpt-4o or gpt-4.1
3. Google Docs
   - OAuth2 credentials from Google Cloud Console
   - Document ID: Your knowledge base document ID
4. Slack
   - OAuth2 app credentials
   - Channel: Select your target Slack channel


Step 4: Configure Workflow Nodes

Schedule Trigger:
- Set to run weekly (e.g., Monday 9:00 AM)

Supabase Node:
- Operation: Get All
- Table: ai_crawler_logs
- Return All: Enabled

Filter Node:
- Condition: created_at is after {{ $now.minus({ days: 7 }).toISO() }}

Google Docs Node:
- Operation: Get
- Document ID: Your knowledge base document ID

Slack Node:
- Channel: Select your target channel
- Text: {{ $json.message }}


Step 5: Test & Activate

1. Click Execute Workflow to test
2. Verify Slack messages appear correctly
3. Toggle workflow to Active

n8n workflow is now set up! You'll receive weekly AI crawler insights in Slack.


What You'll Get

Weekly Slack reports with:
- Stats Overview: Total crawls, bot distribution, 404 rates
- Key Insights: Critical issues, important findings, opportunities
- Priority Actions: Ranked recommendations with timelines
- Top 10 Pages: Most crawled URLs and optimization opportunities
- Bot Profiles: Behavioral analysis per bot type


Troubleshooting

No logs appearing:
- Check Supabase credentials are correct
- Verify table has public insert policy enabled
- Check server console for errors
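
To isolate the problem, you can bypass the middleware and insert a row directly; if this fails, the issue is your credentials or the RLS policy rather than detection. A sketch ('TestBot' and the field values are placeholders; run as an ES module):

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
    process.env.SUPABASE_URL!,
    process.env.SUPABASE_ANON_KEY!
);

// Try a manual insert with dummy values
const { error } = await supabase.from('ai_crawler_logs').insert({
    bot_name: 'TestBot',
    user_agent: 'manual-test',
    path: '/debug',
    method: 'GET',
});

console.log(error ? `Insert failed: ${error.message}` : 'Insert OK');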

n8n workflow errors:
- Verify all credentials are configured
- Check Google Docs document ID is correct
- Ensure Slack bot is invited to channel


----------------

If your company or brand is interested in improving your AEO, Ghost Team can help you.

Please book a strategy call with our team here.
