Modern AI answer engines — Perplexity, ChatGPT, Claude with web search, Google's AI Overviews, and the long tail of LLM crawlers that feed them — share one behavior that surprises most website owners: they discard hidden content.
If your page renders its real text inside a `<div style="position:absolute;left:-9999px">` or behind `display:none`, the crawler sees that HTML but throws it away before indexing. Your content being in the response body is not enough. It has to be visibly in the response body.
This is the single most common reason why a site with perfectly valid HTML still can't get cited by AI answer engines.
# The pattern that breaks everything
A decade ago, SEO cloaking — serving different content to search engines than to humans — was the go-to black-hat tactic. Sites would load keyword-stuffed paragraphs into <div>s positioned off-screen, so Google saw the keywords but humans saw a clean page. Google cracked down hard, and modern crawlers penalize any pattern that resembles it.
The specific CSS rules that trip the defense:
- `position: absolute; left: -9999px;` (the classic)
- `position: absolute; left: -10000em;`
- `display: none;`
- `visibility: hidden;`
- `height: 1px; width: 1px; overflow: hidden;`
- `clip: rect(0, 0, 0, 0);` (older screen-reader trick)
Any content nested inside a container with one of these styles is flagged as adversarial markup by LLM-class crawlers and dropped from the indexable text.
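The style checks above can be sketched as a pattern match over inline styles. This is a toy illustration, not any crawler's actual code: real extractors also resolve class-based rules from external stylesheets and compute inherited visibility, which this regex does not.

```python
import re

# Inline-style patterns commonly treated as hiding signals.
HIDING_RULES = re.compile(
    r"display\s*:\s*none"
    r"|visibility\s*:\s*hidden"
    r"|left\s*:\s*-\d+(?:px|em)"   # off-screen positioning
    r"|clip\s*:\s*rect\(\s*0"      # the old screen-reader clip trick
)

def looks_hidden(style: str) -> bool:
    """True if an inline style string matches a known hiding pattern."""
    return bool(HIDING_RULES.search(style))

print(looks_hidden("position:absolute;left:-9999px"))  # True: the classic
print(looks_hidden("position:relative;left:12px"))     # False: ordinary layout
```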
The twist: it doesn't matter what you intended. A React SPA that injects an SSR fallback div for bots — positioning it off-screen so it doesn't show up once React hydrates — looks indistinguishable from classic cloaking to the crawler. Your honest SSR workaround gets thrown out alongside the cloaking attempts.
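Concretely, the well-intentioned pattern looks something like this (a hypothetical SPA shell, not any specific framework's output):

```html
<!-- What the crawler receives: an empty mount point plus an
     off-screen fallback. Indistinguishable from cloaking. -->
<div id="root"></div>  <!-- React hydrates into this -->
<div style="position:absolute;left:-9999px">
  <h1>Your real headline</h1>
  <p>Your real content, visible only to bots… in theory.</p>
</div>
```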
# Why "in the HTML" is not enough
When an AI crawler like PerplexityBot fetches your URL, it runs a parsing pipeline roughly like this:
- Fetch the raw HTML response.
- Parse the DOM.
- Extract the visible, textual content — applying CSS rules, stripping anything hidden.
- Rank the extracted text, feed it into the retrieval index, and make it available for citation.
Step 3 is the killer. Even if your `<h1>` and `<p>` elements are structurally present in step 2's DOM, step 3 removes them if they're positioned off-screen. The raw HTML becomes a dead letter.
`curl` will show you the content. Perplexity won't cite it. Same document, two different fates.
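The pipeline above can be sketched in miniature with the standard library. Same caveat as before: this checks inline styles only; a real crawler also applies stylesheet rules and handles void tags.

```python
import re
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Steps 2-3 in miniature: parse the DOM, keep only visible text."""
    HIDDEN = re.compile(
        r"display\s*:\s*none|visibility\s*:\s*hidden|left\s*:\s*-\d+(?:px|em)")

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0   # >0 while inside a hidden subtree
        self.visible = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if self._hidden_depth or self.HIDDEN.search(style):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth and data.strip():
            self.visible.append(data.strip())

html = ('<h1 style="position:absolute;left:-9999px">My real headline</h1>'
        '<p>Loading…</p>')
extractor = VisibleTextExtractor()
extractor.feed(html)
print(extractor.visible)  # only the spinner text survives extraction
```

Run against the example, only `['Loading…']` comes out: the headline was in the HTML, but it never reaches the index.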
# How to check whether you have this problem
A 60-second diagnostic:
- Raw fetch. Run `curl -sS https://yourdomain.com/your-page | grep "your important phrase"` — does it match? If not, the content isn't even in the response and you have a different problem (pure SPA shell).
- Live LLM test. Paste your URL into Perplexity and ask a question your page directly answers. Does Perplexity cite the URL? Paste it into ChatGPT with web search on and ask the same question. Does Claude's search cite it?
- Rendered vs. visible diff. If `curl` shows the content but Perplexity/Claude/ChatGPT ignore it, open the page in a browser, open DevTools, and inspect the container holding your real text. Look at the computed styles. If you see `position: absolute`, `left: -9999px`, `display: none`, or `visibility: hidden` anywhere in the chain, that's your diagnosis.
If `curl` shows the content, the container isn't hidden, and LLMs still don't cite it, the issue is elsewhere (typically: your domain isn't in the search index yet, not a content-visibility problem).
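The triage can be scripted. A crude sketch: it only checks for hiding styles anywhere in the document, not in the specific container holding your phrase, so treat the result as a hint rather than a verdict.

```python
import re

HIDING = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|left\s*:\s*-\d+(?:px|em)")

def diagnose(raw_html: str, phrase: str) -> str:
    """Classify a fetched page against the 60-second diagnostic."""
    if phrase not in raw_html:
        return "missing"  # pure SPA shell: content never reaches the response
    if HIDING.search(raw_html):
        return "hidden"   # in the HTML, but likely stripped by extractors
    return "visible"      # if LLMs still skip it, the problem is elsewhere

print(diagnose('<div style="display:none">pricing details</div>',
               'pricing details'))                            # hidden
print(diagnose('<div id="root"></div>', 'pricing details'))   # missing
```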
# The durable fix
Stop trying to patch SSR through hidden fallback divs. Two approaches actually work:
Option 1: Server-render your content pages directly. Markdown files plus a server-side renderer that emits real HTML on every request. No React on content URLs. First-byte HTML is the real content, visibly styled. When a crawler parses your page, there's nothing to strip.
Option 2: Static-generate at build time. Same idea, but rendered once at deploy time rather than on every request: vite-plugin-ssr, Astro, Next.js static export, or a simple `prerender.ts` build step. Faster, cacheable, same visibility guarantee.
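To make the build-time idea concrete, here's a toy prerender step (hypothetical layout: `*.md` files in, standalone `*.html` files out; a real build would use an actual markdown parser rather than this paragraph-wrapping shortcut):

```python
from pathlib import Path
from string import Template

# A deliberately minimal page shell: the first-byte HTML is the content.
PAGE = Template("""<!doctype html>
<html><head><title>$title</title></head>
<body><main>$body</main></body></html>""")

def prerender(src_dir: str, out_dir: str) -> list[str]:
    """Render each .md file to a standalone .html file at deploy time.

    Toy version: the first line is the title, and blank-line-separated
    paragraphs become <p> tags. Swap in a real markdown parser for production.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for md in sorted(Path(src_dir).glob("*.md")):
        title, _, rest = md.read_text().partition("\n")
        body = "\n".join(f"<p>{p.strip()}</p>"
                         for p in rest.split("\n\n") if p.strip())
        page = out / (md.stem + ".html")
        page.write_text(PAGE.substitute(title=title.lstrip("# "), body=body))
        written.append(page.name)
    return written
```

The output files contain nothing but visibly styled content, so there's nothing for a crawler's extraction step to strip.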
Both approaches share a principle: public content pages and interactive app pages are different problems. SPA frameworks are the right tool for the logged-in product surface. They are the wrong tool for marketing pages, docs, and blog posts that need to show up in AI answer engines. Put the SPA behind an auth boundary. Serve pure HTML for everything crawlers hit.
# What the crawlers actually do
For reference, here's the short list of AI-specific crawlers and what they do with your page:
- PerplexityBot — Perplexity's citation engine. Parses raw HTML, no JS execution. Skips hidden content.
- ChatGPT-User — ChatGPT's browsing / search tool. Parses raw HTML. Skips hidden content.
- ClaudeBot — Anthropic's crawler for Claude's web search. Similar profile.
- GPTBot — OpenAI's training crawler. Text-only extraction. Skips hidden content.
- Google-Extended — Google's AI-training opt-out control, feeding Gemini. It's a robots.txt token honored by Google's ordinary crawlers rather than a separate bot; those crawlers render JS but penalize cloaking.
- GoogleOther — Google's general non-search crawler.
- Applebot-Extended — Apple Intelligence training.
- CCBot — Common Crawl, which seeds many LLM training sets.
All of them apply the same visibility rule. If it's hidden from humans, it's hidden from the index.
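Visibility only matters if the crawlers are allowed in at all, so it's worth making your robots.txt policy explicit. A sketch (the split between allowed answer engines and disallowed training crawlers is one illustrative policy, not a recommendation):

```
# robots.txt — explicit policy for the AI crawlers listed above.
# Let the answer/citation engines fetch everything:
User-agent: PerplexityBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
Allow: /

# Opt out of training-only crawls (one possible policy):
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Applebot-Extended
Disallow: /
```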
# The one-line takeaway
If your content isn't visible, it isn't indexed. If it isn't indexed, it can't be cited. Make the content real — or expect to be invisible to the AI-powered internet.