Modern AI answer engines — Perplexity, ChatGPT, Claude with web search, Google's AI Overviews, and the long tail of LLM crawlers that feed them — share one behavior that surprises most website owners: they discard hidden content.
If your page renders its real text inside a `<div style="position:absolute;left:-9999px">` or behind `display:none`, the crawler sees that HTML but throws it away before indexing. Your content being in the response body is not enough. It has to be visibly in the response body.
This is the single most common reason why a site with perfectly valid HTML still can't get cited by AI answer engines.
# The pattern that breaks everything
A decade ago, SEO cloaking — serving different content to search engines than to humans — was the go-to black-hat tactic. Sites would load keyword-stuffed paragraphs into <div>s positioned off-screen, so Google saw the keywords but humans saw a clean page. Google cracked down hard, and modern crawlers penalize any pattern that resembles it.
The specific CSS rules that trip the defense:
- `position: absolute; left: -9999px;` (the classic)
- `position: absolute; left: -10000em;`
- `display: none;`
- `visibility: hidden;`
- `height: 1px; width: 1px; overflow: hidden;`
- `clip: rect(0, 0, 0, 0);` (older screen-reader trick)
Any content nested inside a container with one of these styles is flagged as adversarial markup by LLM-class crawlers and dropped from the indexable text.
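The style checks above can be sketched as a pattern match over inline styles. This is a toy illustration, not any crawler's actual code: real extractors also resolve class-based rules from external stylesheets and compute inherited visibility, which this regex does not.

```python
import re

# Inline-style patterns commonly treated as hiding signals.
HIDING_RULES = re.compile(
    r"display\s*:\s*none"
    r"|visibility\s*:\s*hidden"
    r"|left\s*:\s*-\d+(?:px|em)"   # off-screen positioning
    r"|clip\s*:\s*rect\(\s*0"      # the old screen-reader clip trick
)

def looks_hidden(style: str) -> bool:
    """True if an inline style string matches a known hiding pattern."""
    return bool(HIDING_RULES.search(style))

print(looks_hidden("position:absolute;left:-9999px"))  # True: the classic
print(looks_hidden("position:relative;left:12px"))     # False: ordinary layout
```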
The twist: it doesn't matter what you intended. A React SPA that injects an SSR fallback div for bots — positioning it off-screen so it doesn't show up once React hydrates — looks indistinguishable from classic cloaking to the crawler. Your honest SSR workaround gets thrown out alongside the cloaking attempts.
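Concretely, the well-intentioned pattern looks something like this (a hypothetical SPA shell, not any specific framework's output):

```html
<!-- What the crawler receives: an empty mount point plus an
     off-screen fallback. Indistinguishable from cloaking. -->
<div id="root"></div>  <!-- React hydrates into this -->
<div style="position:absolute;left:-9999px">
  <h1>Your real headline</h1>
  <p>Your real content, visible only to bots… in theory.</p>
</div>
```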
# Why "in the HTML" is not enough
When an AI crawler like PerplexityBot fetches your URL, it runs a parsing pipeline roughly like this:
- Fetch the raw HTML response.
- Parse the DOM.
- Extract the visible, textual content — applying CSS rules, stripping anything hidden.
- Rank the extracted text, feed it into the retrieval index, and make it available for citation.
Step 3 is the killer. Even if your `<h1>` and `<p>` elements are structurally present in step 2's DOM, step 3 removes them if they're positioned off-screen. The raw HTML becomes a dead letter.
`curl` will show you the content. Perplexity won't cite it. Same document, two different fates.
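The pipeline above can be sketched in miniature with the standard library. Same caveat as before: this checks inline styles only; a real crawler also applies stylesheet rules and handles void tags.

```python
import re
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Steps 2-3 in miniature: parse the DOM, keep only visible text."""
    HIDDEN = re.compile(
        r"display\s*:\s*none|visibility\s*:\s*hidden|left\s*:\s*-\d+(?:px|em)")

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0   # >0 while inside a hidden subtree
        self.visible = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if self._hidden_depth or self.HIDDEN.search(style):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth and data.strip():
            self.visible.append(data.strip())

html = ('<h1 style="position:absolute;left:-9999px">My real headline</h1>'
        '<p>Loading…</p>')
extractor = VisibleTextExtractor()
extractor.feed(html)
print(extractor.visible)  # only the spinner text survives extraction
```

Run against the example, only `['Loading…']` comes out: the headline was in the HTML, but it never reaches the index.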
# How to check whether you have this problem
A 60-second diagnostic:
- Raw fetch. Run `curl -sS https://yourdomain.com/your-page | grep "your important phrase"` — does it match? If not, the content isn't even in the response and you have a different problem (pure SPA shell).
- Live LLM test. Paste your URL into Perplexity and ask a question your page directly answers. Does Perplexity cite the URL? Paste it into ChatGPT with web search on and ask the same question. Does Claude's search cite it?
- Rendered vs. visible diff. If `curl` shows the content but Perplexity/Claude/ChatGPT ignore it, open the page in a browser, open DevTools, and inspect the container holding your real text. Look at the computed styles. If you see `position: absolute`, `left: -9999px`, `display: none`, or `visibility: hidden` anywhere in the chain, that's your diagnosis.
If `curl` shows the content, the container isn't hidden, and LLMs still don't cite it, the issue is elsewhere (typically: your domain isn't in the search index yet, not a content-visibility problem).
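The triage can be scripted. A crude sketch: it only checks for hiding styles anywhere in the document, not in the specific container holding your phrase, so treat the result as a hint rather than a verdict.

```python
import re

HIDING = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|left\s*:\s*-\d+(?:px|em)")

def diagnose(raw_html: str, phrase: str) -> str:
    """Classify a fetched page against the 60-second diagnostic."""
    if phrase not in raw_html:
        return "missing"  # pure SPA shell: content never reaches the response
    if HIDING.search(raw_html):
        return "hidden"   # in the HTML, but likely stripped by extractors
    return "visible"      # if LLMs still skip it, the problem is elsewhere

print(diagnose('<div style="display:none">pricing details</div>',
               'pricing details'))                            # hidden
print(diagnose('<div id="root"></div>', 'pricing details'))   # missing
```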
# The durable fix
Stop trying to patch SSR through hidden fallback divs. Two approaches actually work:
Option 1: Server-render your content pages directly. Markdown files plus a server-side renderer that emits real HTML on every request. No React on content URLs. First-byte HTML is the real content, visibly styled. When a crawler parses your page, there's nothing to strip.
Option 2: Static-generate at build time. Same idea, but rendered once at deploy time rather than on every request: vite-plugin-ssr, Astro, Next.js static export, or a simple `prerender.ts` build step. Faster, cacheable, same visibility guarantee.
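To make the build-time idea concrete, here's a toy prerender step (hypothetical layout: `*.md` files in, standalone `*.html` files out; a real build would use an actual markdown parser rather than this paragraph-wrapping shortcut):

```python
from pathlib import Path
from string import Template

# A deliberately minimal page shell: the first-byte HTML is the content.
PAGE = Template("""<!doctype html>
<html><head><title>$title</title></head>
<body><main>$body</main></body></html>""")

def prerender(src_dir: str, out_dir: str) -> list[str]:
    """Render each .md file to a standalone .html file at deploy time.

    Toy version: the first line is the title, and blank-line-separated
    paragraphs become <p> tags. Swap in a real markdown parser for production.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for md in sorted(Path(src_dir).glob("*.md")):
        title, _, rest = md.read_text().partition("\n")
        body = "\n".join(f"<p>{p.strip()}</p>"
                         for p in rest.split("\n\n") if p.strip())
        page = out / (md.stem + ".html")
        page.write_text(PAGE.substitute(title=title.lstrip("# "), body=body))
        written.append(page.name)
    return written
```

The output files contain nothing but visibly styled content, so there's nothing for a crawler's extraction step to strip.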
Both approaches share a principle: public content pages and interactive app pages are different problems. SPA frameworks are the right tool for the logged-in product surface. They are the wrong tool for marketing pages, docs, and blog posts that need to show up in AI answer engines. Put the SPA behind an auth boundary. Serve pure HTML for everything crawlers hit.
# What the crawlers actually do
For reference, here's the short list of AI-specific crawlers and what they do with your page:
- PerplexityBot — Perplexity's citation engine. Parses raw HTML, no JS execution. Skips hidden content.
- ChatGPT-User — ChatGPT's browsing / search tool. Parses raw HTML. Skips hidden content.
- ClaudeBot — Anthropic's crawler for Claude's web search. Similar profile.
- GPTBot — OpenAI's training crawler. Text-only extraction. Skips hidden content.
- Google-Extended — Google's AI-training opt-out control, feeding Gemini. It's a robots.txt token honored by Google's ordinary crawlers rather than a separate bot; those crawlers render JS but penalize cloaking.
- GoogleOther — Google's general non-search crawler.
- Applebot-Extended — Apple Intelligence training.
- CCBot — Common Crawl, which seeds many LLM training sets.
All of them apply the same visibility rule. If it's hidden from humans, it's hidden from the index.
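Visibility only matters if the crawlers are allowed in at all, so it's worth making your robots.txt policy explicit. A sketch (the split between allowed answer engines and disallowed training crawlers is one illustrative policy, not a recommendation):

```
# robots.txt — explicit policy for the AI crawlers listed above.
# Let the answer/citation engines fetch everything:
User-agent: PerplexityBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
Allow: /

# Opt out of training-only crawls (one possible policy):
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Applebot-Extended
Disallow: /
```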
# The one-line takeaway
If your content isn't visible, it isn't indexed. If it isn't indexed, it can't be cited. Make the content real — or expect to be invisible to the AI-powered internet.