How to serve content to agents (a field guide)

Written by Knut Melvær

"AI-ready content." Everyone agrees you need it. Nobody agrees on what it means. AEO strategies (or GEO, or 'SEO for AI'), llms.txt debates, Cloudflare shipping markdown at the edge, agents that negotiate content types. The conversation is getting louder, and most of it conflates at least three different questions.

How do you get AI to cite and recommend your content? That's the positioning question, and it has two parts: what your content looks like when models are trained on it, and what it looks like when agents retrieve it to answer a prompt.

The other question: when an agent does show up, how do you serve your content without bloating its context window or losing meaning in translation? That's the content consumption question. Your content now has consumers you didn't design for: agents requesting your pages, humans copying your docs into Claude, RAG pipelines pulling your content into retrieval systems. None of them want your cookie banners, navigation chrome, or ad scripts. They want the content. And probably only the content that actually provides the answers they're looking for.

And then: does agentic content consumption affect positioning? Does serving cleaner content to agents also improve how they represent you?

If you're a developer, the content consumption section is where the actionable stuff lives. If you're a content strategist, the positioning question is more your territory. Read both. You'll need to explain the other one to your team.

Can you optimize for AI citations? (here's what we know)

Let's start with Agent Engine Optimization (AEO, sometimes called GEO or "SEO for AI"), since that's what got everyone's attention: can you optimize your content so AI models cite you more?

Maybe. But the honest answer is: we don't really know yet.

One group that's been looking closely at this is Profound, a company that tracks how AI platforms cite and recommend content across ChatGPT, Perplexity, and Gemini. They've been publishing primary research on the topic.

In their latest study, they took 381 pages across 6 websites, randomly assigned half to serve markdown and half to serve HTML, and watched for three weeks. The result? They found no statistically significant increase in bot traffic from serving markdown. Their recommendation? Focus on fundamentals: quality content, clear structure, fast load times. "The format you serve them in? Probably not the leverage point you're looking for."

Their most important finding for anyone building an AEO strategy is that AI citations can shift by up to 60% in a single month. The page ChatGPT recommends today might not be the one it recommends next month.

If you're trying to "optimize" for a system that volatile, you're chasing a moving target.

There's academic work too. A paper from Princeton (published at KDD 2024, which feels like three decades ago in AI time) found that certain strategies (adding statistics, using authoritative language, citing sources) could boost visibility by up to 40% in their benchmark. Worth noting: those numbers come from lab conditions, not from testing against ChatGPT or Perplexity in the wild. The strategies themselves are basically good writing advice, which is worth doing regardless of AI.


Meanwhile, the models themselves are getting better at handling whatever you throw at them. Anthropic just released dynamic filtering for Claude's web search. The model now writes Python code to parse and filter HTML results before they hit the context window. The result: 11% better accuracy, 24% fewer input tokens. The models are investing heavily in solving the "finding you" problem on their end.

So the models will keep getting better at finding you, but they won't clean up your content for you. Which brings us to the part where you have agency.

What to serve agents when they show up

Profound (the AI citation tracking company) measured bot visits: how often agents show up. They found format doesn't change that; agents show up regardless. Cool.

If agents are already showing up (and they are, check your server logs), then the question isn't how to attract them. It's what you serve them when they arrive. And not just agents: humans are copying your docs into AI tools, RAG pipelines are pulling your content into retrieval systems, edge services are converting your pages on the fly.

The strategies here are practical, the evidence that this might matter is strong, and you control the outcome.

Here's what the options look like, from zero effort to full infrastructure investment.

Do nothing

Agents will convert your HTML to markdown themselves. Every major AI tool (Claude, ChatGPT, Gemini, Perplexity) does this internally. Most AI crawlers don't even execute JavaScript: they see your raw HTML, not your rendered page. Claude Code uses a library called Turndown. It works. It's also lossy and token-expensive. Your 100K-token HTML page becomes maybe 3K tokens of useful content after the agent strips out navigation, footers, scripts, and cookie banners. That's a 97% waste of context window. Even the agents that do render JavaScript (like Google's crawler or ChatGPT's Operator) still get the full DOM with all the navigation chrome. The token waste problem doesn't go away just because JS executes.

It gets worse if your site relies on client-side rendering. Vercel's research found that ChatGPT and Claude crawlers fetch JavaScript files but don't execute them. Google's Gemini (via Googlebot) and AppleBot are the exceptions.
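To make the waste concrete, here's a toy sketch of what an agent-side converter (like Turndown) effectively does: throw away scripts, styles, and chrome, keep the text. The function and sample page are illustrative, not Turndown's actual implementation.

```typescript
// Toy agent-side extraction: drop chrome wholesale, then strip remaining tags.
// A real converter emits markdown; this sketch just shows how little survives.
function stripToText(html: string): string {
  return html
    .replace(/<(script|style|nav|footer)[\s\S]*?<\/\1>/gi, "") // chrome elements
    .replace(/<[^>]+>/g, " ")                                  // remaining tags
    .replace(/\s+/g, " ")
    .trim();
}

const page = `<html><nav>Home | Docs | Pricing</nav>
  <script>trackEverything()</script>
  <main><h1>Install</h1><p>Run npm install.</p></main>
  <footer>All rights reserved</footer></html>`;

const text = stripToText(page);
// Only the <main> content survives; everything else was context-window waste,
// and the agent paid tokens to download it before discarding it.
```

The agent does this work on every request, for every page, after paying for the full HTML. Serving the clean version yourself moves that cost off their context window.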

Add an llms.txt file

llms.txt is a markdown file at a known URL (/llms.txt) that gives agents an overview of your site with links to detailed content. Over 2,000 sites have adopted it, including Next.js, shadcn/ui, TanStack, Cloudflare, and Hugging Face. It's simple to implement and useful as a discovery layer. Anthropic uses theirs as a lightweight sitemap: brief descriptions and links organized by section, 892 tokens total. Even the company building the agents treats llms.txt as an index, not a content dump.
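The proposed shape is plain markdown: an H1 title, a blockquote summary, then H2 sections of annotated links. A minimal example (the URLs and descriptions here are placeholders):

```markdown
# Example Docs

> One or two sentences describing what this site covers and who it's for.

## Guides

- [Getting started](https://example.com/docs/start.md): Install and first steps
- [Deployment](https://example.com/docs/deploy.md): Shipping to production

## Reference

- [HTTP API](https://example.com/docs/api.md): Endpoints and authentication
```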

But isn't llms.txt becoming a standard? It's becoming adopted, which isn't the same thing. llms.txt is a proposal from Jeremy Howard's FastHTML project, not a ratified standard. The GitHub issues show debates about merging it with other proposals (AGENTS.md), and scope creep into things like crypto wallet addresses and "emotional brand positioning extensions." More practically: it's all-or-nothing. An agent gets your entire corpus or nothing. No per-page granularity, no per-agent control, no governance over what gets consumed. For developer docs, that's probably fine. For anything you'd rather serve selectively, you've just made it trivially easy to copy-paste everything.

The spec also proposes llms-full.txt, a companion file containing your entire corpus as markdown. In theory, agents can grab everything at once. In practice, the numbers work against you. Cloudflare's developer docs produce an llms-full.txt of 46.6MB, roughly 12 million tokens, about 60x Claude's context window. Even when the file fits, longer context degrades model performance regardless of content quality. An agent that needs one answer doesn't benefit from receiving your entire library.
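The back-of-envelope math on Cloudflare's file, assuming the rough heuristic of ~4 bytes of English text per token (an approximation, not a tokenizer):

```typescript
// Why llms-full.txt doesn't fit: Cloudflare's 46.6MB file vs. a 200K window.
const fileBytes = 46.6e6;                 // llms-full.txt size
const bytesPerToken = 4;                  // rough heuristic for English text
const tokens = fileBytes / bytesPerToken; // ≈ 11.65 million tokens
const contextWindow = 200_000;            // Claude's window at time of writing
const overBudget = tokens / contextWindow; // ≈ 58x, i.e. "about 60x"
```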

If you want a quick maybe win, add an llms.txt. If you want control over what agents get and when, content negotiation gives you more options (more on that below).

Turn on Cloudflare's edge conversion (if you host on Cloudflare)

Cloudflare launched Markdown for Agents in February 2026: a dashboard toggle that converts your HTML to markdown at the edge when agents request it. No code changes. 80% token reduction on their own blog. It's a good default if you can't touch your content layer.

The tradeoff: reverse-engineering HTML back to markdown is inherently lossy. A generic parser doesn't know which parts of your page are content and which are chrome. Custom components, structured relationships, section-level meaning: none of it survives the round trip.

Serve markdown routes with content negotiation

Agents have started requesting markdown explicitly via the Accept header. When an agent sends Accept: text/markdown, your server can respond with markdown. Same URL, different representation: the same content negotiation pattern HTTP has supported for decades. I built a Sanity course around this, and we use it for Sanity Learn itself.
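The server-side decision is small. Here's a minimal sketch (function and type names are illustrative) that parses the Accept header's q-values and picks a representation:

```typescript
// Minimal content negotiation: prefer text/markdown when the client asks for
// it at equal or higher quality than text/html. Not a full RFC 9110 parser.
type Representation = "text/html" | "text/markdown";

function negotiate(accept: string | undefined): Representation {
  if (!accept) return "text/html";
  // Parse entries like "text/markdown;q=0.8" into { type, q } pairs.
  const prefs = accept.split(",").map((part) => {
    const [type, ...params] = part.trim().split(";");
    const q = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
    return { type: type.trim(), q: q ? parseFloat(q.slice(2)) : 1 };
  });
  const md = prefs.find((p) => p.type === "text/markdown");
  const html = prefs.find((p) => p.type === "text/html" || p.type === "*/*");
  if (md && (!html || md.q >= html.q)) return "text/markdown";
  return "text/html";
}

negotiate("text/markdown");                  // → "text/markdown"
negotiate("text/html, text/markdown;q=0.5"); // → "text/html"
```

One detail worth getting right: responses that vary by Accept should send a Vary: Accept header so caches and CDNs don't serve the markdown representation to a browser.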

It doesn’t take a lot of code if you use Sanity already. With the @portabletext/markdown library you can take the same content that renders to HTML, and render it to markdown as well:

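As a sketch of the idea (not the library's actual API): a toy renderer that walks Portable Text blocks and emits markdown. The real @portabletext/markdown handles marks, lists, and custom block types; this version covers only headings and paragraphs of plain spans, using the standard Portable Text block shape.

```typescript
// Toy Portable Text → markdown renderer: one source, two representations.
type Span = { _type: "span"; text: string };
type Block = { _type: "block"; style: string; children: Span[] };

function toMarkdown(blocks: Block[]): string {
  return blocks
    .map((block) => {
      const text = block.children.map((c) => c.text).join("");
      if (block.style === "h2") return `## ${text}`;
      if (block.style === "h3") return `### ${text}`;
      return text; // "normal" style renders as a paragraph
    })
    .join("\n\n");
}

const content: Block[] = [
  { _type: "block", style: "h2", children: [{ _type: "span", text: "Install" }] },
  { _type: "block", style: "normal", children: [{ _type: "span", text: "Run npm install." }] },
];

toMarkdown(content);
// → "## Install\n\nRun npm install."
```

Because the source is structured content rather than rendered HTML, nothing has to be reverse-engineered: the markdown route renders from the same blocks the HTML route does, and custom types keep their meaning instead of being guessed at by a parser.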