SEO for AI: Evolving from Web Pages to the Content Lake

Written by Richard Lawrence

People aren’t just typing into search bars. They’re asking assistants to answer questions, make recommendations, and take action. These systems don’t browse like humans. They scan for facts, structure, and signals they can reason with.

To get ahead and reach the right audience in this new medium, you need to surface content from across your business for AI assistants to learn from, in whichever format they require. As a Content Operating System, Sanity is the perfect platform for this new search paradigm, allowing you to structure, manage, and deliver all of your business content from its Content Lake.


Overnight disruption; years in the making

ChatGPT launched at the end of 2022 and caused a big bang in public perception. Suddenly there was a new way to retrieve and digest information, and the change appeared to happen almost overnight.

But the progression to this point can be traced back through many smaller advancements over the years.

I wrote a post back at the beginning of 2019 about the path for the evolution of search (you can find it on the Wayback Machine), and I would say it still broadly holds true today.

In that post, I referenced the three paradigms for 'assistive systems' from a 2018 paper by Andrei Broder, a Distinguished Scientist at Google. The three paradigms are:

An image showing the paper 'A Call to Arms: Embrace Assistive AI Systems!' by Andrei Broder from 2018

You can also see how these map to search engines (conducive), chat assistants (subordinate) and agents (decisive) in the era that has sprung up around us over the last couple of years.

Google's decade of failing to push us beyond search engines

At the time of writing in 2019, features like featured snippets and knowledge panels had begun to transform the traditional Google search results page into something closer to a subordinate system. But progress was pedestrian. Google tried not to shake things up too much and took users along in baby steps (featured snippets first showed up in 2014!).

Example featured snippet from 2019, taken from Search Engine Watch

Google's long term vision had always been much more ambitious.

It has long stated that its ultimate objective is to create a system like the assistant from Star Trek. Its head of Search discussed this back in 2013.

Attempts to progress in this direction include the moderately successful Google Assistant and other ideas that disappeared in transit (anyone remember Google Duplex?). Surprisingly, Google Assistant was still the driver for the most viewed Wikipedia entry in 2024, so its usage is not trivial. Competitors such as Siri and Alexa have also made similar attempts.

Unfortunately for Google, it was ChatGPT and other LLM-based assistants that brought a large audience firmly into the subordinate paradigm. They did it for the first time, and for the long haul. Quite incredibly, 90% of users who now sign up to ChatGPT are still using it a month later.

Graph showing customer retention for ChatGPT.

The success of OpenAI and others can be at least partly attributed to their not carrying Google's burden as the supposed guardian of the web. That role bred a conservatism that impeded Google's progress. Probably even more significant, Google's cushy ad revenue discouraged it from building a system that didn't give users options that included sponsored placements.

Google is now trying to play catch-up by further transforming its conducive system into a subordinate one. AI Overviews are taking over the mantle from rich answers, and AI Mode is blurring the lines even further. The jury is still out on how successful it will be.

Example AI Overview that informs us what AI Overviews are.

After only a few years, chat assistants are yesterday's news

Whilst the subordinate paradigm took decades to embed, we seem to be moving into the decisive paradigm at a rapid pace, after only a few years.

AI agents are driving this. They aren't simply systems that provide us with information; they actively make decisions on our behalf whilst working towards an end goal.

AutoGPT was an early example back in April 2023. It allowed you to set goals and then watch as the AI autonomously worked through multiple steps to achieve them: researching, planning, and executing tasks with minimal human intervention.

Since then, we've seen rapid advancement, with developer tools such as Cursor and Windsurf that can carry out multi-step tasks largely on their own.

We're in a new world where AI agents can search the web, analyze data, make purchases, and even negotiate on our behalf.

Feeding the machines in the new era of search

The idea I explored in the 2019 blog post still rings true:

The new search engine is now becoming more like a super-user: rational and able to consume a vast amount more information than a real user, before making a recommendation.

Regardless of whether the assistant is making recommendations or decisions, you need to give it facts and data to work from, for example:

Giving the LLM as much as possible to learn from will be key to reaching the right customers in future. To do this, we need to move beyond the concept of 'website content' to 'business content'.

What is all the content that might be useful for learning about you as a business or brand, and where is it stored? It needs to be aggregated in one place so you can craft a comprehensive and compelling story for a proxy that deals in data and facts (more about this later).

There are then two ways in which these facts are interpreted and communicated to the user via the LLM:

Adding yourself into the conversation

To be referenced in isolated conversations, information about your business needs to be part of the LLM's training set. The training set is the inventory of content (millions of documents) that the model initially learned from. It is often out of date to some extent; for example, OpenAI's o4-mini model has a knowledge cutoff of June 2024.

To ensure you are part of the training set for LLMs, you need to:

Optimizing for real-time AI queries

Traditional search engines will still have a part to play. ChatGPT uses Bing as its search tool to retrieve relevant content for your query or discussion.

In addition, there will be methods to communicate directly with LLMs (the jury is still out on which format this will take; more about this later).

For now, you should:

Why the Content Operating System beats CMSes for AI

Sanity was created on the premise of treating content as data, allowing you to store all of the information relating to your business in one place. With all your content in one place, you can aggregate and present it however you want, to whoever you want. That could be a website for users or the preferred format for an LLM (more about this in a minute).

Diagram showing different content types being stored in Sanity, before then being exposed to different apps and LLMs in the right format for them.
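To make "store once, present anywhere" a little more concrete, here is a minimal TypeScript sketch, not a definitive implementation. It assumes a hypothetical `ProductDoc` record; the field names are illustrative rather than a real Sanity schema, though `_id` mirrors the document ID field Sanity documents carry. The sketch renders that one record three ways: an HTML fragment for the website, schema.org JSON-LD for search engines, and plain Markdown an LLM could ingest directly. In a real project the record would come from a query against your content store rather than a hard-coded object.

```typescript
// Hypothetical shape of a product document, stored once as structured data.
interface ProductDoc {
  _id: string;
  name: string;
  description: string;
  price: number;
  currency: string; // ISO 4217 code, e.g. "GBP"
  features: string[];
}

// Minimal HTML escaping so field values can't break the markup.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Channel 1: a web page fragment for human visitors.
function toHtml(p: ProductDoc): string {
  const items = p.features.map((f) => `<li>${escapeHtml(f)}</li>`).join("");
  return `<article><h1>${escapeHtml(p.name)}</h1><p>${escapeHtml(p.description)}</p><ul>${items}</ul></article>`;
}

// Channel 2: schema.org Product markup as JSON-LD for search engines.
function toJsonLd(p: ProductDoc): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "Product",
    name: p.name,
    description: p.description,
    offers: { "@type": "Offer", price: p.price.toFixed(2), priceCurrency: p.currency },
  });
}

// Channel 3: plain Markdown facts that can be handed straight to an LLM.
function toMarkdown(p: ProductDoc): string {
  const lines: string[] = [
    `# ${p.name}`,
    "",
    p.description,
    "",
    `Price: ${p.price.toFixed(2)} ${p.currency}`,
    "",
    "## Features",
    ...p.features.map((f) => `- ${f}`),
  ];
  return lines.join("\n");
}

// Hypothetical example record.
const product: ProductDoc = {
  _id: "product-123",
  name: "Trail Runner 2",
  description: "A lightweight waterproof running shoe.",
  price: 120,
  currency: "GBP",
  features: ["Waterproof membrane", "280 g per shoe"],
};

console.log(toHtml(product));
console.log(toJsonLd(product));
console.log(toMarkdown(product));
```

None of the three outputs is edited by hand. Change the price or description once in the source record, and every channel picks it up on the next render.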

You can store all your content within Sanity (in the Content Lake) to present on your website and get indexed via search. Or you can feed it directly to LLMs using whatever method they prefer, now or in the future. Just a few examples:

Sanity also has additional features that give you an advantage in the world of subordinate assistants. Content Releases help you deliver changes to your business content at scale when needed. The Live CDN keeps your content up to date in real time for when LLMs visit it via search tools.

Future-proof your content for any AI format

As for the preferred format for LLMs, an early example has emerged in llms.txt and llms-full.txt.

One reason people dislike this format is that they see it as a separate inventory of content to manage and maintain. With Sanity, this is not the case: you can easily generate an llms.txt file directly from your existing content, without duplicating effort. As part of the Content Operating System, the Content Lake lets you define structured content once and output it in multiple formats. That could be your website, a mobile app, or now a text file.
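As a rough illustration of generating these files from existing structured content, here is a TypeScript sketch that builds both llms.txt and llms-full.txt from a single array of page records. It is not Sanity's own tooling. The `Page` and `Site` types, the field names and the example URLs are all hypothetical; in practice the records would come from a query against the Content Lake. The llms.txt layout follows the proposal published at llmstxt.org: an H1 title, a blockquote summary, then H2 sections of `- [title](url): notes` links. llms-full.txt is a looser convention, assumed here to be the same pages with their full bodies inlined.

```typescript
// Hypothetical shape of a page record pulled from the content store.
interface Page {
  title: string;
  url: string;
  summary: string;
  section: string; // groups links under "## <section>" headings
  body: string; // plain-text or Markdown body
}

interface Site {
  name: string;
  description: string;
}

// Build llms.txt: H1 title, blockquote summary, then one H2 per section
// (in first-seen order) listing "- [title](url): summary" links.
function buildLlmsTxt(site: Site, pages: Page[]): string {
  const sections: string[] = [];
  for (const p of pages) {
    if (sections.indexOf(p.section) === -1) sections.push(p.section);
  }
  const out: string[] = [`# ${site.name}`, "", `> ${site.description}`];
  for (const section of sections) {
    out.push("", `## ${section}`, "");
    for (const p of pages.filter((q) => q.section === section)) {
      out.push(`- [${p.title}](${p.url}): ${p.summary}`);
    }
  }
  return out.join("\n") + "\n";
}

// Build llms-full.txt: the same pages with full bodies inlined, so an LLM
// can read everything in a single fetch.
function buildLlmsFullTxt(site: Site, pages: Page[]): string {
  const out: string[] = [`# ${site.name}`, "", `> ${site.description}`];
  for (const p of pages) {
    out.push("", `## ${p.title}`, "", `Source: ${p.url}`, "", p.body);
  }
  return out.join("\n") + "\n";
}

// Hypothetical example content.
const site: Site = { name: "Example Co", description: "Outdoor gear and guides." };
const pages: Page[] = [
  {
    title: "Trail Runner 2",
    url: "https://example.com/products/trail-runner-2",
    summary: "Waterproof running shoe",
    section: "Products",
    body: "A lightweight waterproof running shoe.",
  },
  {
    title: "Returns policy",
    url: "https://example.com/returns",
    summary: "30-day returns",
    section: "Support",
    body: "Items can be returned within 30 days.",
  },
];

console.log(buildLlmsTxt(site, pages));
console.log(buildLlmsFullTxt(site, pages));
```

Both files are derived from the same records, so the text files never become a second inventory to maintain. You would regenerate them (for example, on publish) rather than edit them.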
