I asked Claude a question that felt obvious in hindsight but that I'd never thought to ask before: "From an SEO point of view, is my website set up to be picked up by AI properly?"
Not Google. Not Bing. AI. Language models. The systems that increasingly answer people's questions by synthesising information from across the web. If someone asks ChatGPT or Perplexity "what's a good test management tool?" or "who builds Blazor SaaS products?" - would my sites show up in that answer?
The honest answer was: probably not. Because I'd done the traditional SEO work - meta tags, sitemap, decent content structure - but I hadn't done anything specifically aimed at making my content machine-readable in the way that LLMs consume it.
That's a different problem from traditional SEO, and it's one that most developers haven't started thinking about yet.
What AI Discoverability Actually Means
Traditional SEO is about ranking in search results. You optimise for keywords, build backlinks, ensure your site loads fast, make it mobile-friendly. Google's crawler indexes your pages, and their algorithm decides where you appear in the results.
AI discoverability is about being included in AI-generated answers. When an LLM is asked about a topic your site covers, will it know about your content? Will it cite you? Will it recommend your products?
These are related but different problems. A site can rank well on Google but be effectively invisible to AI models. And the techniques for solving each overlap but aren't identical.
The key difference is structure. Search engines are very good at extracting meaning from unstructured HTML. They've had twenty-five years of practice. LLMs are also capable, but they benefit enormously from explicit structural cues - semantic HTML, structured data, clear hierarchies, and machine-readable metadata that tells them not just what's on the page but what the page is about and how its entities relate to each other.
Structured Data: The Foundation
The first thing Claude recommended was comprehensive schema.org structured data. Not just the basic stuff - proper JSON-LD markup for every significant entity on the site.
For The Code Guy, that meant:
- Person schema for me (Paul Allington - developer, CTO, the whole professional identity)
- Organization schema for the business
- SoftwareApplication schema for each product (Task Board, TestPlan, CoSurf)
- BlogPosting schema for every blog post, with author, date, categories, and word count
- WebSite schema tying it all together with proper search action markup
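To make that concrete, here's a trimmed sketch of what the site-wide JSON-LD looks like. The URLs, @id values, and the selection of fields are illustrative rather than the exact markup on the live site:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Person",
      "@id": "https://example.com/#paul",
      "name": "Paul Allington",
      "jobTitle": "CTO",
      "description": ".NET developer and CTO who builds SaaS products with AI-assisted development"
    },
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "name": "The Code Guy",
      "url": "https://example.com/",
      "publisher": { "@id": "https://example.com/#paul" }
    }
  ]
}
</script>
```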
For the product pages, we added detailed structured data: application category, operating system, pricing information, feature lists. The kind of detail that makes it trivially easy for an LLM to extract factual information about what each product does, what it costs, and who it's for.
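For a product page, the SoftwareApplication markup ends up looking something like this - again a sketch, with placeholder pricing and description rather than the real listing:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "TestPlan",
  "applicationCategory": "BusinessApplication",
  "operatingSystem": "Web",
  "description": "Test management tool for planning and tracking software testing",
  "offers": {
    "@type": "Offer",
    "price": "29.00",
    "priceCurrency": "GBP"
  },
  "featureList": "Test case management, test runs, reporting"
}
</script>
```

Every field in there is a fact an LLM can lift out without having to interpret prose.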
Here's the thing though - most of this is good for traditional SEO too. Google has been encouraging structured data for years. The difference is that for traditional SEO it's a nice-to-have that might get you rich snippets. For AI discoverability, it's closer to essential, because structured data is the easiest way for an LLM to extract reliable facts about your site.
AI-Specific Metadata
Beyond schema.org, there's an emerging set of practices specifically aimed at AI crawlers and LLM training data. This is a newer area - the standards aren't fully established yet - but the direction is clear.
We added clear, concise meta descriptions that read like factual summaries rather than marketing copy. LLMs are better at processing factual statements than they are at interpreting promotional language. "Paul Allington is a .NET developer and CTO who builds SaaS products with AI-assisted development" is more useful to an LLM than "Discover innovative software solutions from a leading tech visionary." The first one contains facts. The second one contains nothing.
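In markup terms the change is trivial - it's entirely about the wording. Here are the two side by side for contrast:

```html
<!-- Factual summary: states who, what, and for whom -->
<meta name="description"
      content="Paul Allington is a .NET developer and CTO who builds SaaS products with AI-assisted development.">

<!-- Marketing copy: gives an LLM nothing concrete to extract -->
<meta name="description"
      content="Discover innovative software solutions from a leading tech visionary.">
```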
We also ensured that the site's robots.txt and sitemap were configured to allow AI crawlers - some sites inadvertently block them. And we added clear content hierarchy: H1 for the page title, H2 and H3 for section structure, paragraphs that make standalone sense without needing the surrounding context. LLMs often process content in chunks, so each section should be independently comprehensible.
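Back on the robots.txt point, the AI-related user agents worth checking for explicitly include the ones below. This is a sketch - the user agent strings are accurate as far as I know at the time of writing, but verify them against each vendor's documentation:

```text
# Explicitly allow the main AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://example.com/sitemap.xml
```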
Performance: Lighthouse as a Diagnostic Tool
Part of the broader SEO work involved performance optimisation, and this is where the AI workflow really shone. I ran Lighthouse audits on my pages and fed the results directly to Claude.
Not "here's my Lighthouse score, what should I do?" - I mean I gave it the actual audit report with all the diagnostic details. The specific opportunities, the failing audits, the performance metrics. Claude could then address each issue systematically: render-blocking resources, unoptimised images, missing cache headers, accessibility issues.
The image optimisation was a significant win. We converted images to WebP format, which I should have done ages ago but hadn't because it was one of those tasks that's not urgent enough to prioritise. With AI handling the implementation, the conversion was straightforward - identify all the image assets, generate WebP versions, update all the references throughout the templates and stylesheets. The kind of tedious, comprehensive task that AI is perfect for.
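I'm not claiming this is exactly how every reference was rewritten, but the typical pattern on the markup side is a `<picture>` element that serves WebP where supported and keeps the original format as a fallback (filenames illustrative):

```html
<picture>
  <source srcset="/images/hero.webp" type="image/webp">
  <img src="/images/hero.png" alt="Screenshot of the Task Board dashboard" width="1200" height="630">
</picture>
```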
The Semrush Audit Approach
I also ran a Semrush site audit and fed the resulting PDF directly to Claude. This was surprisingly effective. Semrush produces detailed reports about technical SEO issues - broken links, missing alt text, duplicate content, slow pages, redirect chains - and Claude could parse the report and produce a prioritised fix list.
More importantly, it could explain why each issue mattered and implement the fixes. Not just "you have missing alt text on twelve images" but "here are the twelve images, here's appropriate alt text for each based on the image context, and here's the code change to add them." From diagnosis to fix in one step.
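Concretely, each of those fixes is as small as this (hypothetical image and wording):

```html
<!-- Before: flagged by the audit as missing alt text -->
<img src="/images/testplan-dashboard.png">

<!-- After: descriptive alt text based on what the image actually shows -->
<img src="/images/testplan-dashboard.png" alt="TestPlan dashboard showing a summary of recent test runs">
```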
This is an underrated pattern: use traditional tools for diagnosis, feed the results to AI for remediation. The traditional tools are better at systematic scanning. The AI is better at interpreting results and implementing fixes. Together, they're remarkably effective.
What About Application Insights?
While working on the site performance, I noticed that the Azure Application Insights JavaScript SDK was showing deprecation warnings. This is worth mentioning because it's exactly the kind of thing that AI catches when you're paying attention to Lighthouse output - a third-party script that's adding weight to your page load while simultaneously approaching end of life.
The practical question is whether you still need client-side Application Insights tracking, or whether server-side monitoring covers your needs. For a mostly static marketing site like The Code Guy, server-side monitoring is probably sufficient. For a complex web application, you might still want client-side telemetry - but it's worth reviewing whether the legacy SDK should be replaced with the newer OpenTelemetry-based approach.
A Practical Checklist
If you want to make your site AI-discoverable, here's what I'd actually do, based on going through this exercise myself:
- Add comprehensive schema.org JSON-LD markup. Person, Organization, WebSite, and whatever entity types match your content. Be thorough - the more structured data you provide, the easier it is for LLMs to extract facts about you.
- Write meta descriptions that read like factual summaries. Drop the marketing speak. State clearly what the page is about, who it's for, and what it contains.
- Use semantic HTML properly. Clear heading hierarchy, meaningful section structure, paragraphs that make sense in isolation. This helps both search engines and LLMs.
- Ensure AI crawlers aren't blocked. Check your robots.txt. Some bot-blocking rules inadvertently prevent AI crawlers from indexing your content.
- Optimise performance. Run Lighthouse, feed the results to your AI tool of choice, implement the fixes. Convert images to WebP. Address render-blocking resources. Fast sites get crawled more thoroughly by everyone.
- Run a proper SEO audit. Use Semrush, Ahrefs, or similar. Feed the results to AI for systematic remediation. Fix the technical issues that traditional SEO audits find - they matter for AI discoverability too.
- Create content that answers questions. LLMs are trained on content that provides clear answers to questions. Blog posts, documentation, FAQs - anything that clearly and factually addresses topics in your domain makes you more likely to appear in AI-generated responses.
The Emerging Field
AI SEO is still early. The rules aren't fully written yet. We don't know exactly how different LLMs weight structured data versus unstructured content, or how AI search products like Perplexity decide what to cite. The field is moving fast and the best practices are evolving.
But the direction is clear: optimising only for Google is no longer sufficient. A growing percentage of how people find information is mediated by AI systems, and those systems have their own preferences for how content should be structured and presented.
The good news is that most of what makes a site AI-discoverable also makes it better for traditional SEO. Structured data, clean HTML, fast loading, factual content - these are universal goods. You're not choosing between optimising for Google and optimising for AI. You're building a site that's genuinely well-structured, and both benefit.
The bad news is that if you've been ignoring structured data and relying on Google's ability to figure out unstructured pages, you've got some catching up to do. But the work is straightforward, the tools are available, and - as I discovered - AI is remarkably good at helping you implement the very optimisations that make your site more visible to AI.
There's a pleasing circularity to that.