How to Validate Your XML Sitemap (and Fix It)
A practical guide to validate your sitemap online, fix the most common XML errors, and make sure both Google and AI crawlers can read every page you publish.

A broken sitemap is one of those problems that costs you traffic for months before you notice, because nothing throws an error. The pages just quietly don't get crawled. I built FixAEO's free sitemap validator after seeing the same handful of mistakes in scan after scan, and this post walks through what to check, why each thing matters, and how to fix it.
If you just want the fast path: paste your sitemap XML or your sitemap URL into the sitemap validator and it'll flag everything below in a couple of seconds. The rest of this is for when you want to understand what it's telling you.
What a sitemap actually does (and who reads it now)
A sitemap is a plain XML file that lists the URLs you want crawled, usually at https://yoursite.com/sitemap.xml. It's a hint, not a command. Google still decides what to index. But the hint matters more than people think, because it tells crawlers what's new and what changed without making them re-walk your whole site.
Here's the part most "validate your sitemap" guides written before 2024 miss. It's no longer just Googlebot reading these files. AI crawlers like GPTBot, ClaudeBot, PerplexityBot, and Google's own AI Overviews pipeline use sitemaps and lastmod dates to decide what to fetch and how often. If your sitemap is malformed or lists dead URLs, you're not just hurting your Google indexing. You're making it harder for the engines that answer "what's the best tool for X" to ever see your page. That's the whole reason a technical file like this shows up on an AEO site.
So validating your sitemap is table stakes for both search and AI visibility. Let's go through the errors I see most.
The six errors I see in almost every bad sitemap
1. Malformed XML
The file won't parse. Usually it's an unescaped & in a URL (it has to be written as &amp;), a missing closing tag, or a stray character before the <?xml declaration. Even a single byte-order mark or a blank line at the very top can break strict parsers.
How to spot it: the validator either fails outright or warns that the XML declaration is missing. Open the raw file (view-source on the URL, not the rendered version) and check that the very first characters are <?xml version="1.0" encoding="UTF-8"?> with nothing before them. Encoding matters too. If it's not UTF-8, some crawlers choke on accented characters in your URLs.
Fix: regenerate the file from your CMS or build step rather than hand-editing. Hand-edits are how the stray & got there in the first place.
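If you'd rather script that first check, here's a minimal sketch using only Python's standard library. It is not how the FixAEO validator works internally; check_well_formed is a name I made up for illustration, and it only looks at the two things described above: what comes before the declaration, and whether the file parses at all.

```python
import xml.etree.ElementTree as ET

XML_DECL = b"<?xml"
UTF8_BOM = b"\xef\xbb\xbf"


def check_well_formed(raw: bytes) -> list[str]:
    """Return a list of problems found in raw sitemap bytes (empty if none)."""
    problems = []
    # The very first bytes should be the XML declaration, with nothing before it.
    if raw.startswith(UTF8_BOM):
        problems.append("byte-order mark before the XML declaration")
    elif not raw.startswith(XML_DECL):
        problems.append("content before the <?xml declaration, or declaration missing")
    # A real parser catches unescaped ampersands and unclosed tags.
    try:
        ET.fromstring(raw)
    except ET.ParseError as exc:
        problems.append(f"parse error: {exc}")
    return problems
```

Feed it the raw bytes of the file (not text copied from a rendered browser view), since a byte-order mark or leading blank line is invisible once the page has been rendered.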
2. URLs that 404 or redirect
Your sitemap should list canonical, final, 200-status URLs only. No 404s, no 301 redirects, no http:// links that bounce to https://. Every dead URL in your sitemap is wasted crawl budget, and it tells Google your sitemap is stale and less trustworthy.
This is the most common one by far. A page gets deleted or its slug changes, but the static sitemap generator never gets re-run, so the old URL lingers. The fix is process, not a one-time cleanup: regenerate the sitemap on every deploy.
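A check like this is easy to bolt onto a deploy script. The sketch below is one way to do it with the standard library, under a few assumptions: fetch_status and find_bad_urls are hypothetical names, it sends HEAD requests (some servers answer those differently from GET), and it does not handle network failures such as timeouts, which will raise.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from silently following 301/302 responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of url without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 3xx, 4xx and 5xx all arrive here


def find_bad_urls(urls, fetch=fetch_status) -> dict:
    """Map every URL that does not answer with a plain 200 to its status code."""
    bad = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            bad[url] = status
    return bad
```

The fetch parameter is there so you can swap in a fake for testing, or a rate-limited client if your sitemap is large. Anything the function returns is a URL that should be fixed or dropped from the sitemap.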
3. noindex pages sitting in the sitemap
This is the contradiction that confuses crawlers most. A sitemap says "please index this." A noindex meta tag or X-Robots-Tag header says "do not index this." When the same URL does both, you're sending mixed signals, and Google will sometimes flag it in Search Console as a coverage error.
Common culprits: tag pages, paginated archives, internal search results, thank-you pages, and staging URLs that leaked in. Pick one rule per page. If it shouldn't be indexed, keep it out of the sitemap. If it should, remove the noindex.
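To catch the contradiction automatically, you can test each listed page for either signal. This is a rough sketch, not a complete implementation: is_noindex is a hypothetical helper, it expects you to have already fetched the page's HTML and response headers, and it uses a simple substring match, so a header scoped to one specific bot would also count as noindex.

```python
from html.parser import HTMLParser


class _RobotsMeta(HTMLParser):
    """Record whether any robots meta tag contains a noindex directive."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        is_robots_tag = attributes.get("name", "").lower() in ("robots", "googlebot")
        if is_robots_tag and "noindex" in attributes.get("content", "").lower():
            self.noindex = True


def is_noindex(html: str, headers: dict[str, str]) -> bool:
    """True if the page is marked noindex by header or by meta tag."""
    # Header names are case-insensitive, so compare them in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = _RobotsMeta()
    parser.feed(html)
    return parser.noindex
```

Any sitemap URL where this returns True needs a decision: drop it from the sitemap, or remove the noindex.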
4. Wrong or missing lastmod
lastmod is the date a page last meaningfully changed. Crawlers use it to prioritize what to re-fetch. Two failure modes here. One, the field is missing entirely, so crawlers fall back to guessing and re-index your changed pages slowly. Two, and this is worse, your generator stamps today's date on every URL on every build. When every page claims it changed five minutes ago, the signal is worthless and crawlers learn to ignore it.
The validator reports your lastmod coverage as a percentage and flags how many URLs haven't been touched in over 12 months. Aim for real dates that reflect actual content changes. If a page genuinely hasn't changed in two years, let its lastmod say so. That honesty is what makes the recent dates meaningful.
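If you want the same numbers in your own build pipeline, something like the sketch below would get you close. The assumptions: lastmod_report is a made-up name, only the date part of each lastmod is read, values it can't parse (including year-only or year-month dates) are counted as missing, and "stale" means more than 365 days old.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def lastmod_report(xml_text: str, today: date) -> dict:
    """Summarize lastmod coverage and staleness for a urlset sitemap."""
    root = ET.fromstring(xml_text)
    urls = root.findall("sm:url", NS)
    parsed = []
    for url in urls:
        text = (url.findtext("sm:lastmod", default="", namespaces=NS) or "").strip()
        try:
            # Keep only YYYY-MM-DD; any time and timezone suffix is ignored.
            parsed.append(date.fromisoformat(text[:10]))
        except ValueError:
            pass  # missing or unparseable lastmod counts against coverage
    cutoff = today - timedelta(days=365)
    total = len(urls)
    return {
        "total": total,
        "coverage_pct": round(100 * len(parsed) / total, 1) if total else 0.0,
        "stale": sum(1 for day in parsed if day < cutoff),
        # Every URL sharing one date usually means the generator stamps build time.
        "all_same_day": total > 1 and len(parsed) == total and len(set(parsed)) == 1,
    }
```

The all_same_day flag is the one to watch for. It's a heuristic for the "every page changed five minutes ago" failure mode, so a small site that really did update everything on one day will trip it too.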
5. Sitemap too big
The hard limits are 50,000 URLs and 50 MB uncompressed per sitemap file. Go over either and crawlers may ignore the whole thing. Plenty of ecommerce and programmatic sites blow past 50,000 without realizing it.
The fix is a sitemap index: one parent file that points to multiple child sitemaps, each under the limit. The validator detects whether you've handed it a regular sitemap or an index, and counts your URLs so you know how close you are. I'd start splitting around 40,000 rather than waiting for the wall.
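Most CMS plugins handle the split for you, but if you generate sitemaps yourself, the logic is short. Here's a sketch under my own assumptions: the function names are illustrative, the 40,000 chunk size is just the threshold suggested above, and the output contains only loc entries (no lastmod) to keep it brief.

```python
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemap protocol


def chunk_urls(urls: list[str], size: int = 40_000) -> list[list[str]]:
    """Split urls into consecutive chunks that each stay under the limit."""
    if not 0 < size <= MAX_URLS:
        raise ValueError("size must be between 1 and 50,000")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls: list[str]) -> str:
    """Render one child sitemap; escape() turns & into &amp; in each URL."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{NS}">\n{entries}</urlset>\n'
    )


def build_index(child_sitemap_urls: list[str]) -> str:
    """Render the parent sitemap index that points at each child file."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in child_sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{NS}">\n{entries}</sitemapindex>\n'
    )
```

In practice you'd write each chunk to its own file (for example sitemap-1.xml, sitemap-2.xml, names I'm inventing here), then pass the public URLs of those files to build_index and publish the result as your main sitemap.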
6. Never submitted to Google Search Console
You can have a perfect sitemap that Google has never been told about. Submitting it in Search Console (Sitemaps section, paste the path, hit submit) does two things: it speeds up discovery, and it gives you a report showing how many URLs Google actually read and indexed versus how many you listed. That gap is one of the most useful diagnostics you have. Also reference the sitemap in your robots.txt with a Sitemap: line so any crawler that reads your robots file finds it automatically.
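You can confirm the robots.txt reference with Python's built-in robots parser, which exposes any Sitemap: lines it finds (Python 3.8 or newer). This is a small sketch; sitemaps_in_robots is a name I picked, and it expects the robots.txt contents as a string you've already downloaded.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []
```

An empty list means crawlers that rely on robots.txt for discovery won't find your sitemap there, even if it's submitted in Search Console.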
How to validate your sitemap online in two minutes
You don't need to install anything. Here's the routine I run:
- Open the sitemap validator and paste either your raw XML or your sitemap URL.
- Read the findings. It checks the XML declaration and encoding, confirms it's a valid sitemap or index, counts URLs against the 50,000 / 50 MB limits, flags duplicates, reports lastmod coverage, and calls out stale entries older than 12 months.
- Spot-check 5 to 10 of your live URLs in a browser. Make sure they return 200 and aren't redirecting. The validator checks structure; this catches dead links.
- Confirm none of the listed URLs carry a noindex tag.
- Submit (or re-submit) the file in Google Search Console and check back in a few days for the indexed-vs-submitted count.
Do this whenever you ship a big batch of pages, change your URL structure, or migrate platforms. Those are the moments sitemaps quietly break.
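If you'd like a quick structural check inside CI for those moments, the sketch below covers a few of the same basics: file type, URL count, size, and duplicates. It's my own approximation with a made-up function name, not the validator's actual code, and it assumes the file declares the standard sitemap namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS_URI = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def summarize_sitemap(raw: bytes) -> dict:
    """Report type, URL count, duplicate URLs and limit status for a sitemap."""
    root = ET.fromstring(raw)
    if root.tag == f"{{{NS_URI}}}urlset":
        kind = "urlset"
    elif root.tag == f"{{{NS_URI}}}sitemapindex":
        kind = "sitemapindex"
    else:
        kind = "unknown"
    # Collect every <loc> in the sitemap namespace, wherever it sits.
    locs = [(el.text or "").strip() for el in root.iter(f"{{{NS_URI}}}loc")]
    duplicates = sorted(url for url, n in Counter(locs).items() if n > 1)
    return {
        "kind": kind,
        "count": len(locs),
        "duplicates": duplicates,
        "size_bytes": len(raw),
        "over_limit": len(locs) > MAX_URLS or len(raw) > MAX_BYTES,
    }
```

Failing the build when duplicates is non-empty or over_limit is True is a cheap way to keep a broken sitemap from shipping.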
Why this matters for AI search, not just Google
I'll be blunt about why this lives on an AEO blog. AI engines are reading the same plumbing. A clean sitemap with accurate lastmod dates helps GPTBot and friends fetch your freshest content faster, which means your latest comparison page or product update has a better shot at being the thing an AI assistant cites.
A sitemap pairs naturally with two other files crawlers look for. One is your llms.txt, which curates your highest-value pages for AI specifically. The other is your robots.txt, where a Sitemap: line and unblocked AI bots do a lot of quiet work. If you're doing a broader pass, the AEO audit checklist walks through all of these in order, and it's worth knowing how to measure whether any of it moves traffic so you're not just guessing.
The sitemap is the least glamorous file on your site. It's also one of the cheapest things to get right, and one of the most expensive to get wrong, because the cost shows up as months of pages that never got seen.
FAQ
How do I validate my sitemap online for free?
Paste your XML or sitemap URL into a free tool like the FixAEO sitemap validator. It checks the XML structure, encoding, URL count against Google's 50,000 limit, duplicate URLs, and lastmod coverage in a couple of seconds. For dead-link checks, also spot-check a handful of your live URLs in a browser to confirm they return a 200 status.
What's the maximum size for an XML sitemap?
A single sitemap file can hold up to 50,000 URLs and must be no larger than 50 MB uncompressed. If you exceed either limit, split your URLs across multiple sitemap files and list them all in one parent sitemap index file. I'd start splitting around 40,000 URLs rather than waiting until you hit the ceiling.
Should noindex pages be in my sitemap?
No. A sitemap tells crawlers "index this," while a noindex tag says "don't." Putting both on the same URL sends a contradictory signal and often shows up as a coverage error in Search Console. Keep noindex pages, like tag archives and thank-you pages, out of the sitemap entirely.
Do AI search engines like ChatGPT and Perplexity use sitemaps?
Yes. AI crawlers such as GPTBot, ClaudeBot, and PerplexityBot use sitemaps and lastmod dates to find and prioritize content, the same way Googlebot does. A malformed sitemap or one full of dead URLs makes it harder for these engines to discover and cite your pages, which is why sitemap hygiene matters for AI visibility, not just traditional SEO.
If you haven't checked yours lately, run it through the free sitemap validator. It takes about two minutes and usually surfaces at least one thing worth fixing.