What is an XML sitemap?
An XML sitemap is a file (typically at /sitemap.xml) that lists the URLs on a website, optionally with metadata such as the last-modified date and change frequency. Its primary function is discovery: it tells search engines which pages exist, so crawlers such as Googlebot do not have to rely entirely on following links to find them.
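The following is a minimal sketch of the format, written with Python's standard-library xml.etree.ElementTree. The build_sitemap helper is a hypothetical name, the example.com URLs and dates are placeholders, and only the loc and lastmod fields are emitted.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a <urlset> sitemap as an XML string.

    entries: iterable of (url, lastmod) pairs; lastmod is a W3C date
    string such as "2024-05-01", or None to omit the tag.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL automatically.
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("https://www.example.com/", "2024-05-01"),
        ("https://www.example.com/about", None),
    ]))
```

The real file would be served at the site root (for example /sitemap.xml) with this XML as its body.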
XML sitemap best practices
A well-maintained XML sitemap contains only canonical, indexable URLs: no noindexed pages, no redirected URLs, no soft 404s. Including redirect sources or noindexed pages sends conflicting signals (the sitemap says "index this URL" while the page itself says otherwise), which can slow Google's understanding of which URLs on your site are canonical.
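The inclusion rule above can be expressed as a simple filter over crawl data. This sketch assumes you already have, for each URL, its HTTP status (without following redirects), a noindex flag, its rel="canonical" target and a soft-404 flag. The PageRecord fields and function names are hypothetical, and treating a page with no canonical tag as its own canonical is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageRecord:
    """Hypothetical crawl data for one URL (e.g. exported from a site crawler)."""
    url: str
    status: int                # HTTP status of this URL, redirects not followed
    noindex: bool              # robots meta tag or X-Robots-Tag says "noindex"
    canonical: Optional[str]   # rel="canonical" target, None if the page has none
    soft_404: bool = False     # returns 200 but is effectively a "not found" page


def is_sitemap_eligible(page: PageRecord) -> bool:
    """True only for 200-status, indexable, canonical URLs."""
    if page.status != 200:     # drops 3xx redirect sources and 4xx/5xx errors
        return False
    if page.noindex or page.soft_404:
        return False
    # Assumption: a missing canonical tag means the URL is its own canonical.
    return page.canonical is None or page.canonical == page.url


def sitemap_urls(pages: List[PageRecord]) -> List[str]:
    return [p.url for p in pages if is_sitemap_eligible(p)]


if __name__ == "__main__":
    crawl = [
        PageRecord("https://www.example.com/a", 200, False, "https://www.example.com/a"),
        PageRecord("https://www.example.com/old", 301, False, None),
        PageRecord("https://www.example.com/draft", 200, True, None),
        PageRecord("https://www.example.com/a?ref=x", 200, False, "https://www.example.com/a"),
    ]
    print(sitemap_urls(crawl))  # only https://www.example.com/a survives
```

Running the filter on every regeneration, rather than maintaining an exclusion list by hand, keeps redirects and noindexed pages from drifting back into the file.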
Sitemap best practices for large sites include: submitting the sitemap in Google Search Console; splitting it into a sitemap index file that references child sitemaps once a single file would exceed 50,000 URLs or 50 MB uncompressed; declaring the sitemap location with a Sitemap line in robots.txt; and regenerating it automatically whenever pages are published or removed. Modern frameworks such as Next.js can generate the sitemap automatically at build time.
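The split described above can be sketched as follows: chunk the URL list, write one urlset file per chunk, then write a sitemapindex that points at each child, plus the robots.txt line that advertises the index. The file naming (sitemap-1.xml, sitemap-2.xml, and so on), the base URL and the function names are hypothetical. The sketch enforces only the protocol's 50,000-URL limit; a real generator would also need to keep each file under 50 MB uncompressed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit in the sitemaps.org protocol
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def _serialize(root):
    return XML_DECL + ET.tostring(root, encoding="unicode")


def build_sitemap_set(urls, base_url, max_urls=MAX_URLS):
    """Split `urls` (a list) into child sitemaps plus one sitemap index.

    base_url is assumed to have no trailing slash.
    Returns (files, robots_line), where files maps a filename to its XML.
    Only the URL count is enforced; the 50 MB size limit is not checked.
    """
    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        name = f"sitemap-{n}.xml"  # hypothetical naming scheme
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for loc in urls[start:start + max_urls]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = loc
        files[name] = _serialize(urlset)
        # The index lists each child sitemap by its absolute URL.
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{base_url}/{name}"
    files["sitemap.xml"] = _serialize(index)
    # Line to add to robots.txt so crawlers can find the index.
    robots_line = f"Sitemap: {base_url}/sitemap.xml"
    return files, robots_line


if __name__ == "__main__":
    urls = [f"https://www.example.com/p/{i}" for i in range(5)]
    files, robots_line = build_sitemap_set(urls, "https://www.example.com", max_urls=2)
    print(sorted(files))
    print(robots_line)
```

Keeping the index at a fixed /sitemap.xml address means the robots.txt line and the Search Console submission stay the same as child files are added or removed.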
Example
A CuddlyNest-scale travel site runs a sitemap index file that references five child sitemaps: locations (2,613 URLs), static pages (26), hotels (245 sub-sitemaps), resorts (198), and motels (1). This structure lets you monitor crawling and indexing per sitemap file, and therefore per content type, rather than only page by page.
Frequently asked questions
What URLs belong in an XML sitemap?
Only canonical, indexable, 200-status URLs. Including redirects, noindexed pages, or 404s sends Google contradictory signals and degrades trust in the whole file.
Does an XML sitemap improve rankings?
Not directly; it improves discovery. Google can find and recrawl your pages faster, which matters most for large sites, new sites with few inbound links, and sites that publish or update content frequently.