How to Generate an XML Sitemap (Complete SEO Guide, 2026)

XML sitemap file shown with URL list and last-modified dates

An XML sitemap is the fastest way to make Google index your new pages — if you build it right. Here's what to include, what to leave out, and how to generate one free in your browser.

An XML sitemap is how you stop hoping Google finds your pages and start telling it where they are. Done right, it's the difference between a new article appearing in search results within hours versus a few weeks. Done wrong — outdated URLs, blocked pages listed, missing the canonical version — it actively hurts your indexing.

This guide explains what a sitemap actually does (and doesn't), the four files Google looks at to crawl your site, how to generate a clean sitemap free, and exactly where to submit it so it gets used.

What an XML sitemap really is

A sitemap is a list of URLs on your site, written in XML, that tells search engines:

  • Which pages exist
  • When each was last updated (<lastmod>)
  • How often each typically changes (<changefreq> — Google now mostly ignores this)
  • How important each is relative to others (<priority> — also mostly ignored)

The two pieces that genuinely matter to Google in 2026 are the URL list itself and the lastmod date. The rest is legacy. Google confirmed in 2023 that it uses <lastmod> heavily as a recrawl signal but largely ignores <priority> and <changefreq>.
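
For reference, a complete minimal sitemap with one entry looks like this (the URL and date are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/example-page/</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
</urlset>

That's the whole format: one <url> block per page, and only <loc> is required. If you include <lastmod>, keep it honest. A date that changes when the page doesn't teaches Google to ignore it.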

What a sitemap is not

A sitemap is not:

  • A guarantee of indexing. Google still decides what to index.
  • A ranking factor. It helps discovery, not rankings.
  • A substitute for internal linking. If a page can only be found via the sitemap, Google considers it lower-value.
  • A way to make Google crawl faster. It can prioritize newly listed URLs, but won't change overall crawl rate.

What it is: a discovery shortcut. Particularly valuable for new sites, large sites, sites with weak internal linking, and sites that publish content frequently.

Do you need one?

If your site has fewer than ~50 pages and good internal linking, technically no: Google will find everything. In practice, every site benefits from one, because it speeds up indexing of new content and lets you see in Search Console exactly which URLs Google knows about.

You absolutely need one if:

  • Your site has 500+ pages
  • You publish new content regularly (blog, product catalog, news)
  • Your site uses lots of JavaScript and pages aren't easily discovered by following links
  • You're a new site with few or no backlinks
  • You have orphan pages (pages no internal links point to) that you still want indexed

The four files Google looks at

A complete crawl setup has four files at your site's root. Each does something different and they work together.

1. robots.txt (at /robots.txt)

Tells crawlers which paths to crawl and which to skip. Also points to your sitemap. Without this line in robots.txt, Google may not discover your sitemap on its own:

Sitemap: https://yoursite.com/sitemap.xml

Don't have one yet? Our robots.txt generator builds one in a minute.
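
For reference, a minimal robots.txt that allows all crawling and declares the sitemap looks like this (the Disallow paths are just examples; use your own):

User-agent: *
Disallow: /wp-admin/
Disallow: /cart/

Sitemap: https://yoursite.com/sitemap.xml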

2. sitemap.xml (or sitemap_index.xml)

The actual list of URLs. If your site has more than 50,000 URLs or the sitemap is over 50 MB uncompressed, you need to split it across multiple sitemaps and point to them with a sitemap index file. Most small and mid-size sites are nowhere near these limits.
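
A sitemap index is simply a sitemap of sitemaps. A two-file example (the filenames are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/sitemap-posts.xml</loc>
    <lastmod>2026-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-products.xml</loc>
  </sitemap>
</sitemapindex>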

3. Image and video sitemaps (optional)

Separate XML files that list image and video URLs. Most sites don't need these, since Google discovers images by crawling pages, but they help if you run a media-heavy site (a stock photo library, a video tutorial catalog, and so on).
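
If you do use one, image entries nest inside the normal <url> element under Google's image namespace. A single-image example (placeholder URLs), with the extra xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" declaration added to <urlset>:

<url>
  <loc>https://yoursite.com/tutorial/</loc>
  <image:image>
    <image:loc>https://yoursite.com/images/step-1.png</image:loc>
  </image:image>
</url>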

4. Page-level meta (canonical tag, noindex)

Every page should specify its canonical URL and whether it should be indexed. The sitemap should only ever list canonical, indexable URLs; listing duplicates or noindexed pages confuses Google.
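
Both signals live in each page's <head>. For example (placeholder URL):

<!-- An indexable page: declare its canonical URL and list it in the sitemap -->
<link rel="canonical" href="https://yoursite.com/guide/">

<!-- A page to keep out of the index: noindex it and leave it out of the sitemap -->
<meta name="robots" content="noindex">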

What to include and what to exclude

Include in the sitemap:

  • Every canonical URL you want indexed
  • Only one version of each URL: the HTTPS, www or non-www form that matches your canonical
  • Pages that return a 200 status code
  • Both new and updated pages (with current <lastmod>)

Exclude:

  • URLs blocked by robots.txt
  • Pages with <meta name="robots" content="noindex">
  • Duplicate URLs (with tracking parameters, alternate sort orders, etc.)
  • Pages that return 3xx, 4xx, or 5xx
  • Pagination pages that don't add unique content (/page/2, /page/3 — debated, mostly skip)
  • Internal search result pages
  • Login, account, cart pages

A common mistake: shipping a sitemap with 5,000 URLs of which only 2,000 are actually indexable. Google sees the mismatch in Search Console and trusts your sitemap less going forward.

How to generate a sitemap

For small sites and most blogs, you don't need to install anything. Using our free sitemap generator:

1. Enter your homepage URL

The tool crawls your site starting from the URL you give it, following internal links and discovering pages.

2. Set crawl options

  • Max URLs to crawl — usually 500 is fine for small sites; raise for larger ones
  • Crawl depth — how many clicks from homepage to follow. Depth 3-4 covers most blog and small business sites
  • Exclude patterns — URL paths to skip (e.g., /wp-admin/, /cart/, /?s=)

3. Let it crawl

Crawling a 200-page site takes about a minute. The tool finds pages, gets their last-modified date (from headers or Open Graph), and builds the URL list.

4. Review the URL list

Look at what it found. If it discovered URLs you don't want indexed, exclude their pattern. If it missed pages, link to them better from your existing pages — that's a sign of weak internal linking, which is fixable.

5. Download the sitemap.xml

A standards-compliant XML file. Save it.

6. Upload to your site root

Put the file at https://yoursite.com/sitemap.xml. By convention, a sitemap only covers URLs at or below its own directory, so the root is the safest location; a sitemap kept elsewhere should be declared in robots.txt so Google will still accept it.

7. Add it to robots.txt

Sitemap: https://yoursite.com/sitemap.xml

8. Submit to Google Search Console

Go to Search Console → Sitemaps → enter sitemap.xml → Submit. Google will crawl it within hours and report any errors.

Repeat step 8 for Bing Webmaster Tools — it covers Bing, Yahoo, DuckDuckGo, and a chunk of AI search.

How often to regenerate

  • Blogs / news sites — regenerate whenever you publish, or use a CMS plugin that updates automatically
  • Static brochure sites — monthly, or whenever you add/remove pages
  • E-commerce — daily (use an automated tool, not manual regeneration), since product status changes constantly
  • Large enterprise sites — automated build as part of deploys

Google rechecks sitemaps regularly — typically daily for active sites — so it picks up updates without you having to resubmit.
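
If your site is built by a script, generating the sitemap during the deploy takes only a few lines. A minimal sketch in Python; the page list here is a hypothetical stand-in for whatever your build system or CMS export knows about:

# build_sitemap.py - write sitemap.xml as part of a deploy (sketch).
# The pages list is a placeholder; feed it (url, lastmod) pairs from
# your own build system.
from xml.sax.saxutils import escape

pages = [
    ("https://yoursite.com/", "2026-01-15"),
    ("https://yoursite.com/blog/new-post/", "2026-01-14"),
]

entries = "".join(
    "  <url>\n"
    f"    <loc>{escape(url)}</loc>\n"
    f"    <lastmod>{lastmod}</lastmod>\n"
    "  </url>\n"
    for url, lastmod in pages
)

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries
        + "</urlset>\n"
    )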

Common questions

Why is my page in the sitemap but not indexed?

A sitemap entry is a request, not a command. Google decides what to index based on perceived quality, crawl budget, duplicates, and a few other signals. Check Search Console's "Page indexing" report for the specific reason on each non-indexed URL. The most common are "Discovered – currently not indexed" (Google knows the URL but hasn't crawled it yet) and "Crawled – currently not indexed" (crawled, but judged not valuable or unique enough to index).

Should I use sitemap.xml or sitemap_index.xml?

Use sitemap.xml if you have under 50,000 URLs. Use sitemap_index.xml (which references multiple sub-sitemaps) only when you exceed that.

Do I need separate sitemaps for mobile and desktop?

Not since 2019. Mobile-first indexing means Google treats your responsive site as one. Listing each URL once is correct.

How do I include hreflang for multi-language sites?

Add <xhtml:link rel="alternate" hreflang="..."> tags inside each <url> entry. This tells Google which language or regional version of the page to serve in each market. Get it wrong and language variants can compete with each other as duplicates.
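
A two-language entry looks like this (placeholder URLs). Note that every language version lists all alternates, itself included, and that <urlset> must also declare xmlns:xhtml="http://www.w3.org/1999/xhtml":

<url>
  <loc>https://yoursite.com/en/pricing/</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://yoursite.com/en/pricing/"/>
  <xhtml:link rel="alternate" hreflang="de" href="https://yoursite.com/de/preise/"/>
</url>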

What's a sitemap "ping"?

The old way of telling Google "I updated my sitemap, come look." Google retired sitemap pings in 2023. Now Google just rechecks active sitemaps regularly. You don't need to ping; you just need accurate <lastmod> dates.

My sitemap shows errors in Search Console. Now what?

The most common: the sitemap references URLs that 404, 301-redirect, or are blocked by robots.txt. Fix the source — remove the dead URLs from your site, or remove them from the sitemap — and resubmit. Don't ignore the warnings; they erode trust in your sitemap over time.
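
You can catch most of these before Google does by fetching every URL in the sitemap and flagging anything that isn't a clean 200. A small sketch in Python, standard library only (the local sitemap filename is a placeholder):

# check_sitemap.py - flag sitemap URLs that 404, error, or redirect (sketch).
# Assumes a regular sitemap (not an index) saved locally as sitemap.xml.
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [loc.text for loc in ET.parse("sitemap.xml").findall(".//sm:loc", NS)]

for url in urls:
    # HEAD keeps it fast; a few servers reject HEAD, so treat a 405 as "check manually".
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            status, final = resp.status, resp.url
    except urllib.error.HTTPError as e:
        status, final = e.code, url
    if status != 200:
        print(f"{status}  {url}")
    elif final != url:
        # urlopen follows redirects, so a changed final URL means a 3xx happened.
        print(f"redirects  {url} -> {final}")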

The bottom line

Generate a clean sitemap (only canonical, indexable, 200-OK URLs), reference it from robots.txt, and submit it once to Search Console. Keep it accurate when pages change. That's the entire workflow — and it's the single fastest way to make sure Google sees what you publish.

Generate a sitemap free →
