Migrating my blog to Astro - Sitemap

Adding support for a sitemap to your site with Astro

This is part of a series of posts on how I migrated my blog to Astro

A sitemap helps search engines find all the content on your website quickly, and it also tells them which pages have been updated and may need re-indexing.

Astro has a sitemap integration, @astrojs/sitemap, that you can use to generate sitemap files for your site.

Configuration of the sitemap is done by modifying the astro.config.ts file.
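As a starting point, a minimal configuration might look like the sketch below. It assumes the integration is already installed (for example with `npx astro add sitemap`), and the `site` URL is a placeholder. The integration needs `site` to be set so that it can build absolute URLs.

```typescript
// astro.config.ts (minimal sketch; the site URL is a placeholder)
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  // Required by @astrojs/sitemap so it can generate absolute URLs
  site: 'https://example.com',
  // With no options, every statically generated page is listed
  integrations: [sitemap()],
});
```

With this in place, a build should write `sitemap-index.xml` and `sitemap-0.xml` to the output directory.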

In my case, I wanted to be able to set the lastmod property for each blog post entry based on the Git “last commit” date. This is a little slow to evaluate, as it needs to run a separate git query for every blog post file, but that’s the best way I could think of doing it.

The only other alternative would be to add a ‘lastModified’ frontmatter property to every page, and then remember to update that value every time you modify the file. That just didn’t seem feasible.

I’d be interested to hear if there are any other ways to achieve this, especially if they are more efficient.

// @ts-check
import { defineConfig } from 'astro/config';
import { existsSync, globSync } from 'node:fs';
import { execSync } from 'node:child_process';
import { join, resolve } from 'node:path';

import sitemap, { type SitemapItem } from "@astrojs/sitemap";

import mdx from "@astrojs/mdx";

// https://astro.build/config
export default defineConfig({
  compressHTML: false,
  site: process.env.DEPLOY_PRIME_URL || "https://david.gardiner.net.au",
  integrations: [sitemap({
    serialize(item) {

      // ensure we have no trailing slash for files
      item.url = item.url.replace(/\/$/, '');

      try {
        // if item is a post, get last modified date from Git
        const urlPattern = /https:\/\/.*?\/(\d{4})\/(\d{2})\/(.+)/;
        const match = item.url.match(urlPattern);
        
        if (match && match[1] && match[2] && match[3]) {
          const year = match[1];
          const month = match[2];
          const slug = match[3];
          
          // Create a glob pattern for the file
          const filePattern = `${year}-${month}-*-${slug}.md`;
          const postsDir = resolve(process.cwd(), 'src', 'posts', year);
          
          try {
            // First check if the directory exists
            if (existsSync(postsDir)) {
              // Use Node's built-in fs.globSync to find files matching the pattern
              const files = globSync(filePattern, { cwd: postsDir });
              
              if (files.length > 0 && files[0]) {
                const filePath = join(postsDir, files[0]);
                
                updateLastModifiedFromGit(filePath, item);
              }
            }
          } catch (err) {
            // Handle errors without crashing
            console.error(`Error finding file for ${item.url}: ${err instanceof Error ? err.message : String(err)}`);
          }
        } // or specific root-level pages /about, /speaking
        else if (item.url.match(/\/about$/)) {
          const filePath = join(process.cwd(), 'src', 'pages', 'about.astro');
          updateLastModifiedFromGit(filePath, item);
        }
        else if (item.url.match(/\/speaking$/)) {
          const filePath = join(process.cwd(), 'src', 'pages', 'speaking.md');
          updateLastModifiedFromGit(filePath, item);
        }
      } catch (error) {
        // Catch any errors to prevent build failures
        console.error(`Error processing sitemap item ${item.url}: ${error instanceof Error ? error.message : String(error)}`);
      }
      
      return item;
    }
  }), mdx()],
  experimental: {
  },
  build: {
    format: 'file',
  }
});

function updateLastModifiedFromGit(filePath: string, item: SitemapItem) {
  if (existsSync(filePath)) {
    // Get last modified date from git
    const gitCmd = `git log -1 --pretty="format:%cI" "${filePath}"`;
    const lastModified = execSync(gitCmd, { encoding: 'utf8' }).trim();

    if (lastModified) {
      item.lastmod = new Date(lastModified).toISOString();
    }
  }
}
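One way the per-file git calls might be reduced is to ask git for the history of the whole posts directory once and build a lookup table from the result. The following is only a sketch of that idea, not what this site uses. The helper names `parseGitLog` and `loadLastModified` and the `commit-date:` marker are made up for illustration, and it assumes the build runs from the repository root, since git reports paths relative to it.

```typescript
import { execFileSync } from 'node:child_process';

// Hypothetical marker used to tell commit-date lines apart from file paths
const DATE_PREFIX = 'commit-date:';

// Parse `git log --name-only` output into a map of path -> newest commit date.
// git log lists commits newest first, so the first date seen for a path wins.
function parseGitLog(output: string): Map<string, string> {
  const lastModified = new Map<string, string>();
  let currentDate: string | undefined;

  for (const rawLine of output.split('\n')) {
    const line = rawLine.trim();
    if (line === '') continue;

    if (line.startsWith(DATE_PREFIX)) {
      // Normalise the committer date to UTC, as the config above does
      currentDate = new Date(line.slice(DATE_PREFIX.length)).toISOString();
    } else if (currentDate && !lastModified.has(line)) {
      lastModified.set(line, currentDate);
    }
  }
  return lastModified;
}

// Run git once for a whole directory instead of once per file.
// Keys in the returned map are paths relative to the repository root.
function loadLastModified(dir: string): Map<string, string> {
  const output = execFileSync(
    'git',
    ['log', `--pretty=format:${DATE_PREFIX}%cI`, '--name-only', '--', dir],
    { encoding: 'utf8', maxBuffer: 64 * 1024 * 1024 },
  );
  return parseGitLog(output);
}
```

The map could be built once at the top of astro.config.ts, for example with `loadLastModified('src/posts')`, and then consulted inside `serialize` using the post's path relative to the repository root. Passing the arguments as an array via `execFileSync` also avoids shell quoting problems with unusual file names.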

Here’s an example of the sitemap output that the astro.config.ts above generates:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
    xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
    <url>
        <loc>https://david.gardiner.net.au</loc>
    </url>
    <url>
        <loc>https://david.gardiner.net.au/2005</loc>
    </url>
    <url>
        <loc>https://david.gardiner.net.au/2005/10/simulating-workplace</loc>
        <lastmod>2025-05-11T09:52:45.000Z</lastmod>
    </url>
    <url>
        <loc>https://david.gardiner.net.au/2024</loc>
    </url>
    <url>
        <loc>https://david.gardiner.net.au/2024/01/cfs-azure-function</loc>
        <lastmod>2025-05-31T05:03:37.000Z</lastmod>
    </url>
    <url>
        <loc>https://david.gardiner.net.au/2024/01/dotnet8-source-link</loc>
        <lastmod>2025-04-23T06:57:26.000Z</lastmod>
    </url>

I’d really like to be able to set the lastmod property for the other pages on the site (like the home page and the year and tag pages). I think that should be possible by inferring the latest date from all the pages that those pages link to, but I haven’t implemented that yet.
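As a rough sketch of that idea, the year pages and the home page could take the newest lastmod of any entry nested beneath their URL. The `Entry` type and both function names here are hypothetical, and the sketch assumes the full list of entries (with post dates already filled in) is available in one place, which is not how the `serialize` callback above receives them. Tag pages would need a separate tag-to-posts mapping, because their URLs don't contain the posts they link to.

```typescript
// Hypothetical shape: just the two sitemap fields this sketch needs
interface Entry {
  url: string;
  lastmod?: string;
}

// Newest lastmod among entries nested under parentUrl, or undefined if none have one
function latestChildLastmod(parentUrl: string, entries: Entry[]): string | undefined {
  const prefix = parentUrl.replace(/\/$/, '') + '/';
  let latest: string | undefined;

  for (const entry of entries) {
    if (!entry.lastmod || !entry.url.startsWith(prefix)) continue;
    // UTC strings from toISOString() sort chronologically as plain strings
    if (!latest || entry.lastmod > latest) latest = entry.lastmod;
  }
  return latest;
}

// Fill in lastmod for entries that lack one, based on the entries beneath them
function fillMissingLastmod(entries: Entry[]): Entry[] {
  return entries.map((entry) => {
    if (entry.lastmod) return entry;
    const inferred = latestChildLastmod(entry.url, entries);
    return inferred ? { ...entry, lastmod: inferred } : entry;
  });
}
```

Applied to the sample output above, the /2024 entry would pick up the date of the cfs-azure-function post, and the home page would get the newest date across all posts.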