Migrating my blog to Astro - Sitemap
Adding support for a sitemap to your site with Astro
This is part of a series of posts on how I migrated my blog to Astro.
Having a sitemap is a great way to ensure that search engines can quickly find all the content on your website, and also a way of indicating to them which pages have been updated and may need re-indexing.
Astro has a sitemap package @astrojs/sitemap that you can use to generate sitemap files for your site.
Configuration of the sitemap is done by modifying the astro.config.ts file.
In my case, I wanted to be able to set the lastmod property for each blog post entry based on the Git “last commit” date. This is a little slow to evaluate, as it needs to run a git query for every blog post file, but that’s the best way I could think of doing it.
The only other alternative would be to add a ‘lastModified’ frontmatter property to every page, and then remember to update that value every time you modify the file. That just didn’t seem feasible.
I’d be interested to hear if there are any other ways to achieve this, especially if they are more efficient.
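One option that might be faster is to ask git for the whole history once and build a lookup table, instead of running git once per post. The following is only a sketch under some assumptions: the `@@` marker, `parseGitLog` and `loadLastModified` are hypothetical names that are not part of my config, and it assumes a full (non-shallow) clone and file paths that git doesn't need to quote.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical marker used to tell commit-date lines apart from file paths.
const DATE_MARKER = "@@";

/**
 * Parse the output of
 *   git log --pretty=format:@@%cI --name-only
 * into a map of repo-relative path -> ISO date of the newest commit touching it.
 */
export function parseGitLog(output: string): Map<string, string> {
  const lastModified = new Map<string, string>();
  let currentDate = "";
  for (const rawLine of output.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (line.startsWith(DATE_MARKER)) {
      // Normalise to UTC, matching what the per-file version stores in lastmod.
      currentDate = new Date(line.slice(DATE_MARKER.length)).toISOString();
    } else if (currentDate && !lastModified.has(line)) {
      // git log lists newest commits first, so the first hit per path wins.
      lastModified.set(line, currentDate);
    }
  }
  return lastModified;
}

/** Run git once for the whole repository (assumes a full, non-shallow clone). */
export function loadLastModified(repoDir: string = process.cwd()): Map<string, string> {
  const output = execFileSync(
    "git",
    ["log", `--pretty=format:${DATE_MARKER}%cI`, "--name-only"],
    { cwd: repoDir, encoding: "utf8", maxBuffer: 64 * 1024 * 1024 },
  );
  return parseGitLog(output);
}
```

The map keys are paths relative to the repository root, so a lookup from inside `serialize` would need to convert the absolute file path first (for example with `path.relative`). I haven't measured whether this is actually quicker on a large repository.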
// @ts-check
import { defineConfig } from 'astro/config';
import { existsSync, globSync } from 'node:fs';
import { execSync } from 'child_process';
import { join, resolve } from 'path';
import sitemap, { type SitemapItem } from "@astrojs/sitemap";
import mdx from "@astrojs/mdx";

// https://astro.build/config
export default defineConfig({
  compressHTML: false,
  site: process.env.DEPLOY_PRIME_URL || "https://david.gardiner.net.au",
  integrations: [sitemap({
    serialize(item) {
      // ensure we have no trailing slash for files
      item.url = item.url.replace(/\/$/, '');
      try {
        // if item is a post, get last modified date from Git
        const urlPattern = /https:\/\/.*?\/(\d{4})\/(\d{2})\/(.+)/;
        const match = item.url.match(urlPattern);
        if (match && match[1] && match[2] && match[3]) {
          const year = match[1];
          const month = match[2];
          const slug = match[3];
          // Create a glob pattern for the file
          const filePattern = `${year}-${month}-*-${slug}.md`;
          const postsDir = resolve(process.cwd(), 'src', 'posts', year);
          try {
            // First check if the directory exists
            if (existsSync(postsDir)) {
              // Use Node's built-in fs.globSync to find files matching the pattern
              const files = globSync(filePattern, { cwd: postsDir });
              if (files.length > 0 && files[0]) {
                const filePath = join(postsDir, files[0]);
                updateLastModifiedFromGit(filePath, item);
              }
            }
          } catch (err) {
            // Handle errors without crashing
            console.error(`Error finding file for ${item.url}: ${err instanceof Error ? err.message : String(err)}`);
          }
        } // or specific root-level pages /about, /speaking
        else if (item.url.match(/\/about$/)) {
          const filePath = join(process.cwd(), 'src', 'pages', 'about.astro');
          updateLastModifiedFromGit(filePath, item);
        }
        else if (item.url.match(/\/speaking$/)) {
          const filePath = join(process.cwd(), 'src', 'pages', 'speaking.md');
          updateLastModifiedFromGit(filePath, item);
        }
      } catch (error) {
        // Catch any errors to prevent build failures
        console.error(`Error processing sitemap item ${item.url}: ${error instanceof Error ? error.message : String(error)}`);
      }
      return item;
    }
  }), mdx()],
  experimental: {
  },
  build: {
    format: 'file',
  }
});

function updateLastModifiedFromGit(filePath: string, item: SitemapItem) {
  if (existsSync(filePath)) {
    // Get last modified date from git
    const gitCmd = `git log -1 --pretty="format:%cI" "${filePath}"`;
    const lastModified = execSync(gitCmd, { encoding: 'utf8' }).trim();
    if (lastModified) {
      item.lastmod = new Date(lastModified).toISOString();
    }
  }
}
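To make the URL-to-file mapping in the config above easier to see, here is the same matching logic pulled out into a small standalone function. This is just an illustrative sketch: `postLookupFromUrl` and `PostLookup` are hypothetical names, and I'm assuming (as the config does) that post files are named `YYYY-MM-*-slug.md` inside `src/posts/YYYY`.

```typescript
// Same pattern as in the config: /YYYY/MM/slug following the host name.
const postUrlPattern = /https:\/\/.*?\/(\d{4})\/(\d{2})\/(.+)/;

interface PostLookup {
  year: string;        // directory name under src/posts
  filePattern: string; // glob to match inside that directory
}

/** Turn a post URL into the directory and glob used to find its source file. */
export function postLookupFromUrl(url: string): PostLookup | undefined {
  const match = url.match(postUrlPattern);
  if (!match) return undefined;
  const [, year, month, slug] = match;
  if (!year || !month || !slug) return undefined;
  // The * stands in for whatever sits between the month and the slug in the file name.
  return { year, filePattern: `${year}-${month}-*-${slug}.md` };
}
```

For example, `postLookupFromUrl("https://david.gardiner.net.au/2024/01/cfs-azure-function")` gives the year `2024` and the pattern `2024-01-*-cfs-azure-function.md`, while year pages and pages like `/about` don't match and return `undefined`.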
Here’s an example of the output it generates:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://david.gardiner.net.au</loc>
  </url>
  <url>
    <loc>https://david.gardiner.net.au/2005</loc>
  </url>
  <url>
    <loc>https://david.gardiner.net.au/2005/10/simulating-workplace</loc>
    <lastmod>2025-05-11T09:52:45.000Z</lastmod>
  </url>
  <url>
    <loc>https://david.gardiner.net.au/2024</loc>
  </url>
  <url>
    <loc>https://david.gardiner.net.au/2024/01/cfs-azure-function</loc>
    <lastmod>2025-05-31T05:03:37.000Z</lastmod>
  </url>
  <url>
    <loc>https://david.gardiner.net.au/2024/01/dotnet8-source-link</loc>
    <lastmod>2025-04-23T06:57:26.000Z</lastmod>
  </url>
I’d really like to be able to set the lastmod property for the other pages on the site (like the home page and the year and tag pages). I think that should be possible by inferring the latest date from all the pages that those pages link to, but I haven’t implemented that yet.
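As a rough idea of what that inference could look like, here is a sketch of just the aggregation step. It is untested against a real build, and the names `Entry` and `latestChildLastmod` are hypothetical. It assumes "links to" can be approximated by URL nesting, which would work for the home page and year pages but not for tag pages, and it doesn't address how to get at the full list of entries, given that `serialize` only sees one item at a time.

```typescript
// Minimal shape for illustration; the real SitemapItem has more fields.
interface Entry {
  url: string;
  lastmod?: string; // ISO 8601 in UTC, as produced by toISOString()
}

/** Latest lastmod among entries nested under parentUrl, or undefined if none have one. */
export function latestChildLastmod(parentUrl: string, entries: Entry[]): string | undefined {
  const prefix = parentUrl.replace(/\/$/, "") + "/";
  let latest: string | undefined;
  for (const entry of entries) {
    if (!entry.lastmod || !entry.url.startsWith(prefix)) continue;
    // toISOString() values are fixed-width UTC, so comparing the strings orders them by time.
    if (latest === undefined || entry.lastmod > latest) latest = entry.lastmod;
  }
  return latest;
}
```

With the entries from the sample output above, the `/2024` page would pick up `2025-05-31T05:03:37.000Z` from its newest post, and so would the home page.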