All popular CMSes come with add-ons or plugins that automatically generate a sitemap for your website every time you add new content. I recently added a few documentation sites that consist entirely of plain HTML files, and I needed a tool to generate sitemap files for them. After googling around a bit, I came across Google’s own sitemap generation tool. It’s a very good tool, and I highly recommend it for generating sitemaps for static HTML websites.

To use the tool, download the latest tarball to your server from this website. Version 1.5 is the latest version at the time of writing. Untar the tarball and, optionally, copy sitemap_gen.py to one of the directories in your PATH, e.g. $HOME/bin.

mkdir sitemap_gen; cd sitemap_gen
tar -zxvf $PATH_TO_SITEMAP_TARBALL/sitemap_gen_1.5.tar.gz
cp sitemap_gen.py $HOME/bin/
chmod +x $HOME/bin/sitemap_gen.py

The next step is to create the config file for the sitemap_gen tool. Let’s say the domain that serves your HTML files is www.humbug.in and the files are physically located at /home/abcdef/humbug.in on your server. The config file for sitemap_gen will then look like the XML file below. Let’s call it config.xml.

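<?xml version="1.0" encoding="UTF-8"?>
<!-- Attribute names follow the example_config.xml bundled with sitemap_gen.
     The http:// scheme and default_file="index.html" are assumptions about
     how the site is served; adjust them to match your setup. -->
<site
  base_url="http://www.humbug.in/"
  store_into="/home/abcdef/humbug.in/sitemap.xml"
  verbose="1"
  >
  <directory
    path="/home/abcdef/humbug.in"
    url="http://www.humbug.in/"
    default_file="index.html"
  />
</site>
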

Now, to generate the sitemap, all you have to do is run:

$PATH_TO_SITEMAP_EXECUTABLE/sitemap_gen.py --config=config.xml

This command will generate a sitemap.xml under /home/abcdef/humbug.in.
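
A quick sanity check after a run is to count how many URLs ended up in the generated file. The helper below is only a sketch: count_sitemap_urls is a name made up for this post, not part of sitemap_gen, and it assumes the sitemap is stored uncompressed (a plain .xml file rather than .xml.gz).

```shell
# count_sitemap_urls FILE
# Hypothetical helper (not part of sitemap_gen): prints the number of
# <loc> entries found in an uncompressed sitemap file.
count_sitemap_urls() {
    # grep -o prints every match on its own line, so wc -l counts each
    # <loc> tag even when several entries share one line.
    # tr strips the padding some wc implementations add around the number.
    grep -o '<loc>' "$1" | wc -l | tr -d ' '
}
```

For the example site, this would be called as count_sitemap_urls /home/abcdef/humbug.in/sitemap.xml. A count of 0 usually means the directory path or URL in config.xml does not match the site.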

There are other configuration options, which are documented on the sitemap_gen website.
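
To get closer to the automatic regeneration that CMS plugins offer, the command can be wrapped in a small function and run on a schedule. This is a minimal sketch, not something that ships with the tool: regen_sitemap is a made-up name, and the two paths it takes are whatever you chose when installing sitemap_gen.py and writing config.xml.

```shell
# regen_sitemap GENERATOR CONFIG
# Hypothetical wrapper (not part of sitemap_gen): runs the generator with
# the given config file and fails early if either one is missing.
regen_sitemap() {
    generator=$1
    config=$2

    # Refuse to run if the generator script is absent or not executable.
    if [ ! -x "$generator" ]; then
        echo "generator not found or not executable: $generator" >&2
        return 1
    fi

    # Refuse to run if the config file cannot be read.
    if [ ! -r "$config" ]; then
        echo "config file not readable: $config" >&2
        return 1
    fi

    # Same invocation as the manual command shown earlier.
    "$generator" --config="$config"
}
```

Saved in a script such as $HOME/bin/regen_sitemap.sh (again, an assumed location) with a final line that calls regen_sitemap "$HOME/bin/sitemap_gen.py" "$HOME/sitemap_gen/config.xml", it could be scheduled with a crontab entry like 0 3 * * * $HOME/bin/regen_sitemap.sh to rebuild the sitemap every night at 03:00.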

