Create sitemaps in the blink of an eye
This article is a translation of my original article:
Original: サイトマップ(Sitemap)を秒速生成する
* Translated automatically by Google.
* Please note that some links or referenced content in this article may be in Japanese.
* Comments in the code are basically in Japanese.
by bokumin
I had been generating sitemaps with Node.js and Python tools, but crawling the site over the network is slow and redirect errors are frequent. I figured that writing my own generator would be faster, yet it felt like a chore and I kept putting it off until a day off last week. The result is deliberately simple: it runs locally and relies only on standard system tools (mariadb, grep, sed).
*The blog section of my site runs on WordPress, so the script reads the database credentials from wp-config.php and queries the post data directly. If you use a CMS other than WordPress, modify the script to set the database login information directly.
cat make-sitemap.sh
#!/bin/bash
OUTPUT_FILE="/srv/www/htdocs/sitemap.xml"
#OUTPUT_FILE="test_sitemap.xml"
WEBROOT="/srv/www/htdocs"
DOMAIN="https://bokumin.org"
WP_CONFIG="/srv/www/htdocs/blog/wp-config.php"
DB_NAME=$(grep "DB_NAME" "$WP_CONFIG" | cut -d "'" -f 4)
DB_USER=$(grep "DB_USER" "$WP_CONFIG" | cut -d "'" -f 4)
DB_PASSWORD=$(grep "DB_PASSWORD" "$WP_CONFIG" | cut -d "'" -f 4)
DB_HOST=$(grep "DB_HOST" "$WP_CONFIG" | cut -d "'" -f 4)
TABLE_PREFIX=$(grep '$table_prefix' "$WP_CONFIG" | cut -d "'" -f 2)
cat > "$OUTPUT_FILE" << 'EOF'
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
EOF
process_url() {
    local path="$1"
    local full_path="$WEBROOT$path"
    if [[ -f "$full_path" ]]; then
        lastmod=$(date -r "$full_path" +%Y-%m-%d)
    elif [[ -d "$full_path" ]]; then
        lastmod=$(date -r "$full_path" +%Y-%m-%d)
    else
        return
    fi
    cat >> "$OUTPUT_FILE" << EOF
<url>
<loc>$DOMAIN$path</loc>
<lastmod>$lastmod</lastmod>
</url>
EOF
}
process_url "/"
process_url "/art-works/"
process_url "/gpg-public-key.txt"
process_url "/others/"
process_url "/spam-check/"
process_url "/amedas-dashboard/"
process_url "/blog/"
mariadb -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASSWORD" -D "$DB_NAME" -N -e "
SELECT
CONCAT('/blog/', DATE_FORMAT(post_date, '%Y/%m/%d'), '/', post_name, '/'),
DATE_FORMAT(post_date, '%Y-%m-%d')
FROM ${TABLE_PREFIX}posts
WHERE post_status = 'publish'
AND post_type = 'post'
ORDER BY post_date DESC
" | while IFS=$'\t' read -r url lastmod; do
cat >> "$OUTPUT_FILE" << EOF
<url>
<loc>$DOMAIN$url</loc>
<lastmod>$lastmod</lastmod>
</url>
EOF
done
echo "</urlset>" >> "$OUTPUT_FILE"
chown wwwrun:wwwrun "$OUTPUT_FILE"
The execution time is as follows. It’s really fast.
time ./make-sitemap.sh
real 0m0.258s
user 0m0.149s
sys 0m0.114s
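For readers adapting the script, it may help to see what the grep and cut pipeline actually extracts. The following is a minimal sketch that runs against a throwaway file, not a real wp-config.php. The file contents and the values in it (example_db, example_user, wp_) are made up for illustration, and it assumes each define() uses single quotes and sits on a single line.

```shell
#!/bin/bash
# Build a throwaway config file that mimics a typical wp-config.php layout.
# All values here are placeholders, not real credentials.
SAMPLE_CONFIG=$(mktemp)
cat > "$SAMPLE_CONFIG" << 'EOF'
<?php
define( 'DB_NAME', 'example_db' );
define( 'DB_USER', 'example_user' );
define( 'DB_HOST', 'localhost' );
$table_prefix = 'wp_';
EOF

# Splitting on single quotes: field 2 is the constant name, field 4 is its value.
DB_NAME=$(grep "DB_NAME" "$SAMPLE_CONFIG" | cut -d "'" -f 4)
DB_USER=$(grep "DB_USER" "$SAMPLE_CONFIG" | cut -d "'" -f 4)
DB_HOST=$(grep "DB_HOST" "$SAMPLE_CONFIG" | cut -d "'" -f 4)
# For $table_prefix the value is the first quoted string, so it is field 2.
TABLE_PREFIX=$(grep '$table_prefix' "$SAMPLE_CONFIG" | cut -d "'" -f 2)

echo "name=$DB_NAME user=$DB_USER host=$DB_HOST prefix=$TABLE_PREFIX"
rm -f "$SAMPLE_CONFIG"
```

If a wp-config.php uses double quotes or mentions a constant name more than once (for example in a comment), this extraction would return the wrong text, so it is worth echoing the variables once before relying on them.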
Here is the Node.js version I used before. Because it crawls the site over the network, it takes far longer.
$ cat make-sitemap-past.sh
#!/bin/bash
/usr/local/bin/sitemap-generator -f sitemap.xml https://bokumin.org
$ time ./make-sitemap-past.sh
(node:458025) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
(Use `node22 --trace-warnings ...` to show where the warning was created)
real 0m48.391s
user 0m9.989s
sys 0m0.225s
Handling static pages outside WordPress
My site originally served static pages with Apache, and I introduced WordPress later. As a result, the static pages sit directly under the document root and WordPress lives under /blog.
Pages outside WordPress that should appear in the sitemap are specified manually. I find it more efficient to list what I want to include than to exclude what I don't, so the script is built that way.
process_url "/"
process_url "/art-works/"
process_url "/gpg-public-key.txt"
process_url "/others/"
process_url "/spam-check/"
process_url "/amedas-dashboard/"
process_url "/blog/"
This time I left out priority and other optional elements and output only the URL and last-modified date. Google's crawler basically looks only at this information, so I don't think more detailed configuration is needed.
For static pages, the last-modified date is obtained by the process_url function in the script.
process_url() {
    local path="$1"
    local full_path="$WEBROOT$path"
    if [[ -f "$full_path" ]]; then
        lastmod=$(date -r "$full_path" +%Y-%m-%d) # modification time of the file
    elif [[ -d "$full_path" ]]; then
        lastmod=$(date -r "$full_path" +%Y-%m-%d) # modification time of the directory
    else
        return
    fi
    ...
}
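To try process_url without touching the real web root, something like the following sketch may be useful. It is not part of the original script: the temporary directory, the sample paths, the PAGES array, and the example.org domain are hypothetical, and the function prints to standard output instead of appending to sitemap.xml. It assumes GNU coreutils, since `date -r FILE` and `touch -d` are GNU options. Because both branches of the original run the same date command, the sketch folds them into a single existence test.

```shell
#!/bin/bash
# Hypothetical sandbox standing in for the real web root.
WEBROOT=$(mktemp -d)
DOMAIN="https://example.org"   # placeholder domain

mkdir -p "$WEBROOT/others"
echo "hello" > "$WEBROOT/key.txt"
# Pin the modification times so the output is predictable (GNU touch).
touch -d "2024-10-19 12:00:00" "$WEBROOT/key.txt"
touch -d "2025-10-30 12:00:00" "$WEBROOT/others"

process_url() {
    local path="$1"
    local full_path="$WEBROOT$path"
    local lastmod
    # Files and directories are handled the same way; missing paths are skipped.
    [[ -e "$full_path" ]] || return 0
    lastmod=$(date -r "$full_path" +%Y-%m-%d)
    printf '<url>\n<loc>%s%s</loc>\n<lastmod>%s</lastmod>\n</url>\n' \
        "$DOMAIN" "$path" "$lastmod"
}

# Allow-list of pages; "/missing/" does not exist and produces no entry.
PAGES=("/key.txt" "/others/" "/missing/")
RESULT=$(for page in "${PAGES[@]}"; do process_url "$page"; done)

echo "$RESULT"
rm -rf "$WEBROOT"
```

One caveat when relying on directory timestamps: a directory's modification time changes when entries are added, removed, or renamed, not when an existing file inside it is edited in place, so the lastmod of a directory URL may lag behind the content it serves.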
Output (sitemap)
Let’s check the output sitemap. The output is as expected.
cat /srv/www/htdocs/sitemap.xml
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>https://bokumin.org/</loc>
<lastmod>2025-10-31</lastmod>
</url>
<url>
<loc>https://bokumin.org/art-works/</loc>
<lastmod>2025-09-11</lastmod>
</url>
<url>
<loc>https://bokumin.org/gpg-public-key.txt</loc>
<lastmod>2024-10-19</lastmod>
</url>
<url>
<loc>https://bokumin.org/others/</loc>
<lastmod>2025-10-30</lastmod>
</url>
<url>
<loc>https://bokumin.org/spam-check/</loc>
<lastmod>2025-10-03</lastmod>
</url>
<url>
<loc>https://bokumin.org/amedas-dashboard/</loc>
<lastmod>2025-10-03</lastmod>
</url>
<url>
<loc>https://bokumin.org/blog/</loc>
<lastmod>2025-10-29</lastmod>
</url>
<url>
<loc>https://bokumin.org/blog/2025/10/28/btrfs%e3%81%a8opensuse%e3%81%a7raid%e3%82%92%e6%a7%8b%e7%af%89%e3%81%99%e3%82%8braid1-raid0-raid10/</loc>
<lastmod>2025-10-28</lastmod>
*You can check the actual generated sitemap at the URL below.
https://bokumin.org/sitemap.xml
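The URLs in this output need no escaping, but the sitemap protocol requires characters such as & and < inside <loc> to be entity-escaped. The sketch below shows one way the database loop could guard against that. It is an illustration under assumptions rather than part of the original script: xml_escape and sample_rows are hypothetical helpers, the two rows are made-up stand-ins for what `mariadb -N` prints (tab-separated columns, one row per line), and the domain is a placeholder.

```shell
#!/bin/bash
DOMAIN="https://example.org"   # placeholder domain

# Hypothetical helper: replace the five characters XML reserves with entities.
# The ampersand rule must run first so later entities are not escaped twice.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' -e "s/'/\&apos;/g"
}

# Made-up rows in the same shape as the query result: URL path, tab, date.
sample_rows() {
    printf '/blog/2025/10/28/first-post/\t2025-10-28\n'
    printf '/blog/?p=12&preview=true\t2025-10-01\n'
}

# Same read loop as the script, with the URL escaped before it is written.
SITEMAP_BODY=$(sample_rows | while IFS=$'\t' read -r url lastmod; do
    printf '<url>\n<loc>%s</loc>\n<lastmod>%s</lastmod>\n</url>\n' \
        "$(xml_escape "$DOMAIN$url")" "$lastmod"
done)

echo "$SITEMAP_BODY"
```

With pretty permalinks like the ones on this site, escaping rarely changes anything, but it keeps the file well-formed if a query-string URL ever ends up in the list.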
The script runs entirely locally and needs no external tools. Its main advantage is that it uses only mariadb, grep, and sed, which come standard on the system. It is fast and has zero additional dependencies, so there is no package management or updating to worry about. Because post information comes straight from the WordPress database, it is more accurate than crawler-based methods, and the load on the server stays minimal.
It is lightweight enough to run frequently from cron, so the sitemap can be updated soon after a new post is published.
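When the script runs unattended from cron, one detail worth considering is that it writes straight into the live sitemap.xml, so a crawler that fetches the file mid-run could see a partial document. A possible refinement, sketched here under assumptions, is to build the file under a temporary name in the same directory and rename it into place at the end. The scratch directory, the build_sitemap function, and the crontab line in the comment are hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical crontab entry (path and schedule are placeholders):
#   */15 * * * * /usr/local/bin/make-sitemap.sh

# A scratch directory stands in for the web root in this sketch.
OUTPUT_DIR=$(mktemp -d)
OUTPUT_FILE="$OUTPUT_DIR/sitemap.xml"

# Stand-in for the real generation steps; it writes to the file passed as $1.
build_sitemap() {
    local target="$1"
    {
        echo '<?xml version="1.0" encoding="utf-8" ?>'
        echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        echo '<url><loc>https://example.org/</loc></url>'
        echo '</urlset>'
    } > "$target"
}

# Create the temporary file next to the final one so that mv is a rename
# on the same filesystem rather than a copy.
TMP_FILE=$(mktemp "$OUTPUT_FILE.XXXXXX")
if build_sitemap "$TMP_FILE"; then
    chmod 644 "$TMP_FILE"          # mktemp creates the file with mode 600
    mv "$TMP_FILE" "$OUTPUT_FILE"  # readers see the old or the new file, never half
else
    rm -f "$TMP_FILE"
fi

tail -n 1 "$OUTPUT_FILE"
```

In the real script, the chown step would need to be applied to the temporary file before the rename so that ownership is already correct when the file becomes visible.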
Anyone can use this script after adjusting the domain name, paths, and database information. I hope it helps those who are thinking of creating a sitemap for a personal website.