Create sitemaps in the blink of an eye

This article is a translation of my original Japanese article.

* Translated automatically by Google.
* Please note that some links or referenced content in this article may be in Japanese.
* Comments in the code may be in Japanese.

 

by bokumin

 


 


I had been generating sitemaps with Node.js and Python tools, but crawling over the network is slow and redirect errors occurred frequently. I figured it would be faster to write my own generator, kept putting it off because it seemed like a chore, and finally wrote it on my day off last week. I kept it simple: it runs locally and uses only tools already installed on the system (mariadb, grep, cut).

 

 

$ cat make-sitemap.sh
#!/bin/bash

OUTPUT_FILE="/srv/www/htdocs/sitemap.xml"
#OUTPUT_FILE="test_sitemap.xml"
WEBROOT="/srv/www/htdocs"
DOMAIN="https://bokumin.org"
WP_CONFIG="/srv/www/htdocs/blog/wp-config.php"

DB_NAME=$(grep "DB_NAME" "$WP_CONFIG" | cut -d "'" -f 4)
DB_USER=$(grep "DB_USER" "$WP_CONFIG" | cut -d "'" -f 4)
DB_PASSWORD=$(grep "DB_PASSWORD" "$WP_CONFIG" | cut -d "'" -f 4)
DB_HOST=$(grep "DB_HOST" "$WP_CONFIG" | cut -d "'" -f 4)
TABLE_PREFIX=$(grep '$table_prefix' "$WP_CONFIG" | cut -d "'" -f 2)

cat > "$OUTPUT_FILE" << 'EOF'
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
        http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
EOF

process_url() {
    local path="$1"
    local full_path="$WEBROOT$path"
    
    if [[ -f "$full_path" ]]; then
        lastmod=$(date -r "$full_path" +%Y-%m-%d)
    elif [[ -d "$full_path" ]]; then
        lastmod=$(date -r "$full_path" +%Y-%m-%d)
    else
        return
    fi
    
    cat >> "$OUTPUT_FILE" << EOF
  <url>
    <loc>$DOMAIN$path</loc>
    <lastmod>$lastmod</lastmod>
  </url>
EOF
}

process_url "/"
process_url "/art-works/"
process_url "/gpg-public-key.txt"
process_url "/others/"
process_url "/spam-check/"
process_url "/amedas-dashboard/"
process_url "/blog/"

mariadb -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASSWORD" -D "$DB_NAME" -N -e "
SELECT 
    CONCAT('/blog/', DATE_FORMAT(post_date, '%Y/%m/%d'), '/', post_name, '/'),
    DATE_FORMAT(post_date, '%Y-%m-%d')
FROM ${TABLE_PREFIX}posts 
WHERE post_status = 'publish' 
AND post_type = 'post'
ORDER BY post_date DESC
" | while IFS=$'\t' read -r url lastmod; do
    cat >> "$OUTPUT_FILE" << EOF
  <url>
    <loc>$DOMAIN$url</loc>
    <lastmod>$lastmod</lastmod>
  </url>
EOF
done

echo "</urlset>" >> "$OUTPUT_FILE"
chown wwwrun:wwwrun "$OUTPUT_FILE"
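
To make the credential lookup at the top of the script easier to follow, here is a minimal sketch of the same grep and cut pipeline run against a sample file. The sample wp-config.php contents and its values (wordpress, wp_user, wp_) are made up for illustration; only the pipeline itself comes from the script above.

```bash
#!/bin/bash
# Sketch: how the grep | cut pipeline pulls values out of wp-config.php.
# The sample file is hypothetical and only mimics the usual layout.
sample=$(mktemp)
cat > "$sample" << 'EOF'
<?php
define( 'DB_NAME', 'wordpress' );
define( 'DB_USER', 'wp_user' );
$table_prefix = 'wp_';
EOF

# Splitting on single quotes: field 2 is the constant name, field 4 its value.
db_name=$(grep "DB_NAME" "$sample" | cut -d "'" -f 4)
db_user=$(grep "DB_USER" "$sample" | cut -d "'" -f 4)
# For $table_prefix the value is the first quoted string, so it is field 2.
table_prefix=$(grep '$table_prefix' "$sample" | cut -d "'" -f 2)

echo "$db_name $db_user $table_prefix"
rm -f "$sample"
```

This only works when each setting sits on one line and uses single quotes, as in a default wp-config.php.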

 

The execution time is shown below. It finishes in about a quarter of a second.

 

$ time ./make-sitemap.sh

real    0m0.258s
user    0m0.149s
sys     0m0.114s

 

For comparison, here is the Node.js version I used before. It takes about 48 seconds because it crawls the site over the network.

 

$ cat make-sitemap-past.sh 
#!/bin/bash
/usr/local/bin/sitemap-generator -f sitemap.xml https://bokumin.org



$ time ./make-sitemap-past.sh 
(node:458025) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
(Use `node22 --trace-warnings ...` to show where the warning was created)

real    0m48.391s
user    0m9.989s
sys     0m0.225s

 

Handling static pages outside WordPress

 

My site originally served static pages with Apache, and I introduced WordPress later. As a result, the static pages sit directly under the document root and WordPress lives under /blog.
Pages outside WordPress that should appear in the sitemap are listed manually. I think listing what to include is more efficient than excluding what should not appear, so that is how I wrote the script.

 

process_url "/"
process_url "/art-works/"
process_url "/gpg-public-key.txt"
process_url "/others/"
process_url "/spam-check/"
process_url "/amedas-dashboard/"
process_url "/blog/"
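
If the list grows, the same allowlist could also be kept in a single array and looped over. This is only a sketch: STATIC_PATHS is a hypothetical name, and process_url is replaced here by a stub that just prints its argument so the snippet runs on its own.

```bash
#!/bin/bash
# Sketch: keep the allowlist in one array and loop over it.
# STATIC_PATHS is a hypothetical name, not part of the original script.
STATIC_PATHS=(
    "/"
    "/art-works/"
    "/gpg-public-key.txt"
    "/others/"
    "/spam-check/"
    "/amedas-dashboard/"
    "/blog/"
)

# Stub standing in for the real process_url in make-sitemap.sh.
process_url() {
    echo "would add: $1"
}

for path in "${STATIC_PATHS[@]}"; do
    process_url "$path"
done
```

Adding or removing a page then means editing one line of the array.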

 

This time I did not include priority or other optional elements; the sitemap contains only each URL and its last-modified date. Google's crawler basically looks only at this information, so I don't think more detailed configuration is needed.
The last-modified date of each static page is obtained by the process_url function in the script.

 

process_url() {
    local path="$1"
    local full_path="$WEBROOT$path"
    
    if [[ -f "$full_path" ]]; then
        lastmod=$(date -r "$full_path" +%Y-%m-%d)  # file modification time
    elif [[ -d "$full_path" ]]; then
        lastmod=$(date -r "$full_path" +%Y-%m-%d)  # directory modification time
    else
        return
    fi
    ...
}
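
As a quick check of how date -r behaves, the following sketch sets a known modification time on a temporary file and reads it back. It assumes GNU coreutils (touch -d and date -r); the temporary file name is only for illustration.

```bash
#!/bin/bash
# Sketch: date -r FILE prints the file's last modification time (GNU coreutils).
tmpdir=$(mktemp -d)

# Give the file a known modification time (local time zone).
touch -d "2024-10-19 12:00:00" "$tmpdir/example.txt"

# Read it back in the format used for <lastmod>.
file_lastmod=$(date -r "$tmpdir/example.txt" +%Y-%m-%d)
echo "$file_lastmod"   # prints 2024-10-19

# The same call works on a directory and reports the directory's own mtime.
dir_lastmod=$(date -r "$tmpdir" +%Y-%m-%d)
echo "$dir_lastmod"

rm -rf "$tmpdir"
```

For a directory, the modification time changes when entries are added, removed, or renamed inside it, and not necessarily when the content of an existing file such as index.html is edited in place. The lastmod of directory URLs may therefore lag behind the actual page.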

 

Output result (sitemap)

 

Let’s check the output sitemap. The output is as expected.

 

$ cat /srv/www/htdocs/sitemap.xml
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
        http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://bokumin.org/</loc>
    <lastmod>2025-10-31</lastmod>
  </url>
  <url>
    <loc>https://bokumin.org/art-works/</loc>
    <lastmod>2025-09-11</lastmod>
  </url>
  <url>
    <loc>https://bokumin.org/gpg-public-key.txt</loc>
    <lastmod>2024-10-19</lastmod>
  </url>
  <url>
    <loc>https://bokumin.org/others/</loc>
    <lastmod>2025-10-30</lastmod>
  </url>
  <url>
    <loc>https://bokumin.org/spam-check/</loc>
    <lastmod>2025-10-03</lastmod>
  </url>
  <url>
    <loc>https://bokumin.org/amedas-dashboard/</loc>
    <lastmod>2025-10-03</lastmod>
  </url>
  <url>
    <loc>https://bokumin.org/blog/</loc>
    <lastmod>2025-10-29</lastmod>
  </url>
  <url>
    <loc>https://bokumin.org/blog/2025/10/28/btrfs%e3%81%a8opensuse%e3%81%a7raid%e3%82%92%e6%a7%8b%e7%af%89%e3%81%99%e3%82%8braid1-raid0-raid10/</loc>
    <lastmod>2025-10-28</lastmod>

 

* You can view the actual generated sitemap at the URL below.
https://bokumin.org/sitemap.xml

 

 

The script runs entirely locally and needs nothing beyond tools that come installed on the system: mariadb, grep, and cut. It is fast and has no extra dependencies, so there is no package management or updating to worry about. Because post information is read directly from the WordPress database, it is more accurate than crawler-based methods, and the load on the server is kept to a minimum.
It is lightweight enough to run frequently from cron, so the sitemap can be updated shortly after a new post is published.
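
When the script runs from cron, one point that may be worth considering is that sitemap.xml is written in place, so a crawler could in principle fetch a half-written file. Below is a minimal sketch of one way around that, assuming GNU mktemp and mv. publish_sitemap is a hypothetical helper, and the crontab schedule and script path in the comment are assumptions as well.

```bash
#!/bin/bash
# Sketch: build the sitemap in a temporary file, then rename it into place.
# publish_sitemap is a hypothetical helper, not part of the original script.
publish_sitemap() {
    local target="$1"
    local tmp
    # Create the temp file next to the target so mv is a rename on one filesystem.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    # Whatever arrives on stdin becomes the new sitemap.
    cat > "$tmp" || { rm -f "$tmp"; return 1; }
    # mktemp creates the file with mode 600; make it readable by the web server.
    chmod 644 "$tmp"
    mv -f "$tmp" "$target"
}

# Usage (generate_sitemap stands for whatever prints the XML):
#   generate_sitemap | publish_sitemap /srv/www/htdocs/sitemap.xml
# Example crontab entry (schedule and path are assumptions):
#   */15 * * * * /usr/local/bin/make-sitemap.sh
```

Because the rename happens in one step, readers see either the old sitemap or the complete new one.
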
Anyone can use this script after adjusting the domain name, paths, database information, and so on. I hope it helps those who want to create a sitemap for their personal website.

 

End