Auto multilingual translation for WordPress blogs using Bogo and Linux commands
This article is a translation of my following article:
Original: Bogo+LinuxコマンドでWordPressブログを自動&多言語翻訳する
* Translated automatically by Google.
* Please note that some links or referenced content in this article may be in Japanese.
* Some comments in the code may be in Japanese.
by bokumin
Introduction
I had long thought it would be nice to have English versions of my blog articles, and I have finally created them.
You can see them at the URLs below.
- English Top: https://bokumin.org/en/
- English blog: https://bokumin.org/blog/en/
However, manually translating and posting every article, including past ones, is too much trouble. So this time I used command-line tools on the server to automate the whole process.
This article summarizes how to combine the WordPress plugin “Bogo” with a Google Translate command, so I hope it helps anyone who wants to go multilingual without spending much money.
Environment
- OS: openSUSE (Kernel 6.18.x)
- Web Server: Apache 2.4.66
- CMS: WordPress
- Required tools: wp-cli, trans, perl
*I think this works on any OS where these tools are available.
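Before running anything, it may help to confirm that the required tools are on the PATH. The following is a minimal sketch; missing_tools is a hypothetical helper name, and it assumes WP-CLI is installed under the command name wp.

```shell
# Print the name of every listed command that cannot be found on PATH.
# missing_tools is a hypothetical helper, not part of the original script.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools wp trans perl)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```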
Installing the translation engine (Google Translate CLI)
Install the command-line translation tool trans on the server.
# Download it and make it executable
wget -O /usr/local/bin/trans git.io/trans
chmod +x /usr/local/bin/trans
# Quick test (Japanese → English)
trans -b -no-auto :en "テストです"
It's a test
Installing the WordPress plugin “Bogo”
I adopted “Bogo” as the multilingual plugin. Unlike Polylang and similar plugins, Bogo has a simple structure in which the “Japanese article” and the “English article” are created as independent posts.
Bogo can be installed from the WordPress plugin screen or from the command line.
# Install Bogo
wp plugin install bogo --activate --allow-root
Creating automatic translation scripts
I found it tedious to translate and format existing articles, so I created a Bash script to automate the following process.
- Find Japanese articles that have not yet been translated.
- Protect WordPress block definitions (JSON) and code blocks from translation.
- Translate the title and text into English using the trans command.
- Post English articles using WP-CLI.
- Use Bogo’s features to link Japanese and English articles as a “translation pair.”
Where I got stuck
I found that simply piping the content through the translation command breaks the data structures used by the WordPress block editor (Gutenberg). The following two points were particularly troublesome.
- Plugins such as Code Block Pro:
I use Code Block Pro. For blocks that display source code, the data is saved in JSON format. If the translation tool changes \n (line feed) or " (quote) inside that JSON, the entire block is corrupted and is no longer displayed.
- Backslashes disappear:
The escape character (\) may disappear when the content passes through a shell variable or echo. As a result, line feed codes and the like are lost when the article is posted, and the layout breaks.
Honestly, it took me about half a day just to solve this problem…
That is because I fell into a trap where the escapes vanish more than once: first when passing from the shell to Perl, and again when passing to WordPress.
To solve these problems, the script masks the fragile parts using Perl and uses the WordPress function wp_slash.
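One common way a backslash gets lost in a shell script is read without -r. The following small sketch reproduces that with a JSON-like string. It illustrates the general pitfall under that assumption and is not necessarily the exact path the content took in this case.

```shell
raw='{"code":"a\nb"}'

# read without -r treats the backslash as an escape character and drops it.
lossy=$(printf '%s\n' "$raw" | { read line; printf '%s' "$line"; })

# read -r together with printf '%s' passes the characters through unchanged.
safe=$(printf '%s\n' "$raw" | { read -r line; printf '%s' "$line"; })

printf 'original: %s\n' "$raw"
printf 'lossy:    %s\n' "$lossy"
printf 'safe:     %s\n' "$safe"
```

The behavior of echo with backslashes also differs between shells, which is one reason printf '%s' and files are the safer way to move such content around.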
Script explanation
1. Preparation
First, configure the WordPress path, URL, and translation tool. The full script also checks Bogo’s _trid (translation group ID) and skips a post if an English article already exists.
#!/bin/bash
WP_PATH="/srv/www/htdocs/blog" # Path where WordPress is installed
SITE_URL="https://bokumin.org" # Your site URL
WP_CMD="wp --path=$WP_PATH --url=$SITE_URL --allow-root"
# Settings for the Google Translate command (trans)
TRANS_BIN=$(which trans)
# -b: brief mode, print only the translated text
# -no-auto: short form of -no-autocorrect, do not autocorrect the input text
TRANS_OPTS="-b -no-auto :en"
SLEEP_TIME=20
2. Content acquisition and protection (Perl script generation)
When retrieving the article text, special characters get corrupted if it passes through shell variables, so it is written directly to a file via PHP.
The script also dynamically generates a Perl script that masks the important parts before translation.
This allows blocks containing complex JSON data to be safely excluded from translation.
$content =~ s{(<!--\s*/?wp:.*?-->)}{ push(@masks, $1); "___MASK_WP_" . $#masks . "___" }gse;
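The Perl substitution above replaces each protected fragment with a numbered token and keeps the original in an array. As a rough, simplified sketch of the same round trip for URLs only, the following uses awk instead of Perl (a swapped-in tool, chosen only for illustration). mask_urls, restore_urls, and the example text are hypothetical.

```shell
# Replace each URL on stdin with ___MASK_URL_<n>___ and save the originals,
# one per line, in the file given as $1.
mask_urls() {
  : > "$1"
  awk -v store="$1" '
    BEGIN { n = 0 }
    {
      line = $0; out = ""
      while (match(line, /https?:\/\/[^ \t"<]+/)) {
        out = out substr(line, 1, RSTART - 1) "___MASK_URL_" n "___"
        print substr(line, RSTART, RLENGTH) > store
        line = substr(line, RSTART + RLENGTH)
        n++
      }
      print out line
    }'
}

# Put the saved URLs back, tolerating spaces a translator may add inside a token.
restore_urls() {
  awk -v store="$1" '
    BEGIN { n = 0; while ((getline u < store) > 0) url[n++] = u }
    {
      line = $0; out = ""
      while (match(line, /___ ?MASK_URL_ ?[0-9]+ ?___/)) {
        token = substr(line, RSTART, RLENGTH)
        gsub(/[^0-9]/, "", token)          # keep only the index
        out = out substr(line, 1, RSTART - 1) url[token + 0]
        line = substr(line, RSTART + RLENGTH)
      }
      print out line
    }'
}

store=$(mktemp)
original='Docs: <a href="https://example.com/a?x=1&y=2">here</a>'
masked=$(printf '%s\n' "$original" | mask_urls "$store")
restored=$(printf '%s\n' "$masked" | restore_urls "$store")
printf '%s\n%s\n' "$masked" "$restored"
rm -f "$store"
```

In a real run the translation step would sit between the two calls, which is why the restore pattern allows for stray spaces.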
3. Creating an English article
Register the translated text to WordPress.
The most important thing here is wp_slash(). WordPress’ post function wp_insert_post expects the data to have been escaped (slashed) already, and unslashes it on save.
For this reason, if you do not apply wp_slash() on the PHP side in advance, all backslashes (\n, \u003c, etc.) in the article will be erased and the article will be corrupted.
# Run PHP code directly with the WP-CLI eval command
en_id=$($WP_CMD eval "
\$title = base64_decode('$title_b64');
\$content = file_get_contents('$content_file');
// wp_insert_post strips backslashes on save, so escape with wp_slash beforehand
\$content = wp_slash(\$content);
\$post_data = array(
'post_title' => \$title,
'post_content' => \$content,
// ...
);
\$id = wp_insert_post(\$post_data);
if (!is_wp_error(\$id)) { echo \$id; }
")
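To see why the extra escaping matters, the effect can be imitated outside WordPress. The following sketch uses sed as a rough stand-in for the slash and unslash steps. sim_slash and sim_unslash are hypothetical names, and the real wp_slash() is a PHP function that handles more cases. The sketch only shows that content containing \n survives when it is slashed first.

```shell
# Rough stand-in for wp_slash(): put a backslash in front of backslashes and quotes.
sim_slash() {
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e "s/'/\\\\'/g"
}

# Rough stand-in for the unslash done on save: remove one level of backslashes.
sim_unslash() {
  sed 's/\\\(.\)/\1/g'
}

content='{"code":"line1\nline2"}'

lost=$(printf '%s\n' "$content" | sim_unslash)              # saved without slashing first
kept=$(printf '%s\n' "$content" | sim_slash | sim_unslash)  # slashed first, then saved

printf 'without slash: %s\n' "$lost"
printf 'with slash:    %s\n' "$kept"
```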
The finished script
*Please change the path and URL according to your environment.
#!/bin/bash
# Settings
WP_PATH="/srv/www/htdocs/blog" # Path where WordPress is installed
SITE_URL="https://bokumin.org" # Your site URL
# wp and trans commands
WP_CMD="wp --path=$WP_PATH --url=$SITE_URL --allow-root"
TRANS_BIN=$(which trans)
TRANS_OPTS="-b -no-auto :en"
# Seconds to sleep after each post
SLEEP_TIME=20
# Perl script
PERL_SCRIPT="/tmp/wp_trans_mask.pl"
cat << 'EOF' > "$PERL_SCRIPT"
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';
my ($t_bin, $t_opts, $t_out) = @ARGV;
# Read everything from standard input
my $content = do { local $/; <STDIN> };
my @masks = ();
# Masking (protection)
# WordPress blocks (<!-- wp:... -->)
$content =~ s{(<!--\s*/?wp:.*?-->)}{ push(@masks, $1); "___MASK_WP_" . $#masks . "___" }gse;
# <script>
$content =~ s{(<script\b[^>]*>.*?</script>)}{ push(@masks, $1); "___MASK_SCRIPT_" . $#masks . "___" }gse;
# <style>
$content =~ s{(<style\b[^>]*>.*?</style>)}{ push(@masks, $1); "___MASK_STYLE_" . $#masks . "___" }gse;
# <pre>
$content =~ s{(<pre\b[^>]*>.*?</pre>)}{ push(@masks, $1); "___MASK_PRE_" . $#masks . "___" }gse;
# <code>
$content =~ s{(<code\b[^>]*>.*?</code>)}{ push(@masks, $1); "___MASK_CODE_" . $#masks . "___" }gse;
# URL
$content =~ s{(https?://[^\s"<]+)}{ push(@masks, $1); "___MASK_URL_" . $#masks . "___" }gse;
# Run the translation
open(my $ph, "|-", "$t_bin $t_opts > $t_out") or die "Cannot open pipe to trans";
binmode($ph, ":encoding(UTF-8)");
print $ph $content;
close $ph;
# Restore the masked parts
if (-z "$t_out") { exit; }
open(my $fh, "<:encoding(UTF-8)", "$t_out") or die "Cannot open translated file";
my $trans_content = do { local $/; <$fh> };
close $fh;
# Tolerate spaces the translator may have inserted around the tokens
$trans_content =~ s/___ ?MASK_WP_ ?(\d+) ?___/$masks[$1]/g;
$trans_content =~ s/___ ?MASK_SCRIPT_ ?(\d+) ?___/$masks[$1]/g;
$trans_content =~ s/___ ?MASK_STYLE_ ?(\d+) ?___/$masks[$1]/g;
$trans_content =~ s/___ ?MASK_PRE_ ?(\d+) ?___/$masks[$1]/g;
$trans_content =~ s/___ ?MASK_CODE_ ?(\d+) ?___/$masks[$1]/g;
$trans_content =~ s/___ ?MASK_URL_ ?(\d+) ?___/$masks[$1]/g;
print $trans_content;
EOF
# Main processing function
process_translation() {
local ja_id=$1
echo "========================================"
echo "Checking Post ID: $ja_id..."
# Check the locale
current_locale=$($WP_CMD post meta get $ja_id _locale)
if [ "$current_locale" == "en_US" ]; then
echo "-> Skip: Already English."
return
fi
# Check whether a translation already exists
trid=$($WP_CMD post meta get $ja_id _trid)
if [ -n "$trid" ]; then
existing_en_id=$($WP_CMD post list --post_type=post --format=ids --meta_query="[{\"key\":\"_trid\",\"value\":\"$trid\"},{\"key\":\"_locale\",\"value\":\"en_US\"}]")
if [ -n "$existing_en_id" ]; then
echo "-> Skip: Translation exists (ID: $existing_en_id)."
return
fi
fi
# Fetch the content (written to a file via PHP)
TEMP_SRC=$(mktemp)
TEMP_TRANS=$(mktemp)
$WP_CMD eval "file_put_contents('$TEMP_SRC', get_post_field('post_content', $ja_id));"
# Fetch the title, author, and date
ja_title=$($WP_CMD post get $ja_id --field=post_title)
ja_author=$($WP_CMD post get $ja_id --field=post_author)
ja_date=$($WP_CMD post get $ja_id --field=post_date)
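# Extract an English title if the post already contains one
# (the hyphen goes last in the bracket expression so it is matched literally)
extracted_en_title=$(sed 's/<[^>]*>//g' "$TEMP_SRC" | head -n 5 | grep -E '^[A-Za-z0-9 +.!?-]+$' | head -n 1)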
if [ -n "$extracted_en_title" ]; then
en_title="$extracted_en_title"
else
en_title=$(echo "$ja_title" | $TRANS_BIN $TRANS_OPTS)
sleep 2
fi
# Run the translation (Perl script)
echo "Translating Content..."
perl "$PERL_SCRIPT" "$TRANS_BIN" "$TRANS_OPTS" "$TEMP_TRANS" < "$TEMP_SRC" > "$TEMP_TRANS.final"
if [ ! -s "$TEMP_TRANS.final" ]; then
echo "[ERROR] Translation failed. Skipping."
rm "$TEMP_SRC" "$TEMP_TRANS" "$TEMP_TRANS.final"
return
fi
# Create the post (same date as the original, content passed through wp_slash)
echo "Creating English Post..."
title_b64=$(echo -n "$en_title" | base64 | tr -d '\n')
content_file="$TEMP_TRANS.final"
en_id=$($WP_CMD eval "
\$title = base64_decode('$title_b64');
\$date = '$ja_date';
if (file_exists('$content_file')) {
\$content = file_get_contents('$content_file');
// Escape with wp_slash
\$content = wp_slash(\$content);
\$post_data = array(
'post_title' => \$title,
'post_content' => \$content,
'post_status' => 'publish',
'post_author' => $ja_author,
'post_date' => \$date, // Set the post date
'post_type' => 'post'
);
\$id = wp_insert_post(\$post_data);
if (!is_wp_error(\$id)) { echo \$id; } else { echo 0; }
}
")
rm "$TEMP_SRC" "$TEMP_TRANS" "$TEMP_TRANS.final"
if [ -z "$en_id" ] || [ "$en_id" -eq 0 ]; then
echo "[ERROR] Failed to create post."
return
fi
echo "Created Post ID: $en_id (Date: $ja_date)"
# Bogo settings
$WP_CMD post meta update $ja_id _locale ja
$WP_CMD post meta update $en_id _locale en_US
if [ -z "$trid" ]; then
trid=$(date +%s)
$WP_CMD post meta update $ja_id _trid $trid
fi
$WP_CMD post meta update $en_id _trid $trid
# Carry over categories and tags
$WP_CMD eval "
\$ja_id = $ja_id;
\$en_id = $en_id;
\$taxonomies = array('category', 'post_tag');
foreach (\$taxonomies as \$tax) {
\$ja_term_ids = wp_get_object_terms(\$ja_id, \$tax, array('fields' => 'ids'));
if ( ! is_wp_error(\$ja_term_ids) && ! empty(\$ja_term_ids) ) {
wp_set_object_terms(\$en_id, \$ja_term_ids, \$tax);
}
}
"
echo "Done. Sleeping $SLEEP_TIME seconds..."
sleep $SLEEP_TIME
}
# Main loop
ids_ja=$($WP_CMD post list --post_type=post --format=ids --posts_per_page=-1 --meta_key=_locale --meta_value=ja --orderby=date --order=ASC)
ids_none=$($WP_CMD post list --post_type=post --format=ids --posts_per_page=-1 --meta_query='[{"key":"_locale","compare":"NOT EXISTS"}]' --orderby=date --order=ASC)
ids=$(echo "$ids_ja $ids_none" | tr ' ' '\n' | sort -n)
# Uncomment the next line to process only the latest post
#ids=$($WP_CMD post list --post_type=post --format=ids --posts_per_page=1 --meta_key=_locale --meta_value=ja --orderby=date --order=DESC)
if [ -z "$(echo $ids | tr -d ' ')" ]; then
echo "No target posts found."
exit 0
fi
for id in $ids; do
process_translation $id
done
rm -f "$PERL_SCRIPT"
To avoid hitting the Google Translate request limit (400 Bad Request), the script sleeps for 20 seconds after each article. With many articles it takes a while to run, but it will finish if you leave it alone.
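If a request still fails now and then, one option would be to wrap the call in a small retry helper that pauses between attempts. This is only a sketch: retry_with_pause is a hypothetical name, it is not part of the script above, and it assumes the wrapped command reports failure through a non-zero exit status, which would need to be confirmed for trans.

```shell
# Run a command up to <max_attempts> times, sleeping <pause> seconds between tries.
# Usage: retry_with_pause <max_attempts> <pause> <command> [args...]
retry_with_pause() {
  max_attempts=$1
  pause=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1   # give up after the last allowed attempt
    fi
    attempt=$((attempt + 1))
    sleep "$pause"
  done
  return 0
}

# Example with a stand-in command; in practice this would wrap the translation call.
retry_with_pause 3 0 true && echo "succeeded"
```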
If you only want to translate the latest article, change the final ids assignment as follows.
ids=$($WP_CMD post list --post_type=post --format=ids --posts_per_page=1 --meta_key=_locale --meta_value=ja --orderby=date --order=DESC)
for id in $ids; do
process_translation $id
done
Bonus: How to make static pages outside of WordPress multilingual *Please skip if you are not interested
My site structure is a little unusual: the blog part (/blog/) is WordPress, but the top page (/) and the tools area (/others/) run on homemade PHP pages that do not use WordPress.
The Bogo plugin only manages content inside WordPress, so these pages need to be made to work alongside it.
I could have created separate files such as index_en.php, but that would be cumbersome to maintain, so I rejected the idea and instead built a mechanism that switches the display language based on the URL while keeping a single file.
1. Server settings (.htaccess)
First, use Apache’s mod_rewrite to rewrite requests internally so that the /en/ URLs work without creating a physical directory.
This way, even when /en/ is requested, the root index.php handles it inside the server.
RewriteEngine On
RewriteBase /
# Top page (/en/ -> index.php)
RewriteRule ^en/?$ index.php [L]
# Subdirectory (/others/en/ -> others/index.php)
RewriteRule ^others/en/?$ others/index.php [L]
RewriteRule ^others/en/(.*)$ others/$1 [L]
# Specific files (/en/app-list.php -> app-list.php)
RewriteRule ^en/(.*)$ $1 [L]
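To check which rule handles which path, the four rules can be mimicked with a case statement. This is only a mental model under simplifying assumptions, not how Apache actually evaluates them. map_request is a hypothetical name and others/en/tool.php is a made-up example path.

```shell
# Mimic the four RewriteRule lines; the first matching pattern wins.
# The argument is the path without the leading slash, which is how the
# patterns see it in a .htaccess file at the document root.
map_request() {
  case $1 in
    en|en/)               echo "index.php" ;;
    others/en|others/en/) echo "others/index.php" ;;
    others/en/*)          echo "others/${1#others/en/}" ;;
    en/*)                 echo "${1#en/}" ;;
    *)                    echo "$1" ;;   # no rule matches: served as requested
  esac
}

for path in en/ en/app-list.php others/en/ others/en/tool.php blog/en/; do
  printf '%-22s -> %s\n' "$path" "$(map_request "$path")"
done
```

Paths under blog/ fall through untouched, so WordPress and Bogo keep handling /blog/en/ themselves.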
2. Language determination on the PHP side
Next, add logic on the PHP side (header.php, etc.) to determine whether the current URL is an English one.
Since WordPress functions cannot be used here, check $_SERVER['REQUEST_URI'] directly.
<?php
// Check whether the current URL starts with '/en' or '/blog/en'
$is_english = false;
if (isset($_SERVER['REQUEST_URI']) && preg_match('#^(/blog)?/en(/|$)#', $_SERVER['REQUEST_URI'])) {
$is_english = true;
}
// Build the prefix for internal links
// '/en' for English, '' (empty string) for Japanese
$link_prefix = $is_english ? '/en' : '';
?>
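A quick way to see which paths this pattern treats as English is to run the same regular expression from the shell. The following is a small sketch using grep -E. is_english_path and the sample paths are hypothetical.

```shell
# Succeeds when the path starts with /en or /blog/en followed by "/" or the end.
is_english_path() {
  printf '%s\n' "$1" | grep -Eq '^(/blog)?/en(/|$)'
}

for path in / /en /en/ /blog/en/some-post/ /english/ /others/en/; do
  if is_english_path "$path"; then
    echo "$path -> English"
  else
    echo "$path -> Japanese"
  fi
done
```

Note that /others/en/ is not matched by this pattern, so pages served under that URL would need the pattern extended if they rely on the same check.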
3. Switching display (simple translation function)
I defined a simple helper function _t() to switch between Japanese and English. Using this will make your HTML cleaner.
<?php
// Simple translation helper
function _t($ja, $en) {
global $is_english;
echo $is_english ? $en : $ja;
}
?>
<!-- Usage example -->
<h2><?php _t('ようこそ', 'Welcome'); ?></h2>
<p>
<?php _t(
'このサイトは技術ブログです。',
'This site is a technical blog.'
); ?>
</p>
<!-- Links also switch to the English version automatically -->
<a href="<?php echo $link_prefix; ?>/others">Others</a>
4. Hybrid generation of sitemap
Finally, SEO.
WordPress articles can be retrieved with wp-cli, but these static pages are not in the DB, so I used a shell script that combines a “manual list” with a “WP automatic list” to generate sitemap.xml.
# Static pages (stamped with today's date)
TODAY=$(date +%Y-%m-%d)
echo "https://bokumin.org/,$TODAY" >> sitemap.tmp
echo "https://bokumin.org/en/,$TODAY" >> sitemap.tmp
# WordPress posts (fetched with WP-CLI)
wp eval '...' >> sitemap.tmp
The full shell script for the sitemap is below.
cat make-sitemap.sh
#!/bin/bash
OUTPUT_FILE="/srv/www/htdocs/sitemap.xml"
WEBROOT="/srv/www/htdocs"
DOMAIN="https://bokumin.org"
WP_PATH="/srv/www/htdocs/blog"
WP_CMD="wp --path=$WP_PATH --url=$DOMAIN --allow-root"
cat > "$OUTPUT_FILE" << 'EOF'
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
EOF
add_url_entry() {
local url_path="$1"
local file_check_path="$2"
local full_path="$WEBROOT$file_check_path"
local lastmod=""
if [[ -f "$full_path" ]]; then
lastmod=$(date -r "$full_path" +%Y-%m-%d)
elif [[ -d "$full_path" ]]; then
lastmod=$(date -r "$full_path" +%Y-%m-%d)
else
return
fi
cat >> "$OUTPUT_FILE" << EOF
<url>
<loc>${DOMAIN}${url_path}</loc>
<lastmod>${lastmod}</lastmod>
</url>
EOF
}
add_url_entry "/" "/index.php"
add_url_entry "/en/" "/index.php"
add_url_entry "/others/" "/others/"
add_url_entry "/others/en/" "/others/"
add_url_entry "/art-works/" "/art-works/"
add_url_entry "/art-works/en/" "/art-works/"
add_url_entry "/gpg-public-key.txt" "/gpg-public-key.txt"
add_url_entry "/spam-check/" "/spam-check/"
add_url_entry "/amedas-dashboard/" "/amedas-dashboard/"
add_url_entry "/blog/" "/blog/"
$WP_CMD eval '
$posts = get_posts(array(
"post_type" => array("post", "page"),
"post_status" => "publish",
"numberposts" => -1,
));
foreach($posts as $post) {
$lastmod = get_the_modified_date("Y-m-d", $post->ID);
$url = get_permalink($post->ID);
echo $url . "\t" . $lastmod . "\n";
}
' | while IFS=$'\t' read -r url lastmod; do
if [ -n "$url" ]; then
cat >> "$OUTPUT_FILE" << EOF
<url>
<loc>$url</loc>
<lastmod>$lastmod</lastmod>
</url>
EOF
fi
done
echo "</urlset>" >> "$OUTPUT_FILE"
chown wwwrun:wwwrun "$OUTPUT_FILE"
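The script writes each permalink into <loc> as it is. The sitemap protocol expects characters such as & in URLs to be entity-escaped, so if any URL could contain them, a small filter like the following could be applied before writing. xml_escape is a hypothetical helper and is not part of the script above.

```shell
# Escape the five XML special characters; & must be handled first
# so that the entities added by the later rules are not escaped again.
xml_escape() {
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&gt;/g' \
      -e 's/"/\&quot;/g' \
      -e "s/'/\\&apos;/g"
}

printf '%s\n' 'https://example.com/?a=1&b=2' | xml_escape
```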
Summary
I was able to make my blog multilingual automatically and for free.
Technical articles contain many code blocks, so translation tends to break them, but with the masking done by the Perl script they now translate cleanly. Some parts are occasionally not translated well, but those can be fixed later.
Bogo makes it easy to filter English and Japanese articles, so any odd articles are easy to correct by hand.

I hope this will be helpful for those considering a similar configuration.
End