Enable Chinese language locales and UTF-8 with Yocto Project

When building minimal systems, image size matters, and a practical approach is to keep only what the system needs to operate within its constraints. Besides turning off unneeded features in packages, one technique is to eliminate dead data, e.g. language data or character encodings that are not needed because the device ships only to certain regions or never interacts with a user. Embedded Linux build systems like OpenEmbedded/Yocto are well suited to starting from a bare-minimum system and building it from the ground up. This is how they differ from desktop distributions like Debian, Fedora and their derivatives. However, users need to know the various knobs that enable or disable the features their system requires. One such use case is enabling locale support, which is the purpose of this article.

Yocto has a granular way to build and package locale and language support for system images: every locale and character encoding is packaged individually into its own output package. Since the targets may be tiny embedded systems with limited processing power, it is not always optimal or even possible to build the locale data on the device itself, which is the most common technique on desktop distributions. Yocto builds have three different ways to provide this data in system images, controlled via the GLIBC_INTERNAL_USE_BINARY_LOCALE variable:

“compile” – Use QEMU to generate the binary locale files
“precompiled” – The binary locale files are pregenerated and already present
“ondevice” – The device will build the locale files upon first boot through the postinst scripts

Similarly, there are other knobs. ENABLE_BINARY_LOCALE_GENERATION turns on the locale generation functionality in the build system.

LOCALE_GENERATION_WITH_CROSS-LOCALEDEF generates this data with a cross-built localedef utility, which is faster than generating it with user-mode QEMU during the build. The Yocto project maintains a cross port of the localedef utility.

IMAGE_LINGUAS is a space-separated list of locale names to enable in the images. The build generates all the possible locales but installs only the selected ones into the image. This gives finer control over selecting the subset that the device needs. The distro policy settings below enable locale generation and package Chinese language data into the final image.

ENABLE_BINARY_LOCALE_GENERATION = "1"
LOCALE_GENERATION_WITH_CROSS-LOCALEDEF = "1"
GLIBC_INTERNAL_USE_BINARY_LOCALE = "compile"
IMAGE_LINGUAS = "zh-hk zh-cn"
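
Once the image is built, one quick way to confirm that a locale actually made it onto the target is to ask the C library to load it. The following is a minimal sketch in C, assuming a glibc-based target. The helper name `locale_available` is hypothetical, and the locale name you pass in (for example `zh_CN.UTF-8`) is an assumption that depends on how the locale packages are named in your build.

```c
#define _POSIX_C_SOURCE 200809L  /* for newlocale()/freelocale() */
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the C library can load the named
 * locale, 0 otherwise. It uses newlocale() so the process-wide locale
 * set by setlocale() is left untouched. */
static int locale_available(const char *name)
{
    locale_t loc;

    assert(name != NULL);

    /* newlocale() returns (locale_t)0 when the locale data is missing. */
    loc = newlocale(LC_ALL_MASK, name, (locale_t)0);
    if (loc == (locale_t)0)
        return 0;

    freelocale(loc);
    return 1;
}
```

Calling `locale_available("zh_CN.UTF-8")` from a small test program on the device should return 1 once the corresponding locale data is installed, and 0 if it is missing. Other C libraries may accept unknown locale names, so the result is most meaningful with glibc.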

This may not be enough, however. Filesystems also need to be able to process UTF-8 data, and users who have minimized the default kernel configuration may have explicitly disabled this support in Kconfig. It is therefore important to ensure that the kernel has Native Language Support (NLS) enabled. Enable the following in the kernel config:

CONFIG_FAT_DEFAULT_CODEPAGE=936
CONFIG_FAT_DEFAULT_IOCHARSET="utf8"
CONFIG_FAT_DEFAULT_UTF8=y
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_UTF8=m

Many Embedded Linux systems use busybox to provide various Unix utilities and shells. Its small size and applet design, in which a utility invoked from the shell can often run without a fork/exec, make it useful in small systems from both a runtime and a size perspective. Busybox uses the same Kconfig mechanism as the kernel to enable and disable features and utilities, so users can configure it to provide exactly what they need, no more, no less. This means users have to ensure they have enabled the right set of Kconfig options for busybox to support locales and UTF-8. For busybox, enable:

CONFIG_LOCALE_SUPPORT=y
CONFIG_UNICODE_USING_LOCALE=y

Chinese language support in particular requires additional busybox patches, which need to be applied as well. Here is a bug report from OpenWrt, along with a patch proposed to fix it.

https://dev.archive.openwrt.org/changeset/43084.html

However, the patch below is more complete and works well with the busybox provided with the latest Yocto release as of this writing.

diff -ruPN a/libbb/printable_string.c b/libbb/printable_string.c
--- a/libbb/printable_string.c  2023-03-21 20:48:27.525118550 +0000
+++ b/libbb/printable_string.c  2023-03-21 20:50:50.371957719 +0000
@@ -28,8 +28,6 @@
                }
                if (c < ' ')
                        break;
-               if (c >= 0x7f)
-                       break;
                s++;
        }

@@ -42,7 +40,8 @@
                        unsigned char c = *d;
                        if (c == '\0')
                                break;
-                       if (c < ' ' || c >= 0x7f)
+                       /* Enable Chinese Chars */
+                       if (c < ' ' )
                                *d = '?';
                        d++;
                }
diff -ruPN a/libbb/unicode.c b/libbb/unicode.c
--- a/libbb/unicode.c   2023-03-21 20:48:43.616086536 +0000
+++ b/libbb/unicode.c   2023-03-21 20:51:58.331600017 +0000
@@ -1019,7 +1019,8 @@
                                        while ((int)--width >= 0);
                                        break;
                                }
-                               *d++ = (c >= ' ' && c < 0x7f) ? c : '?';
+                               /*Enable Chinese Char*/
+                               *d++ = (c >= ' ') ? c : '?';
                                src++;
                        }
                        *d = '\0';
@@ -1027,7 +1028,8 @@
                        d = dst = xstrndup(src, width);
                        while (*d) {
                                unsigned char c = *d;
-                               if (c < ' ' || c >= 0x7f)
+                               /*Enable Chinese Char*/
+                               if (c < ' ')
                                        *d = '?';
                                d++;
                        }
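
To see why dropping the `c >= 0x7f` test matters, recall that every byte of a multi-byte UTF-8 sequence has its high bit set, so the unpatched check turns each byte of a Chinese character into '?'. The following is a minimal sketch in C of that byte filter, not BusyBox code. The helper `sanitize_bytes` and its `strip_high` flag are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BusyBox function): replace unprintable
 * bytes in buf with '?' and return how many bytes were replaced.
 *   strip_high = 1 mimics the unpatched check: c < ' ' || c >= 0x7f
 *   strip_high = 0 mimics the patched check:   c < ' '
 */
static size_t sanitize_bytes(char *buf, int strip_high)
{
    size_t replaced = 0;
    char *d;

    assert(buf != NULL);

    for (d = buf; *d != '\0'; d++) {
        unsigned char c = (unsigned char)*d;

        if (c < ' ' || (strip_high && c >= 0x7f)) {
            *d = '?';
            replaced++;
        }
    }
    return replaced;
}
```

For a file name such as "中文.txt", whose first two characters are encoded in UTF-8 as the bytes E4 B8 AD E6 96 87, calling the helper with `strip_high` set to 1 replaces all six of those bytes, while calling it with `strip_high` set to 0 leaves the string untouched. The trade-off is that the patched check also lets through DEL (0x7f) and any high byte that is not part of a valid UTF-8 sequence.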

This enables basic support for UTF-8 characters and the Chinese language. However, if other dependencies in the image need Native Language Support, it is worth looking into them and enabling it there as well. For example, --enable-nls is a commonly used configure option for packages that use autotools, e.g. GnuTLS, libidl2. The Yocto build system provides a knob for this, USE_NLS. It takes “yes” or “no” and is usually set to “yes” when building glibc-based targets. For musl-based targets it is set to “no” by default.

Summary

Various settings are needed to support UTF-8 and locales with the Yocto project. This article assumes glibc as the C library for the target, which is the default. Since the Yocto project also supports musl as an alternative C library, these instructions might not work as-is for musl-based systems.
