Handling Japanese characters in PHP

In PHP we manipulate strings and characters with various functions depending on what we need: cutting strings, counting string length, replacing substrings and so on. For these we can directly use built-in PHP functions like substr, str_replace and strlen. But we can’t always rely on these functions for Japanese characters. Why?

Everyone knows that a “bit” is 0 or 1, nothing else, and a “byte” is a group of eight consecutive bits. Since each of the eight bits can take two values, a byte can hold a total of 256 different patterns (2 to the power of 8). A different character can be associated with each possible 8-bit pattern.

This works fine as long as all the characters of a language fit into 256 patterns or fewer.

But what if you can’t represent a language with just 256 characters? Obviously Japanese needs far more than that. Nowadays, 256 characters aren’t enough anywhere. Fortunately, there is Unicode, a character set large enough to cover the world’s writing systems. It comes with several encodings, such as UTF-32, UTF-16 and UTF-8, that use more than one byte to represent a character.

UTF-8 uses 1 to 4 bytes per character and can encode 1,112,064 different code points. ASCII characters still take a single byte, while Japanese kana and most kanji take 3 bytes each.
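To make the "1 to 4 bytes" point concrete, here is a small sketch that prints how many bytes a few characters occupy in UTF-8. The four sample characters are my own picks for illustration, and the "\u{...}" escape syntax assumes PHP 7 or later.

```php
<?php
// Sample characters written as Unicode escapes (PHP 7+), which PHP
// stores as UTF-8 bytes. The labels are just the code point names.
$samples = [
    'U+0041 (Latin A)'      => "\u{0041}",
    'U+00E9 (e with acute)' => "\u{00E9}",
    'U+3042 (hiragana a)'   => "\u{3042}",
    'U+1F600 (emoji)'       => "\u{1F600}",
];

$byteCounts = [];
$hexDumps = [];
foreach ($samples as $label => $char) {
    // strlen() counts bytes, which is exactly what we want to see here.
    $byteCounts[] = strlen($char);
    // bin2hex() shows the raw bytes that make up the character.
    $hexDumps[] = bin2hex($char);
    echo $label, ': ', strlen($char), ' byte(s), hex ', bin2hex($char), PHP_EOL;
}
```

The hiragana character comes out as 3 bytes (e3 81 82), which is why byte-based functions and character-based functions disagree on Japanese text.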

Even after declaring UTF-8, we still can’t use the standard string functions directly. Most of them work on bytes, not characters, so on multibyte text they can return wrong lengths and positions or cut a character in half. If you need to handle multibyte characters, you need to use a special set of functions, the mbstring functions. Use the --enable-mbstring compile-time option to enable the mb functions. If you also want incoming HTTP input to be converted to the internal encoding automatically, set the run-time configuration option mbstring.encoding_translation.
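Here is a minimal sketch of what goes wrong, comparing substr with mb_substr on a Japanese string. The sample text is my own, and the sketch assumes the file is saved as UTF-8 and the mbstring extension is loaded.

```php
<?php
// Assumes this file is saved as UTF-8 and mbstring is available.
$text = '日本語のテキスト'; // 8 characters, 24 bytes in UTF-8

// substr() counts bytes: 4 bytes is one whole character plus the
// first byte of the next one, so the result is broken UTF-8.
$byteCut = substr($text, 0, 4);

// mb_substr() counts characters in the encoding we name.
$charCut = mb_substr($text, 0, 4, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
echo $charCut, PHP_EOL;                         // 日本語の
```

The byte-based cut usually shows up in the browser as a replacement mark or garbled character at the end of the string.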

The next thing is that the HTTP header sent with the page also carries the character set, so it should say UTF-8 as well. On the PHP side, we also need to declare the internal encoding that the mb functions will use, like this.

mb_internal_encoding("UTF-8");
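Putting the header and the internal encoding together, a sketch of the top of a page could look like the following. The text/html content type is an assumption for a normal web page, and the headers_sent() guard is only there so the snippet does not emit a warning if something was already printed.

```php
<?php
// Tell the browser which character set the response body uses.
// header() has to be called before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Tell the mb_* functions which encoding to assume by default.
mb_internal_encoding('UTF-8');

// Called with no argument, mb_internal_encoding() returns the current setting.
$current = mb_internal_encoding();
echo 'Internal encoding: ', $current, PHP_EOL;
```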

Finally, we can use the mb string functions instead of the standard string functions.

For example, we can use mb_strlen instead of strlen.
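As a quick comparison, here is a sketch that runs the standard functions and their mb counterparts on the same string. The sample greeting is my own, and it assumes a UTF-8 source file with mbstring loaded.

```php
<?php
// Assumes this file is saved as UTF-8 and mbstring is available.
mb_internal_encoding('UTF-8');

$greeting = 'こんにちは世界'; // "Hello, world": 7 characters, 21 bytes

$bytes = strlen($greeting);    // counts bytes: 21
$chars = mb_strlen($greeting); // counts characters: 7

$bytePos = strpos($greeting, '世界');    // byte offset: 15
$charPos = mb_strpos($greeting, '世界'); // character offset: 5

echo "strlen: $bytes, mb_strlen: $chars", PHP_EOL;
echo "strpos: $bytePos, mb_strpos: $charPos", PHP_EOL;
```

Mixing the two families is where bugs come from: a position from strpos should not be passed to mb_substr, and the other way around.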

You can see various mb functions here.

That’s all for today.

Yuuma


