Yesterday I’ve ported a PHP website to use Gettext for localizations (l10n). After reading through the Gettext documentation and going through the documentation in the PHP site, I’ve manged to get everything working (almost). I had one problem, all the non-ASCII characters (accented Latin chars, Japanese and Chinese) where displayed as question marks (?) instead of the correct form. This happened despite me using UTF-8 encoded files.
While some people (e.g. this one) suggested that it’s not possible to use non-ASCII characters when using a UTF-8 encoded message files, there is a solution and it’s quiet simple one. All you have to do is to call
bind_textdomain_codset and pass it