Sunday, June 12, 2011

Web application i18n programming

How to do i18n programming is a big topic, and I do not think I can describe it properly in a short blog post. I am posting this because I was asked about it.

In a recent project, I created the JavaScript function shown below:
/*
 * Translates $text
 */
function __i18n($text)
{
    // TODO: extend this so that each module has its own language file.
    // If we don't have the language set.
    if (undefined === Module.namespace.UI.lang) {
        return $text;
    }
    // If we don't have a translation for the $text.
    if (undefined === Module.namespace.UI.lang[$text]) {
        return $text;
    }
    return Module.namespace.UI.lang[$text];
}
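The simple JavaScript code above maps a piece of text to its translation, another UTF-8 encoded string that is most likely written in another language. By the way, one thing I want to emphasize is that UTF-8 and Unicode are different things: Unicode is the character set that assigns a code point to each character, while UTF-8 is one of several encodings that turn those code points into bytes. I often hear people use the two terms as if they were interchangeable.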
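To make the lookup concrete, here is a minimal sketch of how the function might be used. The `Module` object below is a hypothetical stand-in (only the shape `Module.namespace.UI.lang` comes from the function above), and the French strings are made-up sample data. The last two lines illustrate the Unicode versus UTF-8 point with one character.

```javascript
// Hypothetical stand-in for the project's namespace object; only the
// shape Module.namespace.UI.lang is taken from the function above.
var Module = { namespace: { UI: {} } };

// Same lookup logic as the function in the post.
function __i18n($text) {
    if (undefined === Module.namespace.UI.lang) {
        return $text;
    }
    if (undefined === Module.namespace.UI.lang[$text]) {
        return $text;
    }
    return Module.namespace.UI.lang[$text];
}

// No language table loaded yet: the original text comes back.
var before = __i18n("Save");

// Hypothetical French language file: original text -> translation.
Module.namespace.UI.lang = {
    "Save": "Enregistrer",
    "Cancel": "Annuler"
};

var translated = __i18n("Save");     // has an entry in the table
var untranslated = __i18n("Delete"); // no entry, falls back to the input

// Unicode vs. UTF-8: "\u00e9" (e with acute accent) is a single Unicode
// code point, U+00E9, but its UTF-8 encoding takes two bytes, 0xC3 0xA9.
var codePoint = "\u00e9".codePointAt(0);
var utf8Bytes = Array.from(new TextEncoder().encode("\u00e9"));
```

Falling back to the original text means a missing translation degrades to the source language instead of showing an empty label.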

The sample above is just a small part of a whole i18n-enabled system. We can use a traditional three-tier Web application as an example (a traditional desktop app has different concerns, I believe). Let's assume that we have just one centralized Web server. That is, we do not consider the situation in which distributed Web servers run on different hosts with different system locale settings. We have i18n concerns in every layer: the presentation layer, the business logic layer, and the database layer. Let's briefly review each layer:

  1. Presentation layer. First of all, let's see what is involved in this layer. Obviously, JavaScript and HTML are involved. But we should not forget that the Web browser and the Web server are included as well.
    In HTML, we can specify language information not only for the whole page (the lang attribute on the html element), but also for individual elements, and a FORM can declare the encodings it accepts with accept-charset.
    In JavaScript, as we saw in the sample code, a language resource file can be designed and looked up at run time.
    In CSS, we probably need separate CSS files for different locales, as the same text may have a different width in different languages.
    As for the Web browser and Web server, some people just ignore this part. But in fact, the HTTP specification defines request header fields such as Accept-Language and Accept-Charset for your convenience (Accept-Encoding, despite its name, negotiates content codings such as gzip, not character encodings). However, relying on these headers can also cause problems if end users happen to change their browser settings.
  2. Business layer. Most of the talk about resource bundles and locale objects happens in this layer. From my experience in Java, the main thing to keep in mind is that the JVM represents strings internally as Unicode (UTF-16 code units, not UTF-8). That gives us a baseline. From there, we can convert to any other encoding, provided the system supports it. In terms of programming, coders need to be careful to avoid functions that only support ASCII characters. In languages like PHP and C/C++, you will usually find two sets of string functions: one that works on single-byte characters, and another for "wider" Unicode characters (for example, the mb_* functions in PHP and the wchar_t functions in C).
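  3. Persistent storage layer. Using the MySQL RDBMS as an example, the database designer can specify the character set used for storage (per database, table, or column) and the character set used for the client connection. I have always used UTF-8 so far. Of course, you can still support multiple languages if you treat all strings as plain byte arrays, at the cost of losing character-aware sorting and comparison in the database.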
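As an illustration of the Accept-Language point in the presentation layer, here is a minimal sketch of server-side language selection. The helper names `parseAcceptLanguage` and `pickLanguage`, and the idea of letting a stored user choice override the header, are assumptions made for this example and are not from the project described above. Supported languages are assumed to be given as lowercase tags.

```javascript
// Parse an Accept-Language header value into lowercase language tags,
// ordered by preference (highest q-value first, ties in header order).
// Hypothetical helper for illustration.
function parseAcceptLanguage(header) {
    if (!header) {
        return [];
    }
    return header.split(",")
        .map(function (part, index) {
            var pieces = part.trim().split(";");
            var tag = pieces[0].trim().toLowerCase();
            var q = 1.0; // a tag without an explicit q-value has weight 1
            for (var i = 1; i < pieces.length; i++) {
                var match = /^\s*q=([0-9.]+)\s*$/i.exec(pieces[i]);
                if (match) {
                    q = parseFloat(match[1]);
                }
            }
            return { tag: tag, q: q, index: index };
        })
        .filter(function (entry) {
            // Drop empty tags and tags explicitly refused with q=0.
            return entry.tag !== "" && entry.q > 0;
        })
        .sort(function (a, b) {
            return b.q - a.q || a.index - b.index;
        })
        .map(function (entry) { return entry.tag; });
}

// Pick the best supported language. An explicit user choice (for example
// one stored in a cookie or a profile) wins over the browser header, so
// a changed browser setting does not silently switch the UI language.
function pickLanguage(header, supported, userChoice, fallback) {
    if (userChoice && supported.indexOf(userChoice) !== -1) {
        return userChoice;
    }
    var preferred = parseAcceptLanguage(header);
    for (var i = 0; i < preferred.length; i++) {
        var tag = preferred[i];
        if (supported.indexOf(tag) !== -1) {
            return tag;
        }
        // Fall back from a regional tag such as "fr-ca" to its base "fr".
        var base = tag.split("-")[0];
        if (supported.indexOf(base) !== -1) {
            return base;
        }
    }
    return fallback;
}
```

For example, `pickLanguage("fr-CA,fr;q=0.8,en;q=0.4", ["en", "fr"], null, "en")` would select `"fr"`, because the regional tag is not supported but its base language is.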

A well-designed i18n architecture is very important for software that is used worldwide, because a poor design at the very beginning can force a painful redesign and recoding effort later. It also helps to learn i18n programming by starting from one specific language. For example, much of the i18n support in core Java was contributed early on by IBM rather than written by Sun Microsystems. Java is a good language for learning i18n.
