Cabron Connector
Another Open-Source Implementation of PHP - Flash Remoting
Introduction
Flash MX supports Unicode text encoding.
The sections below explain how to use this feature when working with Cabron Connector.
If you want to read some documentation first, try this page about Flash MX and Unicode, and this one about the UTF-8 encoding.
The unicode.org site is also a useful reference.

Unicode
You probably know that an ASCII character is stored in one byte. A byte can hold values from 0x00 to 0xFF (ASCII itself defines only 0x00-0x7F). That is only 256 values, and many of them are control and special characters, so relatively few characters are left for text.
Unicode code points start at 0x0000 and there is a huge number of them (the range runs up to 0x10FFFF). You can see the charts on this page. Some examples:

Character  Unicode  Unicode name                            Code chart
a          0x0061   LATIN SMALL LETTER A                    Basic Latin
õ          0x00F5   LATIN SMALL LETTER O WITH TILDE         Latin-1 Supplement
ő          0x0151   LATIN SMALL LETTER O WITH DOUBLE ACUTE  Latin Extended-A
ű          0x0171   LATIN SMALL LETTER U WITH DOUBLE ACUTE  Latin Extended-A
α          0x03B1   GREEK SMALL LETTER ALPHA                Greek
д          0x0434   CYRILLIC SMALL LETTER DE                Cyrillic

Character encoding
You are probably familiar with the following names: "ISO-8859-1", "ISO-8859-2", "ISO-8859-5", "ISO-8859-7". You can see these in the View > Encoding menu of your web browser.

These are used when you know for sure that a file (an HTML or a text file) will contain characters from a single character set only (for example, only characters from the Central European, Greek or Cyrillic set). In this case each character can be represented by a single byte (0x00-0xFF), and each byte value is mapped to the corresponding Unicode character using a conversion table. The characters are rendered using the Unicode value. You can see these character mapping tables here.
Examples:
Char value  Char  Character map  Unicode value  Unicode char
0x61        a     ISO-8859-1     0x0061         a
0xF5        õ     ISO-8859-2     0x0151         ő
0xFB        û     ISO-8859-2     0x0171         ű
0xE1        á     ISO-8859-7     0x03B1         α
0xD4        Ô     ISO-8859-5     0x0434         д
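
The mapping in the table above can be reproduced with a few lines of PHP. This is only a sketch: it assumes the mbstring extension is available, and byteToCodePoint is a hypothetical helper name, not part of Cabron Connector.

```php
<?php
// Hypothetical helper: look up the Unicode code point that a single byte
// stands for in a given single-byte character set.
function byteToCodePoint(int $byte, string $charset): int
{
    // Convert the one-byte string to UTF-8 using the charset's mapping table.
    $utf8 = mb_convert_encoding(chr($byte), 'UTF-8', $charset);
    // Read back the code point of the resulting character.
    return mb_ord($utf8, 'UTF-8');
}

printf("0x%04X\n", byteToCodePoint(0xF5, 'ISO-8859-1')); // prints 0x00F5
printf("0x%04X\n", byteToCodePoint(0xF5, 'ISO-8859-2')); // prints 0x0151
```

The same byte value gives a different Unicode character depending on the character map that is applied.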

UTF-8
UTF-8 is a multibyte encoding for Unicode. It is supported by a lot of software, and Flash MX uses it too.

Why do we need an encoding for Unicode? It's easy to see that the characters 0x00-0xFF can be stored in a single byte, the characters 0x0100-0xFFFF in two bytes, and so on. The problem is that if you simply put the characters together in this way (some characters in one byte, others in two), no one would know where the character boundaries are.
You could decide to use four bytes for every Unicode character, but then you would waste a lot of space on zero bytes.

The solution is to use an encoding that lets you detect the character boundaries and uses only the space that is really needed. UTF-8 is such an encoding.

Here is a table that presents the encoding method (you can find a detailed description here):

Character range      Bit encoding
U+0000 - U+007F      0xxxxxxx
U+0080 - U+07FF      110xxxxx 10xxxxxx
U+0800 - U+FFFF      1110xxxx 10xxxxxx 10xxxxxx
U+10000 - U+10FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
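
To make the table concrete, here is a minimal PHP sketch that builds the UTF-8 bytes for one code point by hand. The function name codePointToUtf8 is made up for this illustration, and the sketch does no validation (for example, it does not reject surrogate values).

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8,
// following the four rows of the table above.
function codePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {            // 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(codePointToUtf8(0x61)), "\n";   // prints 61
echo bin2hex(codePointToUtf8(0xF5)), "\n";   // prints c3b5
echo bin2hex(codePointToUtf8(0x2013)), "\n"; // prints e28093
```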

What does this mean exactly in the case of Flash MX?

Let's take an example. You have a Flash movie with a TextField. You want to load an external file that contains a single variable/value pair, and you want to display the value in the TextField.
The code looks like this:
var loader = new LoadVars();
// onLoad receives true when the file was loaded successfully
loader.onLoad = function(success) {
  if (success) {
    // the variables from the file become properties of the LoadVars object
    _root.textFieldInstance.text = this.var1;
  } else {
    trace("error loading external file");
  }
};
loader.load("datafile.txt");
The datafile.txt will look like this:
&var1=value1&
This will probably work. But what if you want to load text containing some more interesting characters? Let's take the õ character (0xF5, LATIN SMALL LETTER O WITH TILDE), stored as a single byte. The file will look like this:
&var1=õextra&
The text field will probably display a single character (or a square if your system has no glyph for it), followed by the text "ra".

Let's find out why. If you take the hex value of the first character (0xF5) and convert it to binary, you get [11110101].
If you look at the UTF-8 table above, you will see that a bit pattern starting with "11110" marks the first byte of a four-byte character. That's why the "õext" sequence is taken as a single character.
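
The length detection described above can be sketched in PHP as follows. This only illustrates the bit patterns from the table; it is not Flash's actual decoder, and the function name utf8SequenceLength is hypothetical.

```php
<?php
// Hypothetical helper: how many bytes does a UTF-8 character occupy,
// judging only by the bit pattern of its first byte?
function utf8SequenceLength(int $leadByte): int
{
    if (($leadByte & 0x80) === 0x00) { return 1; } // 0xxxxxxx
    if (($leadByte & 0xE0) === 0xC0) { return 2; } // 110xxxxx
    if (($leadByte & 0xF0) === 0xE0) { return 3; } // 1110xxxx
    if (($leadByte & 0xF8) === 0xF0) { return 4; } // 11110xxx
    return 0; // 10xxxxxx (continuation byte) or another non-lead pattern
}

echo utf8SequenceLength(0x65), "\n"; // 'e' as a lead byte: prints 1
echo utf8SequenceLength(0xF5), "\n"; // the raw õ byte: prints 4
echo utf8SequenceLength(0xC3), "\n"; // first byte of UTF-8 encoded õ: prints 2
```

The raw 0xF5 byte announces a four-byte character, so the three bytes after it ("ext") are consumed as part of that character.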

All we need to do is to transform the õ (0xF5) character into its UTF-8 representation. This can be done either on paper or with a function. You can use the utf8_encode function of PHP. After encoding, the file will look like this (the õ is now stored as the two bytes 0xC3 0xB5):
&var1=õextra&
If you test your movie now, it should work.

Of course, data loaded into Flash should also be URL-encoded. This can be done using the rawurlencode function of PHP. The data file will then contain:
&var1=%C3%B5extra&

Conclusions
To safely transfer data from external sources into Flash, you should UTF-8 encode, then URL-encode the string values. In PHP you can use:
urlencode(utf8_encode($stringValue));
// or
rawurlencode(utf8_encode($stringValue));
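
As a slightly fuller sketch, the function below builds a whole data string in the &name=value& form used earlier. The name buildFlashVars is hypothetical. Note that utf8_encode always treats its input as ISO-8859-1 and is deprecated as of PHP 8.2, so this sketch uses mb_convert_encoding (from the mbstring extension) instead and lets you name the source character set.

```php
<?php
// Hypothetical helper: turn an associative array into the
// "&name=value&" format that LoadVars can read.
function buildFlashVars(array $vars, string $sourceCharset = 'ISO-8859-1'): string
{
    $out = '&';
    foreach ($vars as $name => $value) {
        // Step 1: convert the value from its source charset to UTF-8.
        $utf8 = mb_convert_encoding((string) $value, 'UTF-8', $sourceCharset);
        // Step 2: URL-encode the UTF-8 bytes.
        $out .= rawurlencode((string) $name) . '=' . rawurlencode($utf8) . '&';
    }
    return $out;
}

// "\xF5" is the single-byte õ from the example above.
echo buildFlashVars(array('var1' => "\xF5extra")), "\n"; // prints &var1=%C3%B5extra&
```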

Of course, this is only half of the communication: from external sources to Flash. Please read the Character Map Usage section below to find out how the Cabron Connector can help you to easily transfer multilanguage content in both directions.
Character Map Usage
If you want to take care of the encoding and decoding of the strings yourself when sending data to and receiving data from Flash, you do not need to modify anything in the source of the gateway or the services.
However, if you prefer, you can let the Cabron Gateway take care of the character encodings.

Before reading this section, please read the sections above, at least the one about UTF-8.

Example
Let's suppose that you have a database with a table that contains text in the ISO-8859-2 (Central European) encoding. You want to present this text in a Flash movie.

In this case, you can make the gateway map the characters from the database to the corresponding Unicode values and then encode them using UTF-8. This way you do not have to care about the mapping and encoding of the characters.

There are two ways to do this:
  • You can set the character mapper for the gateway. This means that all the services that will be run through this gateway will use the same character mapping.
  • You can set the character mapping for a given service, or for a given method of the service. This solution is the preferred one.
To do so, define a function or method named setGatewayOptions in your service file/object. It should return an associative array that can have the following keys:
  • requestCharMapper - the character map list applied to the strings in the request
  • responseCharMapper - the character map list applied to the strings in the response
  • responseStripCR - strips the "\r" characters from the strings in the response (Flash treats both "\r" and "\n" as newlines, so if the data in the database originates from a Windows system, an empty line will appear where you expect only a line break)
  • fastSerialization - URL-encodes only the characters that need it (%, &), which makes the response smaller. Use it only if you are sure it will work.
In our case you will probably set the responseCharMapper and responseStripCR values:
function setGatewayOptions($methodName){
  $opt = array();
  $opt['responseCharMapper'] = 'iso-8859-2-forced';
  $opt['responseStripCR'] = true;
  return $opt;
}

Let's suppose that you want to edit the texts in the Flash movie and save them back to the database. If you use the above code, the data coming from Flash will be UTF-8 encoded, but you probably want to save it as "ISO-8859-2". In this case you should also specify a requestCharMapper:
function setGatewayOptions($methodName){
  $opt = array();
  $opt['responseCharMapper'] = 'iso-8859-2-forced';
  $opt['requestCharMapper'] = 'iso-8859-2-forced';
  $opt['responseStripCR'] = true;
  return $opt;
}
Note that all the characters that come from Flash and cannot be mapped using the requestCharMapper will remain in UTF-8 encoding.
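
Since setGatewayOptions receives a $methodName argument, the options can presumably be varied per method, as mentioned in the list of the two ways above. The following is only a sketch under that assumption; the method names getTexts and saveText are hypothetical and stand for methods of your own service.

```php
<?php
// Sketch: return different gateway options depending on the called method.
// 'getTexts' and 'saveText' are hypothetical service method names.
function setGatewayOptions($methodName)
{
    $opt = array();
    switch ($methodName) {
        case 'getTexts':
            // Reading: map database text to Unicode for the response.
            $opt['responseCharMapper'] = 'iso-8859-2-forced';
            $opt['responseStripCR'] = true;
            break;
        case 'saveText':
            // Writing: map the UTF-8 request back to ISO-8859-2.
            $opt['requestCharMapper'] = 'iso-8859-2-forced';
            break;
    }
    // Any other method gets no special options.
    return $opt;
}
```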


Character Map Files
The character maps are located in the gateway/lib/char/fromutf8/ and gateway/lib/char/2utf/ directories.
The files are generated from the text files in gateway/lib/char/textmaps/ using the gateway/utils/charmapconverter/index.php utility.

The difference between the simple and the "-forced" maps is that the simple maps convert only those characters whose single-byte value and Unicode value differ, while the "-forced" maps convert all characters. This difference makes it possible to build a character map chain:
function setGatewayOptions($methodName){
  $opt = array();
  $opt['requestCharMapper'] = 'iso-8859-2,custommap,default';
  return $opt;
}
This construct is useful if you want to decode everything above 0x7F with ISO-8859-2, convert some special characters that you expect to appear in the text, and then convert anything remaining using the default mapper, which will only decode the characters that were not recognized by the other two mappers.
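
The chain idea can be illustrated with a small PHP sketch. This is not the gateway's actual code: applyMapChain and the two tiny map arrays are made up for the illustration, and real map files contain many more entries. Each map is tried in order, and the first map that knows a character wins.

```php
<?php
// Hypothetical illustration of a map chain for request data (UTF-8 in).
// Each map is an array: UTF-8 character => replacement byte string.
function applyMapChain(string $utf8, array $maps): string
{
    $out = '';
    // Split the (valid) UTF-8 string into single characters.
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $mapped = $char; // characters no map knows stay as they are
        foreach ($maps as $map) {
            if (isset($map[$char])) {
                $mapped = $map[$char];
                break; // first matching map wins
            }
        }
        $out .= $mapped;
    }
    return $out;
}

$iso88592 = array("\xC5\x91" => "\xF5");  // ő (U+0151) => byte 0xF5
$custom   = array("\xE2\x80\x93" => '-'); // EN DASH (U+2013) => hyphen-minus

// Input is "ő", an en dash and "a", all UTF-8 encoded.
echo bin2hex(applyMapChain("\xC5\x91\xE2\x80\x93a", array($iso88592, $custom))), "\n";
// prints f52d61
```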

Custom Character Map Files
If you suspect that users of the Flash movie will paste the – character (0x2013, EN DASH) from Word, you may want to convert it to - (0x2D, HYPHEN-MINUS) before saving it to the database.

In this case you should define your custom textmap file, with the following content:
0x2D 0x2013 #
and then run the charmapconverter utility. This will generate both the 2utf and fromutf map files.
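
To show how such a textmap line can be read, here is a minimal PHP sketch. It assumes the line layout seen above (single-byte value, Unicode value, then an optional # comment); parseTextmapLine is a hypothetical name and this is not the code of the charmapconverter utility.

```php
<?php
// Hypothetical parser for one textmap line of the form
// "0xBYTE 0xUNICODE # optional comment".
// Returns array(byteValue, codePoint), or null if the line has no mapping.
function parseTextmapLine(string $line): ?array
{
    if (!preg_match('/^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/', $line, $m)) {
        return null; // blank line, comment line, or unknown layout
    }
    return array(hexdec($m[1]), hexdec($m[2]));
}

// Example line taken from the ISO-8859-2 row of the mapping table earlier.
list($byte, $codePoint) = parseTextmapLine('0xF5 0x0151 # LATIN SMALL LETTER O WITH DOUBLE ACUTE');
printf("0x%02X -> U+%04X\n", $byte, $codePoint); // prints 0xF5 -> U+0151
```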
Default Character Map
If you expect that the users will use mainly English text, you should use the default map:
function setGatewayOptions($methodName){
  $opt = array();
  $opt['responseCharMapper'] = 'default';
  $opt['requestCharMapper'] = 'default';
  $opt['responseStripCR'] = true;
  return $opt;
}
http://cabron.sourceforge.net/ (eatti, eatti@angelfire.com)