| Unicode | ||
|
Unicode is a multi-language character set designed to encompass virtually all of the characters used with computers today. On Windows, Unicode strings are stored as sequences of 16-bit values (UTF-16), and they differ from other character sets in two important ways. First, unlike the traditional single-byte (ANSI) character sets, Unicode is capable of representing significantly more characters in a variety of languages. Second, unlike multi-byte character sets (where some characters may be one byte in length, while others may be two bytes), nearly all characters in common use occupy a single 16-bit value, which makes them easier to work with. Only characters outside the Basic Multilingual Plane require a pair of 16-bit values.

Windows support for Unicode varies according to the platform. It is fully supported under Windows NT and later versions. However, Unicode support is limited under Windows 95, Windows 98 and Windows Me.

The SocketTools libraries support both the ANSI and Unicode character sets under Windows Vista, Windows XP and Windows 2000. This is done by providing two versions of each function that either expects a string as an argument (including those functions which pass structures that contain strings) or returns the address of a string. The version of the function that expects a single-byte character set has a suffix of "A" (ANSI), while the version that expects the Unicode character set has a suffix of "W" (wide). No suffix is used with those functions which only expect numeric parameters and return numeric values.

For example, consider the InetGetLocalName function mentioned in the previous section. If you looked at the list of exported functions in the library, you would see two functions exported, InetGetLocalNameA and InetGetLocalNameW. In C and C++, which function is called depends on how the application is built. If your application is built to use Unicode (in other words, the UNICODE macro is defined and you are linking with Unicode versions of the standard libraries), then InetGetLocalNameW is used; otherwise, InetGetLocalNameA is used.
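The compile-time selection between the "A" and "W" versions can be sketched as follows. This is only an illustration of the general pattern, not the actual SocketTools header: DemoGetLocalNameA, DemoGetLocalNameW, DemoGetLocalName and DEMO_CHAR are hypothetical names, and the functions simply return a fixed host name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for a pair of exported functions. They are NOT
   the real InetGetLocalNameA / InetGetLocalNameW; they copy a fixed name
   and return its length in characters, or 0 if the buffer is too small. */
int DemoGetLocalNameA(char *name, size_t maxChars)
{
    const char *host = "localhost";
    assert(name != NULL);
    if (strlen(host) >= maxChars)
        return 0;
    strcpy(name, host);
    return (int)strlen(name);
}

int DemoGetLocalNameW(wchar_t *name, size_t maxChars)
{
    const wchar_t *host = L"localhost";
    assert(name != NULL);
    if (wcslen(host) >= maxChars)
        return 0;
    wcscpy(name, host);
    return (int)wcslen(name);
}

/* A header can map the unsuffixed name to one version at compile time,
   depending on whether the UNICODE macro is defined. */
#ifdef UNICODE
typedef wchar_t DEMO_CHAR;
#define DemoGetLocalName DemoGetLocalNameW
#else
typedef char DEMO_CHAR;
#define DemoGetLocalName DemoGetLocalNameA
#endif
```

With UNICODE defined, a call such as DemoGetLocalName(buffer, 64) compiles as a call to DemoGetLocalNameW; without it, the same source line calls DemoGetLocalNameA.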
In other languages, you may have to explicitly declare which version of the function you wish to use. In Visual Basic, for example, the Alias keyword must be used with the function declaration to specify the correct name.

When you are developing software for the various Windows platforms, whether or not to use Unicode is an important consideration. If you are developing an application specifically for Windows 2000 or a later version of the operating system, Unicode support is an option. However, if your application must run on older versions of Windows, then it is recommended that you use the ANSI character set. Another alternative is to build two versions of the software, one that uses Unicode and another that uses ANSI.

One final consideration when using Unicode is that, regardless of the character set, some of the SocketTools functions use byte arrays, not character strings. This can create problems when reading and writing Unicode string data. For example, consider the InetRead and InetWrite functions, which are used to read and write data on a socket. Because character strings and byte arrays are essentially identical when using the ANSI character set, a C/C++ programmer may be tempted to pass a character string directly to InetWrite as though it were a byte array.
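A minimal sketch of that ANSI-only pattern is shown below. DemoWrite is a hypothetical stand-in for a function that writes a byte array to a socket; it is not the real InetWrite and only reports how many bytes it was asked to send.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a socket write function that takes a byte
   array and a byte count. It sends nothing and returns the count. */
int DemoWrite(const unsigned char *buffer, int length)
{
    assert(buffer != NULL && length >= 0);
    return length;
}

/* ANSI-only pattern: the string is cast to a byte array and its length
   in characters is used as the length in bytes. This is valid only
   because each char occupies exactly one byte. */
int SendGreetingAnsi(void)
{
    const char *message = "Hello, world\r\n";
    return DemoWrite((const unsigned char *)message, (int)strlen(message));
}
```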
Code like this works as expected until the programmer compiles for the Unicode character set. The string is then no longer an array of bytes, but an array of 16-bit integers, so it must be converted from Unicode to a byte array before it is passed to the InetWrite function. The WideCharToMultiByte function can be used to perform this conversion.
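The conversion step might look like the sketch below. It assumes the ANSI code page (CP_ACP) is the desired target encoding. DemoWrite, WideToBytes and SendGreetingWide are hypothetical names used for illustration, with DemoWrite standing in for the socket write function. So that the sketch also compiles outside Windows, it falls back to the standard C wcstombs function when _WIN32 is not defined.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical stand-in for a socket write function that takes a byte
   array and a byte count. It sends nothing and returns the count. */
int DemoWrite(const unsigned char *buffer, int length)
{
    assert(buffer != NULL && length >= 0);
    return length;
}

/* Convert a null-terminated wide string to a null-terminated byte string.
   Returns the number of bytes written, excluding the terminator, or -1
   if the conversion fails or the output buffer is too small. */
int WideToBytes(const wchar_t *text, char *bytes, int maxBytes)
{
#ifdef _WIN32
    /* Passing -1 as the input length tells WideCharToMultiByte that the
       input is null-terminated; the returned count then includes the
       terminator. A return value of 0 indicates failure. */
    int count = WideCharToMultiByte(CP_ACP, 0, text, -1,
                                    bytes, maxBytes, NULL, NULL);
    return (count > 0) ? count - 1 : -1;
#else
    /* Portable fallback using the current C locale. */
    size_t count = wcstombs(bytes, text, (size_t)maxBytes);
    if (count == (size_t)-1 || count >= (size_t)maxBytes)
        return -1;
    return (int)count;
#endif
}

/* Unicode-safe pattern: convert first, then write the resulting bytes. */
int SendGreetingWide(void)
{
    const wchar_t *message = L"Hello, world\r\n";
    char bytes[64];
    int length = WideToBytes(message, bytes, (int)sizeof(bytes));
    if (length < 0)
        return -1;
    return DemoWrite((const unsigned char *)bytes, length);
}
```

The byte count passed to the write function comes from the conversion result rather than from the length of the wide string, since the two are not guaranteed to match.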
Note that the type of characters being converted may also present a problem to the developer. A string composed only of characters from the basic ASCII character set is easily converted. However, when converting a string that contains international characters, such as accented vowels, the conversion may produce unprintable or substituted characters if the target character set cannot represent them. For additional information, check your programming language's technical reference for issues regarding localization and the use of Unicode. |
||
|
Copyright © 2008 Catalyst Development Corporation. All rights reserved. |
||