File Encoding

A common requirement for applications which use Internet protocols is the need to encode binary files and to compress data, reducing the bandwidth and time required to send or receive it. Encoding a binary file converts its contents into printable characters which can be safely transferred over the Internet using protocols that only support a subset of 7-bit ASCII characters. This is commonly a restriction for email, since many mail servers are still not capable of correctly processing messages which contain control characters, 8-bit data or the multi-byte character sequences found in international text. To address this problem, the sender encodes the data and sends it as part of a message; the recipient then extracts and decodes the data, recovering the original contents without any corruption by the mail servers which store or forward the message.

The File Encoding library supports several encoding and decoding methods, including standard Base64 encoding, quoted-printable encoding and uuencoding. For applications which access USENET newsgroups, the library also supports the newer yEnc encoding method, which has become a popular way to attach binary files to a message.

In addition to encoding and decoding files, the File Encoding library can also be used to compress files, reducing their overall size. Two compression algorithms are supported: the standard Deflate algorithm, which is commonly used in Zip files, and an algorithm based on the Burrows-Wheeler Transform (BWT), which can offer better compression than Deflate for some types of files. The developer has control over the type of compression performed, as well as details such as the compression level, which determines how much memory and CPU time is spent compressing the data. Developers can even create their own custom compression formats by defining an application-specific header block, typically represented by a structure or user-defined type, that provides information about the data to the program.
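As a rough illustration of the trade-off between algorithm and compression level, the following Python sketch compresses the same data with Deflate at two levels and with a BWT-based compressor. It uses Python's standard zlib and bz2 modules rather than the File Encoding library, and treating bz2 as a stand-in for the library's BWT-based algorithm is an assumption; the two are unlikely to share a file format.

```python
import bz2
import zlib

# Sample data: repetitive text compresses well with either algorithm.
data = b"The quick brown fox jumps over the lazy dog. " * 200

# Deflate at the fastest and at the most thorough compression level.
deflate_fast = zlib.compress(data, 1)
deflate_best = zlib.compress(data, 9)

# bz2 is built on the Burrows-Wheeler Transform; it stands in here for
# the library's BWT-based method (an assumption, not the same format).
bwt = bz2.compress(data, 9)

for name, blob in [("deflate level 1", deflate_fast),
                   ("deflate level 9", deflate_best),
                   ("bz2 level 9", bwt)]:
    print(f"{name}: {len(data)} -> {len(blob)} bytes")

# Expanding restores the original bytes exactly.
assert zlib.decompress(deflate_best) == data
assert bz2.decompress(bwt) == data
```

Which method produces the smallest output depends on the data, so it is worth measuring with files that are representative of the application.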

Unlike the other SocketTools libraries, there are no initialization functions for this library, and there are no handles used. All operations are performed either on files or on memory buffers provided by the application. The library is split into two general areas of functionality. The first group of functions enables you to encode and decode binary files and the second group enables you to compress and expand data.

Note that if you only want to attach files to an email message, you do not need to use these functions. The Mail Message library can automatically encode and decode file attachments on its own. However, the File Encoding library is useful if you need to encode or compress data for other purposes.

Encoding Types

There are several different encoding types available, with the default being the standard MIME encoding called Base64. The following encoding methods are supported by the library:

Base64

Base64 encoding works by representing three bytes of data as four printable characters. The 24 bits in each group of three bytes are split into four six-bit numbers, and each six-bit number is converted to one of 64 printable characters (which is where the encoding method gets its name). Base64 is the default encoding method used by the library and is the standard encoding used for MIME formatted email messages as well as many other applications.
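The grouping described above can be sketched in a few lines of Python. The encode_triple helper is a hypothetical name used only for illustration, and the result is checked against Python's standard base64 module rather than the File Encoding library.

```python
import base64

# The 64 printable characters of the standard Base64 alphabet.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def encode_triple(three_bytes: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly three bytes")
    # Join the three bytes into a single 24-bit integer.
    bits = int.from_bytes(three_bytes, "big")
    # Split it into four 6-bit values, most significant first.
    sextets = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[s] for s in sextets)

manual = encode_triple(b"Cat")
library = base64.b64encode(b"Cat").decode("ascii")
print(manual, library)  # both print Q2F0
```

A complete encoder also has to pad input whose length is not a multiple of three, which this sketch leaves out.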

Quoted-Printable

Quoted-printable encoding is primarily used in email messages, and works best when the data being encoded is text which consists mostly of printable characters. Only characters with the high bit set and a small subset of other characters (such as the equals sign itself) are actually encoded, each one being represented by an equals sign followed by its two-digit hexadecimal value. All other printable characters are passed through unmodified.
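A short example shows how little of a typical text string changes. This sketch uses Python's standard quopri module, not the File Encoding library, and the sample text is an assumption chosen to contain one 8-bit character and one equals sign.

```python
import quopri

# Latin-1 text containing one 8-bit character and one equals sign.
text = "Café = coffee".encode("latin-1")

encoded = quopri.encodestring(text)
print(encoded)  # b'Caf=E9 =3D coffee'

# Decoding restores the original bytes.
decoded = quopri.decodestring(encoded)
assert decoded == text
```

Only the accented character and the equals sign were replaced; the remaining characters are unchanged, which keeps the encoded text largely readable.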

Uucode

Uuencoding is one of the original encoding methods used for email. It gets its name from two UNIX command-line utilities called uuencode and uudecode, which were used to encode and decode files. Like Base64, uuencoding converts three bytes of data into four six-bit numbers; a value of 32 is then added to each number to ensure that it is a printable character. Uuencoding also adds some additional characters which are used to ensure the integrity of the encoded data. This encoding method is still used when posting files to USENET newsgroups, but has largely been replaced by Base64 when attaching files to email messages.
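The following Python sketch encodes a single uuencoded line by hand and compares it with the output of the standard binascii module. The uuencode_line helper is a hypothetical name, and the sketch omits the "begin" and "end" lines that surround a complete uuencoded file.

```python
import binascii

def uuencode_line(chunk: bytes) -> bytes:
    """Uuencode one line of at most 45 bytes (simplified sketch)."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a uuencoded line carries 1 to 45 bytes")
    # The first character records how many data bytes the line holds.
    out = [32 + len(chunk)]
    # Pad to a multiple of three so every group is complete.
    padded = chunk + b"\0" * (-len(chunk) % 3)
    for i in range(0, len(padded), 3):
        bits = int.from_bytes(padded[i:i + 3], "big")
        # Four 6-bit values, each shifted into the printable range.
        out.extend(32 + ((bits >> shift) & 0x3F)
                   for shift in (18, 12, 6, 0))
    return bytes(out) + b"\n"

manual = uuencode_line(b"Cat")
library = binascii.b2a_uu(b"Cat")
print(manual, library)  # both print b'#0V%T\n'
```

The leading "#" is the length character (32 plus 3 data bytes), which lets the decoder detect a truncated line.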

yEnc

yEnc is a relatively new encoding method that was created specifically for binary newsgroups on USENET. Because USENET doesn't have the same limitations as email systems in terms of which characters can be safely used, yEnc only escapes null bytes and a few other critical characters, such as carriage returns and line feeds; every other byte is sent as a single 8-bit character, which can significantly reduce the overall size of the encoded data. yEnc also uses checksums to ensure the integrity of the data and is designed so that a large file can be split across multiple messages and then recreated.
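The core byte transformation can be sketched as follows. This is a simplified illustration written in Python, not the File Encoding library's implementation: it leaves out the "=ybegin" and "=yend" lines, line wrapping and multi-part handling, and the function names are hypothetical. In the yEnc format each byte is offset by 42 before the escape check is applied.

```python
import zlib

# Values that must be escaped after the offset is applied:
# NUL, line feed, carriage return and the escape character '='.
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}

def yenc_encode(data: bytes) -> bytes:
    out = bytearray()
    for byte in data:
        value = (byte + 42) % 256
        if value in CRITICAL:
            # Emit '=' and shift the value by a further 64.
            out.append(0x3D)
            value = (value + 64) % 256
        out.append(value)
    return bytes(out)

def yenc_decode(encoded: bytes) -> bytes:
    out = bytearray()
    it = iter(encoded)
    for value in it:
        if value == 0x3D:
            # Escaped byte: undo the extra shift of 64.
            value = (next(it) - 64) % 256
        out.append((value - 42) % 256)
    return bytes(out)

payload = bytes(range(256))
encoded = yenc_encode(payload)
assert yenc_decode(encoded) == payload
print(len(payload), "->", len(encoded), "bytes")  # 256 -> 260 bytes

# yEnc records a CRC-32 of the original data for integrity checks.
print(f"crc32 = {zlib.crc32(payload):08x}")
```

Only four of the 256 possible byte values need an escape, so the overhead is far lower than the one-third growth of Base64 or uuencoding.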

Data Encoding

The following functions encode and decode data, either in files on disk or in blocks of memory. Each one can also compress the data before encoding it, or expand it after decoding it.

EncodeFile
This function encodes a file using the specified encoding method, storing the encoded data in a new file. An option also allows you to automatically compress the data prior to encoding it in order to reduce the overall size of the encoded file.

DecodeFile
This function decodes a previously encoded file using the specified encoding method, restoring the original contents. If the encoded data was compressed, this function can also be used to automatically expand the data after it has been decoded.

EncodeBuffer
This function encodes a block of data in memory using the specified encoding type. This is similar to the EncodeFile function, except that instead of using disk files, all of the encoding is done in memory. As with encoding a file, you can also specify that you want the data to be compressed prior to being encoded.

DecodeBuffer
This function decodes a previously encoded block of data. This is similar to the DecodeFile function, except that instead of using disk files, all of the decoding is done in memory. As with decoding a file, you can also specify that you want compressed data to be automatically expanded after it has been decoded.
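The order of operations matters: data is compressed first and then encoded, and the reverse steps are applied when decoding. The Python sketch below illustrates that pipeline in memory using the standard zlib and base64 modules. The pack and unpack helpers are hypothetical names and are not the signatures of EncodeBuffer or DecodeBuffer; consult the Technical Reference for the actual parameters.

```python
import base64
import zlib

def pack(data: bytes, compress: bool = True) -> bytes:
    """Hypothetical helper: optionally compress, then Base64-encode."""
    if compress:
        data = zlib.compress(data)
    return base64.encodebytes(data)

def unpack(encoded: bytes, compressed: bool = True) -> bytes:
    """Hypothetical helper: Base64-decode, then optionally expand."""
    data = base64.decodebytes(encoded)
    if compressed:
        data = zlib.decompress(data)
    return data

original = b"binary\x00payload\xff" * 500
encoded = pack(original)

# The round trip restores the original bytes exactly.
assert unpack(encoded) == original

# Compressing first yields a smaller encoded result for this data.
assert len(encoded) < len(base64.encodebytes(original))
print(len(original), "->", len(encoded), "bytes")
```

Compressing after encoding would be less effective, since encoding expands the data and changes the byte patterns the compressor relies on.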

Data Compression

In addition to encoding and decoding data, the library can be used to compress data in order to reduce its size. The compression functions may be used separately, or may be used as part of the process of encoding a file or a block of data.

CompressFile
This function reduces the size of a file using the standard Deflate algorithm. This is the same algorithm that is commonly used in Zip archives. Note, however, that this function does not create a Zip file; it simply uses the same compression method.

ExpandFile
This function restores the original contents of a file that was previously compressed using the CompressFile function. Note that this function is not designed to extract files from a Zip archive or expand data compressed using a different algorithm.

CompressBuffer
This function uses the Deflate algorithm to reduce the size of a block of data. This is similar to the CompressFile function except that it performs the compression on data in memory rather than in a disk file.

ExpandBuffer
This function restores the data that was previously compressed using the CompressBuffer function.

There are some additional functions for compressing files that provide more advanced options such as the ability to specify the compression type and level, as well as enabling you to create your own custom file compression formats. Please refer to the Technical Reference for more information.
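To give a sense of what a custom compression format with an application-specific header might look like, here is a minimal Python sketch. The header layout (a four-byte signature, the original length and a CRC-32), the "DEMO" signature and the wrap and unwrap helpers are all assumptions made for illustration; they do not describe the library's own format or functions.

```python
import struct
import zlib

# Hypothetical header layout: 4-byte signature, original length,
# CRC-32 of the original data (little-endian, 12 bytes in total).
HEADER = struct.Struct("<4sII")
MAGIC = b"DEMO"

def wrap(data: bytes, level: int = 6) -> bytes:
    """Prefix Deflate-compressed data with the header block."""
    header = HEADER.pack(MAGIC, len(data), zlib.crc32(data))
    return header + zlib.compress(data, level)

def unwrap(blob: bytes) -> bytes:
    """Validate the header block and expand the data after it."""
    magic, length, crc = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a DEMO container")
    data = zlib.decompress(blob[HEADER.size:])
    if len(data) != length or zlib.crc32(data) != crc:
        raise ValueError("data failed the integrity check")
    return data

sample = b"application data " * 100
assert unwrap(wrap(sample)) == sample
```

Storing the original length and a checksum in the header lets the program allocate the right buffer size and detect corruption before using the expanded data.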


Copyright © 2008 Catalyst Development Corporation. All rights reserved.