
About character encodings
Computers often must convert between character encodings. In particular, the character
encodings most commonly used on the Internet are not the ones that Java or Windows use internally. Character sets
used on the Internet are typically single-byte or multiple-byte (including DBCS character sets
that allow single-byte characters). These character sets are most efficient for transmitting data,
because each character takes up the minimum necessary number of bytes. Currently, Latin
characters are most frequently used on the web, and most character encodings used on the web
represent those characters in a single byte.
Computers, however, process data most efficiently if each character occupies the same number of
bytes. Therefore, Windows and Java both use double-byte encoding for internal processing.
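The trade-off between compact transmission and fixed-width processing can be seen by counting bytes. The following is a minimal sketch in Python, used here only for illustration and not part of ColdFusion; the helper name `encoded_sizes` and the sample strings are assumptions made for this example. It reports how many bytes the same text occupies in a single-byte encoding (Latin-1), a variable-width encoding (UTF-8), and a fixed two-byte encoding (UTF-16 without a byte-order mark).

```python
# Compare the storage cost of the same text in a single-byte encoding,
# a variable-width encoding, and a fixed-width two-byte encoding.
ENCODINGS = ("latin-1", "utf-8", "utf-16-be")


def encoded_sizes(text):
    """Return the byte length of text in each encoding (None if unencodable)."""
    sizes = {}
    for name in ENCODINGS:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            # The encoding has no code for at least one character.
            sizes[name] = None
    return sizes


for sample in ("Hello", "Grüße", "日本語"):
    print(sample, encoded_sizes(sample))
```

In this sketch, ASCII text costs one byte per character in the web-oriented encodings and two bytes per character in the fixed-width form, while the Japanese sample cannot be represented in Latin-1 at all.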
The Java Unicode character encoding
ColdFusion MX uses the Java Unicode Standard for representing character data internally. This
standard corresponds to UCS-2 encoding of the Unicode character set. The Unicode character set
can represent many languages, including all major European and Asian character sets. Therefore,
ColdFusion MX can receive, store, process, and present text from all languages supported by
Unicode.
The Java Virtual Machine (JVM) that processes ColdFusion pages converts from the
character encoding used on a ColdFusion page or other source of information to UCS-2. The
page or data encodings that ColdFusion supports depend on the specific JVM, but include most
encodings used on the web. Similarly, the JVM converts from its internal UCS-2
representation to the character encoding used to send the response to the client.
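The two conversions described above (source encoding to internal representation, then internal representation to response encoding) follow a decode-then-encode pattern. The sketch below shows that pattern in Python for illustration only; it is not how the JVM is implemented, and the function name `transcode` and the sample text are assumptions for this example.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert bytes from one character encoding to another."""
    # Step 1: decode the incoming bytes into the runtime's internal
    # Unicode string representation.
    text = data.decode(source_encoding)
    # Step 2: encode the internal string in the encoding used for output.
    return text.encode(target_encoding)


# Hypothetical input: a Japanese phrase stored as Shift-JIS bytes.
sjis_bytes = "日本語".encode("shift_jis")
utf8_bytes = transcode(sjis_bytes, "shift_jis", "utf-8")

print(len(sjis_bytes), "bytes in Shift-JIS")
print(len(utf8_bytes), "bytes in UTF-8")
```

The text itself is unchanged by the round trip; only its byte representation differs between the input and the output.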
By default, ColdFusion MX uses UTF-8 to represent text data sent to a browser. UTF-8
represents the Unicode character set using a variable-length encoding. ASCII characters are sent
using a single byte. Most European and Middle Eastern characters are sent as two bytes, and
Japanese, Korean, and Chinese characters are sent as three bytes. One advantage of UTF-8 is that
it sends ASCII character set data in a form that can be recognized by systems designed to process
only single-byte ASCII characters, while it is flexible enough to handle multiple-byte character
representations.
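The byte counts mentioned above can be checked directly. This is a small Python sketch, for illustration only; the helper `utf8_width` and the choice of sample characters are assumptions for this example.

```python
def utf8_width(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


# One sample character per script, written as escapes to avoid
# depending on the encoding of this source file.
samples = {
    "Latin capital A (ASCII)": "\u0041",
    "Latin e with acute": "\u00e9",
    "Hebrew alef": "\u05d0",
    "Japanese kanji": "\u65e5",
    "Korean hangul": "\ud55c",
    "Chinese hanzi": "\u4e2d",
}

for label, ch in samples.items():
    print(f"{label}: {utf8_width(ch)} byte(s)")

# ASCII-only text is byte-for-byte identical in ASCII and UTF-8, which is
# why systems that expect single-byte ASCII can still read it.
ascii_compatible = "plain ASCII".encode("utf-8") == "plain ASCII".encode("ascii")
print("ASCII-compatible:", ascii_compatible)
```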
While the default format of text data returned by ColdFusion is UTF-8, you can have
ColdFusion return a page in any character set supported by Java. For example, you can return text
using the Japanese language Shift-JIS character set. Similarly, ColdFusion can handle data that is
in many different character sets. For more information, see “Determining the page encoding of
server output” on page 379.
Character encoding conversion issues
Because different character encodings support different character sets, you can encounter errors if
your application gets text in one encoding and presents it in another encoding. For example, the
Windows Latin-1 character encoding, Windows-1252, includes characters with hexadecimal
representations in the range 80-9F, while ISO 8859-1 does not include characters in that range.
As a result, under the following circumstances, characters in the range 80-9F, such as the euro
symbol (€), are not displayed properly:
• A file encoded in Windows-1252 includes characters in the range 80-9F.
• ColdFusion reads the file, specifying the Windows-1252 encoding in the cffile tag.
• ColdFusion displays the file contents, specifying ISO-8859-1 in the cfcontent tag.
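The scenario in the list above can be reproduced outside ColdFusion. The following Python sketch is an illustration under stated assumptions: the function `read_then_present` is hypothetical, the sample text is invented, and the use of `errors="replace"` (which substitutes a question mark) is a choice made for this example, since the exact output ColdFusion produces in this situation is not specified here.

```python
def read_then_present(file_bytes: bytes, read_encoding: str, output_encoding: str) -> bytes:
    """Decode bytes with one encoding, then encode the text with another."""
    text = file_bytes.decode(read_encoding)
    # Characters missing from the output encoding are replaced with "?"
    # (an assumption for this sketch, not documented ColdFusion behavior).
    return text.encode(output_encoding, errors="replace")


# A file saved as Windows-1252: the euro symbol is the single byte 0x80.
file_bytes = "Price: 10 €".encode("cp1252")
print(hex(file_bytes[-1]))  # the euro symbol falls in the range 80-9F

# Reading as Windows-1252 recovers the euro symbol correctly, but
# ISO-8859-1 has no code for it, so it is lost on output.
presented = read_then_present(file_bytes, "cp1252", "iso-8859-1")
print(presented)

# Presenting in an encoding that covers the euro symbol preserves it.
safe = read_then_present(file_bytes, "cp1252", "utf-8")
print(safe.decode("utf-8"))
```

Using the same encoding for input and output, or an output encoding such as UTF-8 that covers every character in the input, avoids the loss.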