Collation Implementation Types
| 'a' = 'A' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)
For implementation instructions, see Section 10.4.3, “Adding a Simple Collation to an 8-Bit Character Set”.
Complex collations for 8-bit character sets
This kind of collation is implemented using functions in a C source file that define how to order characters, as described in Section 10.3, “Adding a Character Set”.
Collations for non-Unicode multi-byte character sets
For this type of collation, 8-bit (single-byte) and multi-byte characters are handled differently. For 8-bit characters, character codes map to weights in case-insensitive fashion. (For example, the single-byte characters 'a' and 'A' both have a weight of 0x41.) For multi-byte characters, there are two types of relationship between character codes and weights:
• Weights equal character codes. sjis_japanese_ci is an example of this kind of collation. The multi-byte character 'ぢ' has a character code of 0x82C0, and the weight is also 0x82C0.
• Character codes map one-to-one to weights, but a code is not necessarily equal to its weight. gbk_chinese_ci is an example of this kind of collation. The multi-byte character '膰' has a character code of 0x81B0 but a weight of 0xC286.
For implementation instructions, see Section 10.3, “Adding a Character Set”.
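The two multi-byte relationships described above can be illustrated with a minimal Python sketch. It is not MySQL's implementation: the function names and the table layout are hypothetical, the ASCII case folding is an assumed stand-in for the real single-byte weight table, and only the three example values ('a'/'A', 0x82C0, and 0x81B0) come from the text.

```python
# Illustrative sketch only; the real weight tables live in MySQL's C sources.

def single_byte_weight(code: int) -> int:
    """Case-insensitive weight for an 8-bit code.

    Assumption: ASCII lowercase letters fold to their uppercase code,
    so 'a' (0x61) and 'A' (0x41) both weigh 0x41.
    """
    if 0x61 <= code <= 0x7A:
        return code - 0x20
    return code

# Hypothetical excerpt of a code-to-weight table (gbk_chinese_ci style).
MAPPED_WEIGHTS = {0x81B0: 0xC286}

def weight_equals_code(code: int) -> int:
    """sjis_japanese_ci style: a multi-byte code is its own weight."""
    return single_byte_weight(code) if code <= 0xFF else code

def weight_from_table(code: int) -> int:
    """gbk_chinese_ci style: a multi-byte code maps one-to-one to a weight.

    Raises KeyError for codes missing from the tiny excerpt above.
    """
    return single_byte_weight(code) if code <= 0xFF else MAPPED_WEIGHTS[code]

print(hex(weight_equals_code(0x82C0)))  # 0x82c0
print(hex(weight_from_table(0x81B0)))   # 0xc286
```

In both styles the mapping is one-to-one for multi-byte characters; the only difference is whether a lookup table is needed to find the weight.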
Collations for Unicode multi-byte character sets
Some of these collations are based on the Unicode Collation Algorithm (UCA), others are not.
Non-UCA collations have a one-to-one mapping from character code to weight. In MySQL, such collations are case insensitive and accent insensitive. utf8_general_ci is an example: 'a', 'A', 'À', and 'á' each have different character codes but all have a weight of 0x0041 and compare as equal.
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
|         1 |         1 |         1 |
+-----------+-----------+-----------+
1 row in set (0.06 sec)
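The comparison above follows directly from the one-weight-per-character rule. The Python sketch below shows the idea under stated assumptions: the table is a hypothetical four-entry excerpt built from the example in the text, the function names are invented for illustration, and falling back to the code point for unlisted characters is a simplification, not MySQL behavior.

```python
# Hypothetical excerpt of a non-UCA weight table: every character maps
# to exactly one weight. The four entries come from the example above.
WEIGHTS = {
    "a": 0x0041,
    "A": 0x0041,
    "\u00c0": 0x0041,  # À
    "\u00e1": 0x0041,  # á
}

def sort_key(text: str) -> list:
    """Replace each character with its weight.

    Assumption: characters missing from the excerpt use their own
    code point as the weight.
    """
    return [WEIGHTS.get(ch, ord(ch)) for ch in text]

def collation_equal(left: str, right: str) -> bool:
    """Two strings compare as equal when their weight sequences match."""
    return sort_key(left) == sort_key(right)

print(collation_equal("a", "\u00c0"))  # True
print(collation_equal("a", "b"))       # False
```

Because each character contributes exactly one weight, strings of different lengths can never compare as equal in this scheme.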
UCA-based collations in MySQL have these properties:
• If a character has weights, each weight uses 2 bytes (16 bits).
• A character may have zero weights (or an empty weight). In this case, the character is ignorable.
Example: "U+0000 NULL" does not have a weight and is ignorable.
• A character may have one weight. Example: 'a' has a weight of 0x0E33.
• A character may have many weights. This is an expansion. Example: The German letter 'ß' (SZ ligature, or SHARP S) has a weight of 0x0FEA0FEA, that is, two 2-byte weights (0x0FEA followed by 0x0FEA).
• Many characters may have one weight. This is a contraction. Example: 'ch' is a single letter in Czech and has a weight of 0x0EE2.
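The four cases in the list above (ignorable, single weight, expansion, contraction) can be sketched as a small sort-key builder in Python. This is an illustration, not the UCA or MySQL's code: the table holds only the four examples from the text, the function names are hypothetical, longest-match lookup is an assumed way to detect contractions, and writing each weight big-endian is a presentation choice.

```python
# Hypothetical excerpt of a UCA-style table: a key of one or more
# characters maps to a list of zero or more 16-bit weights.
WEIGHTS = {
    "\u0000": [],                # ignorable: no weights
    "a": [0x0E33],               # one character, one weight
    "\u00df": [0x0FEA, 0x0FEA],  # ß expansion: one character, two weights
    "ch": [0x0EE2],              # contraction: two characters, one weight
}
MAX_KEY_LEN = max(len(key) for key in WEIGHTS)

def weights_for(text: str) -> list:
    """Collect weights, trying the longest table key at each position."""
    result = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in WEIGHTS:
                result.extend(WEIGHTS[chunk])
                i += length
                break
        else:
            # The excerpt covers only the examples from the text.
            raise KeyError(f"no weight for {text[i]!r} in this excerpt")
    return result

def sort_key(text: str) -> bytes:
    """Serialize each weight as 2 bytes (16 bits)."""
    return b"".join(w.to_bytes(2, "big") for w in weights_for(text))

print(sort_key("\u00df").hex())                # 0fea0fea
print(sort_key("ch").hex())                    # 0ee2
print(sort_key("a\u0000") == sort_key("a"))    # True
```

Trying the longest key first is what lets 'ch' be treated as one unit; a real table would also carry separate entries for 'c' and 'h', which this excerpt omits.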