Collation Issues
782
10.1.7.5. Collation of Expressions
In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison
operation. For example, in the following cases, it should be clear that the collation is the collation of
column
charset_name
:
SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;
However, with multiple operands, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should the comparison use the collation of the column
x
, or of the string literal
'Y'
? Both
x
and
'Y'
have collations, so which collation takes precedence?
Standard SQL resolves such questions using what used to be called “coercibility” rules. MySQL
assigns coercibility values as follows:
• An explicit
COLLATE
clause has a coercibility of 0. (Not coercible at all.)
• The concatenation of two strings with different collations has a coercibility of 1.
• The collation of a column or a stored routine parameter or local variable has a coercibility of 2.
• A “system constant” (the string returned by functions such as
USER()
[964]
or
VERSION()
[964]
)
has a coercibility of 3.
• The collation of a literal has a coercibility of 4.
•
NULL
or an expression that is derived from
NULL
has a coercibility of 5.
The preceding coercibility values are current as of MySQL 5.0.3. In MySQL 5.0 prior to 5.0.3, there
is no system constant or
NULL
coercibility. Functions such as
USER()
[964]
have a coercibility of 2
rather than 3, and literals have a coercibility of 3 rather than 4.
MySQL uses coercibility values with the following rules to resolve ambiguities:
• Use the collation with the lowest coercibility value.
• If both sides have the same coercibility, then:
• If both sides are Unicode, or both sides are not Unicode, it is an error.
• If one of the sides has a Unicode character set, and another side has a non-Unicode character set,
the side with Unicode character set wins, and automatic character set conversion is applied to the
non-Unicode side. For example, the following statement does not return an error:
SELECT CONCAT(utf8_column, latin1_column) FROM t1;
It returns a result that has a character set of
utf8
and the same collation as
utf8_column
.
Values of
latin1_column
are automatically converted to
utf8
before concatenating.
• For an operation with operands from the same character set but that mix a
_bin
collation and
a
_ci
or
_cs
collation, the
_bin
collation is used. This is similar to how operations that mix
nonbinary and binary strings evaluate the operands as binary strings, except that it is for collations
rather than data types.
Although automatic conversion is not in the SQL standard, the SQL standard document does say that
every character set is (in terms of supported characters) a “subset” of Unicode. Because it is a well-
known principle that “what applies to a superset can apply to a subset,” we believe that a collation for
Unicode can apply for comparisons with non-Unicode strings.
Summary of Contents for 5.0
Page 1: ...MySQL 5 0 Reference Manual ...
Page 18: ...xviii ...
Page 60: ...40 ...
Page 396: ...376 ...
Page 578: ...558 ...
Page 636: ...616 ...
Page 844: ...824 ...
Page 1234: ...1214 ...
Page 1427: ...MySQL Proxy Scripting 1407 ...
Page 1734: ...1714 ...
Page 1752: ...1732 ...
Page 1783: ...Configuring Connector ODBC 1763 ...
Page 1793: ...Connector ODBC Examples 1773 ...
Page 1839: ...Connector Net Installation 1819 2 You must choose the type of installation to perform ...
Page 2850: ...2830 ...
Page 2854: ...2834 ...
Page 2928: ...2908 ...
Page 3000: ...2980 ...
Page 3122: ...3102 ...
Page 3126: ...3106 ...
Page 3174: ...3154 ...
Page 3232: ...3212 ...