
Chapter 20 – Reference Information
Multi-Tech Systems, Inc. CDMA Wireless AT Commands (PN S000294J)
Phonebook UCS2 Unicode
Text strings that contain UCS2 Unicode characters must be in one of the three supported record structures detailed in this
section. If the ME supports Unicode formatted text strings in the SIM, the ME will support all three record structures for
character sets that contain 128 or fewer characters. For Unicode character sets containing more than 128 characters, the ME
will at a minimum support the ‘80’ record structure. A record structure should not be used for non-Unicode character text
strings. Within a text string only one scheme, either non-Unicode or one of the three supported record structures described in
this section, shall be used.
In the following examples, an octet is 8 bits in length. The most significant bit is identified as bit 7 and the least significant bit is
identified as bit 0. When two octets are combined to form a sixteen bit word value, the most significant bit is identified as bit 15
and the least significant bit is identified as bit 0.
Unicode character sets:
http://www.unicode.org/charts/
Record Structure ‘80’:
This record structure is identified by a value of ‘80’ in the first octet of the text string. The remaining octets are interpreted as
sixteen bit UCS2 Unicode characters with the most significant octet (MSO) preceding the least significant octet (LSO) for each
UCS2 Unicode character in the string. An octet pair with a value of 'FFFF' is ignored.
Octet 1  Octet 2  Octet 3  Octet 4  Octet 5  Octet 6  Octet 7  Octet 8  Octet 9
'80'     Ch1MSO   Ch1LSO   Ch2MSO   Ch2LSO   Ch3MSO   Ch3LSO   'FF'     'FF'
In the above example, the text string contains four sixteen-bit values, of which the first three are UCS2 Unicode characters.
The final value, 'FFFF' in octets 8 and 9, is ignored.
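The '80' rules above can be sketched in a few lines of Python. This is a minimal illustration, not the ME's implementation: the function name decode_ucs2_80 and the sample code points (three Bengali letters) are assumptions chosen for the example, and a trailing unpaired octet is simply skipped.

```python
def decode_ucs2_80(record: bytes) -> str:
    """Decode a text string stored in the '80' record structure (sketch)."""
    if not record or record[0] != 0x80:
        raise ValueError("not an '80' record structure")
    payload = record[1:]
    chars = []
    # Walk the payload two octets at a time: MSO first, then LSO.
    # A trailing unpaired octet (odd payload length) is not visited.
    for i in range(0, len(payload) - 1, 2):
        code = (payload[i] << 8) | payload[i + 1]
        if code == 0xFFFF:
            continue  # an octet pair with the value 'FFFF' is ignored
        chars.append(chr(code))
    return "".join(chars)


# Hypothetical record: three UCS2 characters followed by an 'FFFF' pair.
record = bytes([0x80, 0x09, 0x95, 0x09, 0x96, 0x09, 0x97, 0xFF, 0xFF])
print([f"{ord(c):04X}" for c in decode_ucs2_80(record)])
# prints ['0995', '0996', '0997']
```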
Record Structure ‘81’:
This record structure is identified by a value of ‘81’ in the first octet of the text string. The second octet of this structure
contains a value indicating the number of characters in the string. The third octet value is used to specify the Unicode
character set base pointer. This base pointer is used with some or all of the remaining octets in the text string.
The fourth and subsequent octets in the text string are interpreted as follows. If bit 7 of the octet is zero, then bits 6 through 0
define a standard non-Unicode character. If bit 7 of the octet is one, then bits 6 through 0 are combined with the base pointer
to define a UCS2 Unicode character.
Octet 1  Octet 2  Octet 3  Octet 4  Octet 5  Octet 6  Octet 7  Octet 8  Octet 9
'81'     '05'     '13'     '53'     '95'     'A6'     '8F'     'FF'     'FF'
In this example:
• Octet 2 indicates that there are five characters in the text string. The base pointer (octet 3) is not included in this
  count.
• Octet 3 is used to define bits 14 through 7 of a base pointer. This octet is inserted into the binary bit pattern
  0xxx xxxx x000 0000 to become a sixteen-bit value. In this example, '13' specifies the first UCS2 character of the
  Bengali character set, which starts at code position 0980 (0000 1001 1000 0000).
• Octet 4 contains a value with bit 7 equal to zero. Bits 6 through 0 (101 0011) of this octet correspond to the
  character 'S'.
• Octet 5 contains a value with bit 7 equal to one. Bits 6 through 0 (001 0101) of this octet are combined with the base
  pointer value. The resulting sixteen-bit value 0000 1001 1001 0101 ('0995') is the UCS2 Bengali letter 'KA'.
• Octet 8 contains the value 'FF' and, since the string length is 5, this is a valid character in the text string. Bit 7
  of this character equals one. Bits 6 through 0 (111 1111) of this octet are combined with the base pointer value. The
  resulting sixteen-bit value 0000 1001 1111 1111 ('09FF') is the last UCS2 Bengali character.
• Octet 9 is ignored since it is beyond the number of characters specified in octet 2.
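The '81' example can be reproduced with a short Python sketch. The function name decode_ucs2_81 is hypothetical, and mapping octets with bit 7 clear straight to ASCII is an assumption: it matches the 'S' in this example but may not hold for every value in the ME's standard non-Unicode alphabet.

```python
def decode_ucs2_81(record: bytes) -> str:
    """Decode a text string stored in the '81' record structure (sketch)."""
    if len(record) < 3 or record[0] != 0x81:
        raise ValueError("not an '81' record structure")
    count = record[1]        # octet 2: number of characters
    base = record[2] << 7    # octet 3 fills bits 14..7: 0xxx xxxx x000 0000
    chars = []
    # Only the first `count` octets after the base pointer are characters.
    for octet in record[3:3 + count]:
        if octet & 0x80:
            # Bit 7 set: bits 6..0 are combined with the base pointer.
            chars.append(chr(base | (octet & 0x7F)))
        else:
            # Bit 7 clear: non-Unicode character (assumed ASCII here).
            chars.append(chr(octet))
    return "".join(chars)


# Octet values taken from the '81' example above.
record = bytes([0x81, 0x05, 0x13, 0x53, 0x95, 0xA6, 0x8F, 0xFF, 0xFF])
print([f"{ord(c):04X}" for c in decode_ucs2_81(record)])
# prints ['0053', '0995', '09A6', '098F', '09FF']
```

Because the low seven bits of the base pointer are always zero in this structure, a bitwise OR and an addition give the same result.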
Record Structure ‘82’:
This record structure is identified by a value of ‘82’ in the first octet of the text string. The second octet of this structure
contains a value indicating the number of characters in the string. The third and fourth octets are used to specify the Unicode
character set base pointer. This base pointer is used with some or all of the remaining octets in the string.
The fifth and subsequent octets in the string are interpreted as follows. If bit 7 of the octet is zero, then bits 6 through 0
define a standard non-Unicode character. If bit 7 of the octet is one, then bits 6 through 0 are combined with the base pointer
to define a UCS2 Unicode character.
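A minimal Python sketch of the '82' rules follows. It rests on several assumptions: the function name decode_ucs2_82 is hypothetical, "combined with the base pointer" is taken to mean that the seven-bit offset is added to the sixteen-bit base, and octets with bit 7 clear are mapped straight to ASCII. The sample record uses the octet values of the '82' example in this section.

```python
def decode_ucs2_82(record: bytes) -> str:
    """Decode a text string stored in the '82' record structure (sketch)."""
    if len(record) < 4 or record[0] != 0x82:
        raise ValueError("not an '82' record structure")
    count = record[1]                      # octet 2: number of characters
    base = (record[2] << 8) | record[3]    # octets 3 and 4: MSO, then LSO
    chars = []
    # Only the first `count` octets after the base pointer are characters.
    for octet in record[4:4 + count]:
        if octet & 0x80:
            # Bit 7 set: bits 6..0 are an offset from the base pointer
            # (assumption: combined by addition).
            chars.append(chr(base + (octet & 0x7F)))
        else:
            # Bit 7 clear: non-Unicode character (assumed ASCII here).
            chars.append(chr(octet))
    return "".join(chars)


record = bytes([0x82, 0x05, 0x05, 0x30, 0x2D, 0x82, 0xD3, 0x2D, 0x31])
print([f"{ord(c):04X}" for c in decode_ucs2_82(record)])
# prints ['002D', '0532', '0583', '002D', '0031']
```

Unlike the '81' structure, the base pointer here can have non-zero low bits, so addition and bitwise OR are not interchangeable.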
Octet 1  Octet 2  Octet 3  Octet 4  Octet 5  Octet 6  Octet 7  Octet 8  Octet 9
'82'     '05'     '05'     '30'     '2D'     '82'     'D3'     '2D'     '31'
                  MSO      LSO
In this example:
• Octet 2 indicates that there are 5 characters in the text string. The base pointer (octets 3 and 4) is not included in
  this count.
• Octets 3 and 4 specify a sixteen-bit base pointer '0530', which is the first UCS2 character of the Armenian character
  set.
• Octet 5 contains a value with bit 7 equal to zero. Bits 6 through 0 (010 1101) of this octet correspond to the
  dash character '-'.