[1] |
_u(uStrS {,uFormat} {, CP} ) | uStrS.uToD({,uFormat} {, CP} ) |
Unicode Encoded SDS to DDS |
|
Returns |
<strD> / <err> |
|
uStrS |
String unicode data (strS, or Director string) |
|
uFormat |
Symbol unicode format of uStrS. Possible values are #u7, #u8, #u16, #u16b |
|
CP |
Integer CodePage to be set as the CP of the resulting <str> |
|
Description |
Attempts to decode a unicode encoded string to a strD.
uStrS can hold, for example, the contents of a unicode text file.
If uStrS contains a BOM, the uFormat value is ignored.
If uFormat has been specified and uStrS contains no BOM, the Xtra will try to decode uStrS using the uFormat encoding (e.g. utf-7 decode, when uFormat = #u7).
If uStrS contains no BOM and no uFormat has been specified, the Xtra will try to utf-8 decode uStrS. |
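The decision order described above (a BOM wins, then uFormat, then utf-8) can be sketched outside Director. The Python below only illustrates that order; it is not the Xtra's implementation. The function name u_to_d and the format keys are hypothetical, and two things are assumed: that #u16 means utf-16 little endian, and that the utf-7 BOM is the five bytes "+/v8-" shown in the example output on this page.

```python
import codecs
from typing import Optional

# Known BOMs and the Python codec assumed for each format symbol.
# Assumptions: #u16 is little endian, and the utf-7 BOM is b"+/v8-".
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # #u8
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # #u16
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # #u16b
    (b"+/v8-", "utf-7"),                 # #u7
]
CODECS = {"u7": "utf-7", "u8": "utf-8", "u16": "utf-16-le", "u16b": "utf-16-be"}


def u_to_d(data: bytes, u_format: Optional[str] = None) -> str:
    # 1. A BOM, if present, decides the encoding; u_format is ignored.
    for bom, codec in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec)
    # 2. No BOM: use u_format if given, otherwise fall back to utf-8.
    return data.decode(CODECS.get(u_format, "utf-8"))
```

In this sketch, u_to_d(b"+A7EDsgOz-") returns the text unchanged, while u_to_d(b"+A7EDsgOz-", "u7") returns "αβγ".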
|
Examples |
d=_s("B103B203B303310432043304").hexBlockToD() --convert a hex block to the strD 'αβγбвг'
d.pop() --just checking...
uStr=d.toU(1) --returns a strS holding utf-8 data (default) including BOM.
put uStr --display raw utf-8 data
-- ο»ΏΞ±Ξ²Ξ³Π±Π²Π³ --as displayed on a Greek system (display does not affect the actual data)
dd=uStr.uToD() --decode the unicode encoded data back to a strD.
put dd.pop()=d --check if the result and the original strings are equal (+ view the result)
The string "+A7EDsgOz-", as far as utf-8 is concerned, is the literal string "+A7EDsgOz-". For utf-7, however, it is the BOM-less representation of the Greek characters "αβγ":
put _u("+A7EDsgOz-")
-- +A7EDsgOz- --No BOM & encoding not utf-8. We have to tell the Xtra which decoding method to use.
put _u("+A7EDsgOz-", #u7) --or: put _s("+A7EDsgOz-").uToD(#u7)
-- αβγ --we now get the correct result
uStr = _s("abc-αβγ", 1253).toU(#u7, 1) --adding BOM when encoding...
put uStr
-- "+/v8-abc-+A7EDsgOz-"
put uStr.uToD() --...so, no need to tell the Xtra which decoder to use.
-- abc-αβγ
|
Notes |
If uStrS is actually a strD, the operation will be performed on its SDS equivalent (automatic internal conversion, CP dependent). |
[2] |
strD.toU({,uFormat} {, addBOM} ) |
DDS to Unicode Encoded SDS |
|
Returns |
<strS> / <err> |
|
uFormat |
Symbol unicode format of the resulting strS. Possible values are #u7, #u8 (default), #u16, #u16b |
|
addBOM |
Boolean. When true, the Xtra will include the uFormat's BOM in the string. Byte Order Mark is a 'stamp', prefixed to the string's binary data, that describes both the encoding and byte order of the data that follows. |
|
Description |
Attempts to unicode encode the content of strD, using the uFormat protocol. |
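As a rough illustration of what encoding with an optional BOM amounts to, here is a Python sketch. It is not the Xtra's code: the name to_u and the format keys are hypothetical, #u16 is assumed to be utf-16 little endian, and the utf-7 BOM is assumed to be the bytes "+/v8-" seen in the examples on this page.

```python
import codecs

# format symbol -> (BOM bytes, Python codec)
# Assumptions: #u16 is little endian, and the utf-7 BOM is b"+/v8-".
FORMATS = {
    "u7": (b"+/v8-", "utf-7"),
    "u8": (codecs.BOM_UTF8, "utf-8"),
    "u16": (codecs.BOM_UTF16_LE, "utf-16-le"),
    "u16b": (codecs.BOM_UTF16_BE, "utf-16-be"),
}


def to_u(text: str, u_format: str = "u8", add_bom: bool = False) -> bytes:
    # Encode the text, then prefix the format's BOM only when requested.
    bom, codec = FORMATS[u_format]
    payload = text.encode(codec)
    return bom + payload if add_bom else payload
```

The BOM is a fixed byte prefix per format, so adding it never changes the encoded payload that follows.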
|
Examples |
put _d("abc-αβγ").toU() --utf-8 encode, no BOM
-- abc-Ξ±Ξ²Ξ³ --display, as shown on a Greek system
put _d("abc-αβγ").toU(1) -- utf-8 encode, with BOM
-- ο»Ώabc-Ξ±Ξ²Ξ³
put _d("abc-αβγ").toU(#u7) -- utf-7 encode, no BOM
-- abc-+A7EDsgOz- --system independent display: utf-7 uses ASCII characters only.
put _d("abc-αβγ").toU(#u7, 1) --utf-7 encode, addBOM
-- +/v8-abc-+A7EDsgOz-
|
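The odd-looking output in the utf-8 examples above is what utf-8 bytes look like when they are displayed through a single-byte Greek code page. A short Python sketch reproduces it, assuming that "a Greek system" means Windows code page 1253 (the CP value used elsewhere on this page); the variable names are illustrative only.

```python
# utf-8 bytes for the text, without and with a BOM (U+FEFF)
plain = "abc-αβγ".encode("utf-8")
with_bom = "\ufeffabc-αβγ".encode("utf-8")

# Reading those bytes as code page 1253 gives what a Greek system displays.
shown_plain = plain.decode("cp1253")
shown_bom = with_bom.decode("cp1253")

print(shown_plain)  # abc-Ξ±Ξ²Ξ³
print(shown_bom)    # ο»Ώabc-Ξ±Ξ²Ξ³
```

Only the display changes; the underlying bytes are still valid utf-8.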
Notes |
If strD is actually a strS, the operation will be performed on its DDS equivalent (automatic internal conversion, CP dependent).
#u16 is the format strDs use internally. When uFormat = #u16, the resulting string will contain a copy of the original strD's data, prefixed with the utf-16 byte order mark if addBOM is true. |
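The hex block B103B203B303310432043304 used in the uToD examples on this page fits this note. Assuming #u16 means utf-16 little endian (an assumption suggested by that hex block, not stated by the documentation), a Python sketch shows that those bytes are the utf-16 data of 'αβγбвг', and what the data looks like once the utf-16 BOM is prefixed:

```python
import codecs

# The raw bytes from the hex block, interpreted as utf-16 little endian.
raw = bytes.fromhex("B103B203B303310432043304")
text = raw.decode("utf-16-le")
print(text)  # αβγбвг

# Prefixing the utf-16 (little endian) BOM leaves the payload untouched.
with_bom = codecs.BOM_UTF16_LE + raw
print(with_bom.hex().upper())  # FFFEB103B203B303310432043304
```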
[3] |
strS.uType() |
Gets a string's unicode type by checking for a BOM |
|
Returns |
Symbol unicode format / 0 / <err> |
|
Description |
Checks whether strS's data starts with a known BOM. If so, a symbol specifying the string's unicode encoding type is returned; otherwise, 0 is returned.
Unicode types: #u7: utf-7, #u8: utf-8, #u16: utf-16, #u16b: utf-16 big endian.
|
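BOM detection of this kind is a plain prefix check. The Python sketch below illustrates the idea and is not the Xtra's implementation: the name u_type is hypothetical, #u16 is assumed to be little endian, and the utf-7 BOM is assumed to be "+/v8-" as in the examples on this page.

```python
import codecs

# Known BOM prefixes and the format symbol reported for each.
# Assumptions: #u16 is little endian, and the utf-7 BOM is b"+/v8-".
KNOWN_BOMS = [
    (codecs.BOM_UTF8, "u8"),
    (codecs.BOM_UTF16_LE, "u16"),
    (codecs.BOM_UTF16_BE, "u16b"),
    (b"+/v8-", "u7"),
]


def u_type(data: bytes):
    # Return the format whose BOM prefixes the data, or 0 if none matches.
    for bom, fmt in KNOWN_BOMS:
        if data.startswith(bom):
            return fmt
    return 0
```

Data encoded without a BOM cannot be identified this way, which is why the check returns 0 for it.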
Examples |
put _s("abcd").uType()
-- 0
put _d("abcd").toU(#u7).uType() --this will return 0, since no BOM was added while encoding.
-- 0
put _d("abcd").toU(#u7, 1).uType()
-- #u7 |
|
Notes |
If strS is actually a DDS, the Xtra will try to convert the data to SDS, and perform the operation on the SDS data. If the conversion fails, an <err> will be returned. |