Double byte per Digit Strings


A DDS string is a UTF-16LE encoded string. DDS objects (strDs) can be converted to and from strSs, using the CodePage stored in the object. strD objects can also be exported to, and imported from, other Unicode formats such as UTF-7, UTF-8 and UTF-16BE.
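As an illustration of the encodings named above, the following Python sketch (run outside Director, and not part of the Xtra) encodes the same three Greek letters as UTF-16LE, UTF-16BE and UTF-8. The variable names are only illustrative.

```python
# Illustration only: standard Python codecs, not the Xtra's own conversion code.
text = "\u03b1\u03b2\u03b3"            # the Greek letters alpha, beta, gamma

utf16le = text.encode("utf-16-le")     # 2 bytes per character, low byte first
utf16be = text.encode("utf-16-be")     # same code units, high byte first
utf8 = text.encode("utf-8")            # variable width; 2 bytes each for these letters

print(utf16le.hex(" "))  # b1 03 b2 03 b3 03
print(utf16be.hex(" "))  # 03 b1 03 b2 03 b3
print(utf8.hex(" "))     # ce b1 ce b2 ce b3
```

All three byte sequences decode back to the same text; only the byte layout differs.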


[1]
_d(anyValue {, CP} ) Double byte per digit string
  Returns <strD> / <err>
  anyValue Any Director value
  CP Integer CodePage.
     
  Description Double Digit String.
Creates a new double byte per digit string object, by converting the anyValue parameter to a DDS.
The conversion is code page dependent: anyValue is treated as an SDS with CodePage CP.

E.g.: If anyValue is a Director string containing characters belonging to the extended ASCII character set, or if anyValue is an MBCS string, the appropriate ANSI code page must be passed to this command, for the conversion to be accurate.
The codepage can be omitted if the string's CodePage is equal to the default code page.
In practice, if Director can properly display anyValue, and the Xtra's default code page has not been changed, the CP parameter can be omitted.

If the conversion fails, or CP is invalid, an <err> will be returned.
     
  Examples Creating an strD containing Greek characters

- On systems where Greek is the default language:
gr=_d("αβγ")

- On a system where Cyrillic is the default language:
gr=_d("бвг", 1253)

1253 is the value of the ANSI Greek CodePage. The SDS strings "αβγ" on a Greek system and "бвг" on a Cyrillic system are binary-wise identical. For example, if each is saved as plain text from an application like Notepad, the two files will have the same content.
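The binary identity described above can be checked outside Director. This Python sketch is only an illustration: Python's "cp1253", "cp1251" and "cp1252" codecs stand in for the CP parameter, and the same three bytes are decoded under each code page.

```python
# Illustration only: one byte sequence, interpreted under three ANSI code pages.
raw = bytes([0xE1, 0xE2, 0xE3])

greek = raw.decode("cp1253")      # how a Greek system shows these bytes
cyrillic = raw.decode("cp1251")   # how a Cyrillic system shows them
western = raw.decode("cp1252")    # how a western (ANSI 1252) system shows them

print(greek, cyrillic, western)
```

The bytes never change; only the code page used to interpret them does, which is why the correct CP has to be passed when the system default differs.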


Create a string containing both Greek and Cyrillic characters. The result of each of the following commands should be the Unicode string: αβγбвг

-- as displayed on a Greek system. The first '1253' CodePage is optional, since 1253 is the default codepage for Greek systems.
_d("αβγ", 1253).app("αβγ", 1251).pop()

--as displayed on a Cyrillic system. Here the first CodePage is mandatory, because the default code page on Cyrillic systems is 1251, not 1253. For the same reason, the code page passed to app() is optional.
_d("бвг", 1253).app("бвг", 1251).pop()

--as displayed on western (ANSI 1252) systems. Both code pages are mandatory.
_d("áâã", 1253).app("áâã", 1251).pop()
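The three commands above pass the same bytes twice, once per code page. A rough Python equivalent of that idea (an illustration only, not the Xtra's implementation) decodes the bytes as 1253 and as 1251 and concatenates the results:

```python
# Illustration only: decode the same bytes under two code pages and join them.
raw = b"\xe1\xe2\xe3"

combined = raw.decode("cp1253") + raw.decode("cp1251")
as_utf16le = combined.encode("utf-16-le")

print(combined)         # αβγбвг
print(len(as_utf16le))  # 12
```

Six characters from the Basic Multilingual Plane occupy twelve bytes in UTF-16LE, two per character.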


The above examples demonstrate how an strD can be created in Director's message window, or script editor, mostly for test purposes.
To add true multi-language support, the _u( ) command and an external Unicode editor should be used.

To split an strD into strS objects that Director can display, you can use the toSL() command:

d = _d("αβγ", 1253).app("αβγ", 1251)
d.toClip()       --<< try pasting the data to a word processor.
put d
-- αβγ???        --<< the default code page for this system is 1253, so the 1251 characters appear as '?'

parsedL = d.toSL(#prop)
put parsedL
-- [1253: αβγ, 1251: αβγ]   --<< after parsing, strings containing the original binary values are returned. By examining the CP (property), you can select the appropriate font to display each string (e.g. Arial-Greek for the first entry, Arial-CYR for the second)

put parsedL[1].pop(), parsedL[2].pop() --show the strings in a pop-up window
     
  Notes  

[2]
_dc(charCode {, CP} ) Double byte per digit Character
  Returns <strD> / <err>
  charCode Integer / Float / HexString character code.
  CP Integer CodePage.
     
  Description Double digit Character: Creates a new double byte per digit string object, containing a single character.

The character's value may range from 0 to 10FFFF (1,114,111), except for the range D800-DFFF (UTF-16 high/low surrogates).
A character with a value less than or equal to FFFF (65535) classifies as a UCS-2 character, and can be displayed by any Unicode application.
A character with a value above FFFF is a UTF-16 only character, and can be displayed only by applications supporting UTF-16. UCS-2 only applications will most probably display it as two invalid characters.

If charCode is outside the ranges 0-D7FF and E000-10FFFF, or CP is invalid, an <err> will be returned.
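The ranges above can be expressed as a short check. This Python sketch is an illustration only; the helper names is_valid_char_code and utf16_units are hypothetical and are not part of the Xtra.

```python
# Illustration only: hypothetical helpers mirroring the documented ranges.
def is_valid_char_code(code: int) -> bool:
    """True for 0-D7FF and E000-10FFFF, i.e. everything except surrogates."""
    return 0 <= code <= 0xD7FF or 0xE000 <= code <= 0x10FFFF


def utf16_units(code: int) -> int:
    """Number of 16-bit units the character needs in UTF-16."""
    if not is_valid_char_code(code):
        raise ValueError(f"invalid character code: {code:#x}")
    return len(chr(code).encode("utf-16-le")) // 2


print(utf16_units(0x3A9))    # 1  (UCS-2 character)
print(utf16_units(0x10000))  # 2  (surrogate pair, UTF-16 only)
print(is_valid_char_code(0xD800))  # False
```

A character above FFFF takes two 16-bit units, which is why a UCS-2 only application tends to show it as two invalid characters.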
     
  Examples put _dc(937)   -- 937 is the UCS-2 code for the Greek capital letter Omega.
-- Ω
put _dc("3A9") --same as above, but charCode is a hex value.
-- Ω

d=_dc("3A9", 1251) -- Set Ansi-1251 (Cyrillic) as the string's CodePage
put d
-- ?
In the last example, the strD's binary value is the same as in the previous examples. However, when an SDS is requested from the object (e.g. when requesting the object to return a Director string, as the 'put' command does), the strD object will attempt to translate its Unicode data to an SDS containing characters belonging to the 1251 code page. Since the Greek letter 'Ω' does not exist in the ANSI-1251 CodePage's character map, a string containing '?' (or the default no-map character) is returned.
put d.cp(1253) --the Original DoubleDigit binary data are not affected:
-- Ω
 
     
  Notes When working with double digit strings only, there is no need to define a codepage.
No matter the system, the strD returned by _dc(937) will be the same. However, on all non-Greek systems, the result of the 'put' command will be a '?', denoting an unmappable character.
When the code page is omitted (as in the above examples) the default code page will be used as the string's CP upon creation - and therefore for all D to S conversions.
By using the 'put' command, the Xtra is instructed to convert the strD to a non-unicode Director string object. To do so, it uses the strD's code page to translate the DDS to an SDS:
It attempts to locate the position of the character Ω in the code page's character map. If the system is Greek, the character does exist (at position 217). On other systems, the character does not exist in the default code page, therefore a '?' is returned.
If the correct code page is defined when creating a strD, the binary value returned to director will be the same on all systems. Though the string will be displayed correctly only on Greek systems' message windows, having the correct string (binary-wise) returned can be used for displaying the results in a text member using the appropriate font (e.g. Arial-Greek).
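The D to S conversion described in these notes can be imitated outside Director. The Python sketch below is an illustration only and does not use the Xtra: it encodes Ω with the Greek code page, where it maps to position 217, and with the Cyrillic code page, where it has no mapping and is replaced by '?'.

```python
# Illustration only: Python codecs stand in for the strD's code page.
omega = "\u03a9"   # Greek capital letter Omega, decimal 937

greek_bytes = omega.encode("cp1253")                        # mapped
cyrillic_bytes = omega.encode("cp1251", errors="replace")   # unmappable

print(greek_bytes[0])   # 217
print(cyrillic_bytes)   # b'?'

# The byte 217 is correct binary data on any system, but a Cyrillic
# system renders it with its own code page and shows a different letter.
print(greek_bytes.decode("cp1251"))
```

This is why returning the correct byte is still useful on a non-Greek system: a text member using a Greek font can display it properly.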

[3]
_dcs(charCode {, charCode, charCode} ) Double byte per digit Character Sequence
  Returns <strD> / <err>
  charCode Integer / Float / HexString character code.
  CP Integer CodePage.
     
  Description Double Digit Character Sequence: Creates a new double byte per digit string object, containing one or more characters.

Each character's value may range from 0 to 10FFFF (1,114,111), excepting the range D800-DFFF (UTF-16 Hi/Low surrogates)

The default CP will be used as the string's CP.
If charCode is out of the 0-65535 range, an <err> will be returned.
     
  Examples put _dcs(65, 66) --The unicode value 65 maps to the Latin capital letter 'A', and 66 to 'B'
-- AB            -- Note that both ANSI and unicode decimal values 0-127 refer to the same characters

--create a string containing a Latin (C), a Greek (Ω), and a Chinese (欦) character:
put _dcs(67, 32, "3A9", 32, "6b26").pop() --note: 32 = space
-- C Ω ?
     
  Notes charCodes are CP independent binary values. They are written directly to the strD's internal buffer.
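To make the note concrete, the following Python sketch builds the same five-character sequence as the example above and shows the resulting UTF-16LE bytes. It is an illustration only; to_code_point and char_sequence are hypothetical helpers, and the assumption that a string argument is always hexadecimal follows the examples in this section.

```python
# Illustration only: hypothetical helpers, not the Xtra's implementation.
def to_code_point(char_code) -> int:
    """Accept an integer, a float, or a hex string such as "3A9"."""
    if isinstance(char_code, str):
        return int(char_code, 16)   # assumption: strings are hex values
    return int(char_code)


def char_sequence(*char_codes) -> str:
    """Join the characters for the given codes; no code page is involved."""
    return "".join(chr(to_code_point(c)) for c in char_codes)


s = char_sequence(67, 32, "3A9", 32, "6b26")
buffer = s.encode("utf-16-le")

print(len(s))           # 5
print(buffer.hex(" "))  # 43 00 20 00 a9 03 20 00 26 6b
```

Each code is written as one little-endian 16-bit unit, independent of any code page.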