// Xtrema Framework x32 : Documentation : String Class

String Conversions

Converting SDS to DDS and vice versa, plus translating (mapping) strings.

[1]

strS.nmc( {#get} ) | xStr(#nmc)

'no map' character

Returns

#get

When used, the Xtra will return the current NMC

Description

Sets strS as the No Map Character (NMC).
If strS is an empty string, the NMC will be cleared, and the system's default character will be used.

To get the current NMC's value, use xStr(#nmc), or _s().nmc(#get)

The NMC is used by the toS() command: any DD characters that cannot be mapped to an SD character of the target CP will be replaced in the result with the NMC. If the NMC is not set, the character used to display unmappable characters is decided by the system (usually '?').
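The replacement behaviour described above can be illustrated outside Director. The following is a minimal Python sketch, not the Xtra's implementation: Python's codecs stand in for the Windows conversion, and `make_nmc_handler` and the handler name "nmc" are hypothetical names introduced only for this illustration.

```python
import codecs

def make_nmc_handler(nmc):
    """Build a codec error handler that substitutes `nmc` for each unmappable character."""
    def handler(err):
        if not isinstance(err, UnicodeEncodeError):
            raise err
        # One replacement per unmappable character in the failing span.
        return nmc * (err.end - err.start), err.end
    return handler

text = "abc-αβγ"  # Greek letters do not exist in the Cyrillic code page 1251

# Default behaviour: unmappable characters become '?'.
default_result = text.encode("cp1251", errors="replace")

# Custom 'no map' character, registered under the hypothetical name "nmc".
codecs.register_error("nmc", make_nmc_handler("#"))
custom_result = text.encode("cp1251", errors="nmc")

print(default_result)  # b'abc-???'
print(custom_result)   # b'abc-###'
```

As with the NMC, the substitute character must itself exist in the target code page.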

Examples

put _s().nmc(#get) --check the current nmc value.
-- <Void>
put _d("abc-αβγ", 1253).toS(1251)
-- abc-???          --unmappable characters are displayed using the default '?' character.
put _s("#").nmc()   --set the NMC to '#'
-- #
put _d("abc-αβγ", 1253).toS(1251)
-- abc-###          --now '#' is used for all unmappable characters.
put _s().nmc()      --reset the NMC. the '?' character will be used in subsequent toS() calls.
-- <Void>

Since a strD object may contain cached SD data from previous calls or automatic conversions, you should clear the strD's cache before using a new NMC.

--Create a strD, with CP=1253. strD will contain Cyrillic, as well as Greek characters.
d=_d("[GR.αβγ]",1253).app("[CY.αβγ]", 1251)
put _s().app(d)      --appending the string to a Greek SDS. SDS data will be cached by d
-- [GR.αβγ][CY.???] --'?' was used for unmappable (Cyrillic) characters.
put _s("$").nmc()    --set the NMC to '$'
-- $
put d
-- [GR.αβγ][CY.$$$] --'$' used for unmappable characters.
put _s().app(d)      --unlike the put command above, .app() accesses cached data, if available.
-- [GR.αβγ][CY.???] --So, the previously cached SD string is appended.
put d.cacheSz(0)     --Clear any cached data...
-- -16
put _s().app(d)      --and try appending again.
-- [GR.αβγ][CY.$$$]

put _s("@").nmc()      --changing nmc again. 'd' now holds the new cached SD data.
-- @
put _s("",1251).app(d) --appending to a strS with different code page will not use the cached data.
-- [GR.@@@][CY.αβγ]
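Why the cache has to be cleared can be shown with a small model. The class below is a hypothetical Python sketch (`CachedWideString` and all of its members are assumptions made for illustration, not the Xtra's internals): the narrow conversion is stored on first use, so a later change of the replacement character is not reflected until the cache is dropped.

```python
class CachedWideString:
    """Hypothetical model of a wide string that caches its narrow conversion."""

    nmc = "?"  # class-wide 'no map' character, standing in for the NMC setting

    def __init__(self, text, code_page):
        self.text = text
        self.code_page = code_page
        self._cache = None  # cached narrow bytes for self.code_page

    def _mappable(self, ch):
        try:
            ch.encode(self.code_page)
            return True
        except UnicodeEncodeError:
            return False

    def convert(self):
        """Convert from scratch, using the NMC that is current right now."""
        replaced = "".join(ch if self._mappable(ch) else self.nmc for ch in self.text)
        return replaced.encode(self.code_page)

    def cached(self):
        """Return the cached conversion, creating it on first use."""
        if self._cache is None:
            self._cache = self.convert()
        return self._cache

    def clear_cache(self):
        self._cache = None


d = CachedWideString("[GR.αβγ][CY.абв]", "cp1253")
first = d.cached()            # converted and cached with '?'
CachedWideString.nmc = "$"    # change the 'no map' character
stale = d.cached()            # still the data cached with '?'
d.clear_cache()
fresh = d.cached()            # rebuilt with '$'
```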

Notes

Though the NMC accepts double digit characters, only the first digit will be used during conversion (Windows).
According to the Microsoft documentation, using a custom NMC (or 'default character') will affect the performance of the toS() command.

[2]

strD.toS( {CP} {,#cc} {,flags} )

convert DDS to SDS

Returns

CP

CodePage to be used for the conversion. If set, it overrides the CP stored in the strD object.

flags

Flags to be used.

Description

Creates a new strS, by converting the Unicode strD to an SBPC or MBPC SDS object (strS).
If a CP is passed, it will be used instead of the strD's CP for the conversion, and CP will become the new object's CP.

The strD to strS conversions are performed using the Windows 'WideCharToMultiByte' function
[Related blog]

In most cases, you don't need to pass any flags to this command.
However, the Xtra provides access to all flags of the WideCharToMultiByte function, as they may be required for special cases. The table below lists the flags and their C++ equivalents, along with instructions and examples that have been simplified as much as possible.
( the Microsoft documentation for these flags is rather confusing )

flags:

Xtra	WideCharToMultiByte	Description
(automatically added if any of the #compXXX flags below is selected; only one #compXXX flag can be used at a time)	WC_COMPOSITECHECK	Compose: combine two or more characters into one, if possible. If two or more sequential characters of the source string can be combined into a single precomposed character, and if that single character exists in the destination CP, return the precomposed character instead of a sequence of characters. E.g.: letter followed by accent -> letter with accent. Selecting one of the following #compXXX flags enables this method, and specifies how to handle exceptions.
#comp	WC_COMPOSITECHECK | WC_SEPCHARS (0)	Compose. Any extra non-spacing characters will be returned as single characters. E.g.: letter, accentA, accentB -> letter with accentA, accentB
#compXdropNs	WC_COMPOSITECHECK | WC_DISCARDNS	Compose when possible, discarding extra non-spacing characters. E.g.: letter, accentA, accentB -> letter with accentA (accentB is ignored)
#compXnmc	WC_COMPOSITECHECK | WC_DEFAULTCHAR	Try to compose. If a non-spacing character cannot be added to the composition, return the NMC for the entire sequence. E.g.: letterA, accentA, accentB -> NMC; letterA, accentA, accentAA -> letterA with accents A and AA
#nbToNmc	WC_NO_BEST_FIT_CHARS	Windows 98, Windows 2000 and later. Replace any Non-Bidirectional (NB) characters with the NMC. An NB character is a DD character that can be mapped to an SD character, but for which converting the resulting SD back to DD would not give the original DD: a.toS().toD()<>a
#nmcErr	WC_ERR_INVALID_CHARS	If the source string contains unmappable characters, return <err>. Since the WC_ERR_INVALID_CHARS flag is Vista+ only, custom code has been used for this command to support all Windows versions.

Examples

put _d("abc-αβγ").toS()
-- abc-αβγ
put _d("abc-αβγ", 1251).toS()
-- abc-αβγ
put _d("abc-αβγ", 1251).toS(1253) -- the original data were converted to Cyrillic Unicode characters.
-- abc-???                        --the Greek codepage 1253 contains no Cyrillic characters.

d=_dcs("0075", "0308") --u ̈ (2 DD characters)
put d.toS(1252).pop().cList(#hex)
-- [75, A8]            --u¨(2 SD characters)
put d.tos(1252, #comp).pop().toClip().cList(#hex)
-- [FC]                --ü (1 SD character)

d=_dcs("0075", "0308", "0308").cp(1252)    --u ̈̈ (3 characters - overlapping)
put d.toS(#comp).pop().cList(#hex)       --ü¨ (1 composite + 1 non spacing character)
-- [FC, A8]
put d.toS(#compXdropNs).pop().cList(#hex) --ü (1 composite character)
-- [FC]
put d.toS(#compXnmc).pop().cList(#hex)    --? (1 unmappable character)
-- [3F]
put d.toS(#compXnmc, #nmcErr)               --same command as above, but with #nmcErr enabled.
-- <xErr 1113 No mapping for the Unicode character exists in the target multi-byte code page.>

put d.toS(#nbToNmc).pop().cList(#hex)     --u?? (1 character + 2 non-bidir -> nmc)
-- [75, 3F, 3F]
put d.toS(1252, #comp, #nbToNmc).pop().cList(#hex) -- ü? (1 composite + 1 non-bidirectional)
-- [FC, 3F]
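The composition step can be approximated outside the Xtra with Unicode normalization. This is a rough Python sketch under stated assumptions: NFC normalization stands in for WC_COMPOSITECHECK, and `to_narrow` is a hypothetical helper. Python's cp1252 codec does not apply Windows best-fit mappings, so an uncomposed U+0308 becomes '?' (3F) here rather than A8 as in the examples above.

```python
import unicodedata

def to_narrow(text, code_page, compose=False):
    """Encode `text`, optionally composing first; unmappable characters become '?'."""
    if compose:
        text = unicodedata.normalize("NFC", text)
    return text.encode(code_page, errors="replace")

single = "u\u0308"        # u + combining diaeresis
double = "u\u0308\u0308"  # u + two combining diaereses

print(to_narrow(single, "cp1252", compose=True).hex())  # fc   (ü)
print(to_narrow(double, "cp1252", compose=True).hex())  # fc3f (ü + unmappable accent)
print(to_narrow(single, "cp1252").hex())                # 753f (u + unmappable accent)
```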

Notes

If strD is actually an SDS, the result of the sequence strD.toD().toS( {cp} {,flags}) will be returned.
The .toS() command does not use any cached data stored in the source object.

[3]

strD.toSL( {#standard} {,#method} )

convert DDS to a list of SBCS SDSs

Returns

CodePage to be used for the conversion. If set, it overrides the CP stored in the strD object.

#standard

#win (default) or #mac. Specifies the type of SDS strings (CodePages) to return.

#method

#err: if the string contains unmappable characters, return an <err>
#ref: if the string contains unmappable characters, return a reference to the first unmappable character
#prop: return a propList of CP:strS pairs.
#propX: return a propList of CP:strS pairs. The property CP will be 0 for all 7bit characters, and -1 for unmappable characters.

Description

Attempts to convert strD to a sequence of SBPC strSs.
If a character (or sequence of characters) can't be converted to a strS (e.g. belonging to a DBPC code page), that part of the string will be returned as a strD.

If #prop, or #propX is used, the result will be a propList containing cp:strS pairs.
The cp property of the list, will be
-1: for parts of the original string that could not be mapped to SBPC code pages (and the value will be strD)
0: (#propX only) if the strS's characters exist in all SBPC code pages.
CP: if the strS's characters belong to a specific code page.

The CP values of the parsed str objects will always be valid code pages. Preferred order: strD's CP > glbCP > 1252(win) / 10000(mac)
To display the returned strings in a text or field member, you have to use, for each string, a font that matches its code page.

Examples

put _d("abcαβγ").app("abcαβγ", 1251).toSL()
-- [abcαβγabc, αβγ]
put _d("abcαβγ").app("abcαβγ", 1251).toSL(#prop)
-- [1253: abcαβγabc, 1251: αβγ]
put _d("abcαβγ").app("abcαβγ", 1251).toSL(#propX)
-- [0: abc, 1253: αβγ, 0: abc, 1251: αβγ]

put _d("abcαβγ").app("abcαβγ", 1251).toSL(#mac, #prop)
-- [10000: abc, 10006: αβηabc, 10007: αβγ]
put _d("abcαβγ").app("abcαβγ", 1251).toSL(#mac, #propX)
-- [0: abc, 10006: αβη, 0: abc, 10007: αβγ]

Notes

For the first example, you could use the Arial-Greek font to display the first str in the list, and Arial-Cyr for the second.
For methods other than #propX, the results of this command may vary for machines with different language settings, since language (defaultCP) affects the preferred order.

[4]

strS.toD( {CP} {,flags} )

convert SDS to DDS

Returns

CP

Forces the Xtra to use CP instead of strS's CP for the conversion.

flags

Flags to be used.

Description

Creates a new strD, by converting the SBPC or MBPC strS to a Unicode DDS object (strD).
If a CP is passed, it will be used instead of the strS's CP for the conversion, and CP will become the new object's CP.

The strS to strD conversion is performed using the Windows 'MultiByteToWideChar' function.
[Related blog]

flags:

Xtra	MultiByteToWideChar	Description
	MB_PRECOMPOSED	Do not split precomposed characters (default)
#decomp	MB_COMPOSITE	If strS contains characters that can be decomposed to a sequence of characters, return the decomposed characters. E.g.: letter with accent -> letter, accent
#glyphsOnly	MB_USEGLYPHCHARS	Use glyph characters instead of control characters. See the related blog (MSIE: setting the encoding to UTF-8 may be required).
#nmcErr	MB_ERR_INVALID_CHARS	This should normally be #invErr, but, for simplicity, the toS command's flag name has been reused. Checks whether an invalid character exists in the source string. In an MBCS SDS, a lead byte that is not followed by a legal trail byte is considered an invalid character. Note that invalid characters may also occur in strings with a Unicode CP, but a strX object will never have such a CodePage: Unicode conversions are handled by the .uToD() / .toU() commands.

Examples

s=_sc("c4", 1252) --Ä = [00C4] | A¨=[0041, 0308]
put s.toD().cList(#hex), s.toD(#decomp).cList(#hex)
-- [00C4] [0041, 0308] --the contents of the strings created with the default (precomposed) setting and with #decomp

put _sc("82", 932 ).toD()
--
put _sc("82", 932).toD(#nmcErr)
-- <xErr 1113 No mapping for the Unicode character exists in the target multi-byte code page.>
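Both cases can be reproduced with Python's standard library, as a rough analogue rather than a description of the Xtra: NFD normalization stands in for MB_COMPOSITE, a strict decode stands in for MB_ERR_INVALID_CHARS, and `hex_list` and `decode_strict` are hypothetical helpers.

```python
import unicodedata

def hex_list(text):
    """List the code units of `text` as 4-digit hex strings."""
    return [f"{ord(ch):04X}" for ch in text]

def decode_strict(data, code_page):
    """Decode `data`, returning None if it contains an invalid byte sequence."""
    try:
        return data.decode(code_page)
    except UnicodeDecodeError:
        return None

precomposed = b"\xc4".decode("cp1252")                  # 'Ä' as one character
decomposed = unicodedata.normalize("NFD", precomposed)  # 'A' + combining diaeresis

print(hex_list(precomposed))  # ['00C4']
print(hex_list(decomposed))   # ['0041', '0308']

# In code page 932, 0x82 is a lead byte and is invalid without a trail byte.
print(decode_strict(b"\x82", "cp932"))      # None
print(decode_strict(b"\x82\xa0", "cp932"))  # あ
```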

Notes

If strS is actually a DDS, the result will be a copy of the original object { with CP=CP }

[5]

strX.map( flags )

map (foldString)

Returns

flags

Flags to be used.

Description

Returns a new string after processing the original strX according to flags.

The processing is performed by the Windows 'FoldString' function. [Related blog]

flags:

Xtra	FoldString	Description
#comp	MAP_PRECOMPOSED	If strX contains a sequence of characters that can be composed into a single precomposed character, return the precomposed character. E.g.: letter, accent -> letter with accent
#decomp	MAP_COMPOSITE	If strX contains accented characters that can be decomposed to a sequence of characters, return the decomposed characters. E.g.: letter with accent -> letter, accent
#lgDecomp	MAP_EXPAND_LIGATURES	Decompose ligatures to character sequences. E.g.: æ -> ae
#stdDec	MAP_FOLDDIGITS	Map characters that represent decimal digits in other scripts to their Unicode digit (0-9) equivalents. E.g.: map the Arabic-Indic digits ٠١٢٣٤٥٦٧٨٩ to 0123456789 [Related blog]
#cZone	MAP_FOLDCZONE	See FoldString, remarks section.

Examples

d=_dcs("0041", "0308")
put d, d.map(#comp), d.map(#comp).map(#decomp)
-- A¨ Ä A¨             --as seen on Western (1252) systems.

d=_dc("00e6")
put d, d.map(#lgDecomp)
-- æ ae                --as seen on Western (1252) systems.
indic=_dcs("0660", "0661", "0662")    --٠١٢ = 012
put indic.pop().cList(#hex), indic.map(#stdDec).pop()
-- [0660, 0661, 0662] 012 --Arabic-Indic digits ٠١٢ mapped to 012

Filtering out accents, and capitalizing text e.g. for indexing or accent insensitive searches.
s=_s("Ελληνικό κείμενο")
put s.map(#decomp).sRep("´").upper
-- ΕΛΛΗΝΙΚΟ ΚΕΙΜΕΝΟ

or:
put s.toD().map(#decomp).toS().sRep("΄").upper
-- ΕΛΛΗΝΙΚΟ ΚΕΙΜΕΝΟ
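For comparison, the digit folding and the accent-insensitive indexing shown above can be expressed with Unicode normalization in Python. This is a sketch of the same idea under the assumption that NFD decomposition behaves like #decomp for these inputs; `fold_digits` and `strip_accents_upper` are hypothetical helper names, not part of the Xtra.

```python
import unicodedata

def fold_digits(text):
    """Replace any Unicode decimal digit with its ASCII 0-9 equivalent."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)

def strip_accents_upper(text):
    """Decompose, drop combining marks, then upper-case."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.upper()

print(fold_digits("٠١٢"))                       # 012
print(strip_accents_upper("Ελληνικό κείμενο"))  # ΕΛΛΗΝΙΚΟ ΚΕΙΜΕΝΟ
```

Dropping every combining mark is broader than removing one specific accent character, which may or may not be wanted for a given index.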

Notes

This command can be called on strSs, as long as they use the system's codepage, but it may not work as expected for all CodePages.
It is suggested to use this instead: result = strS.toD().map(flags).toS()

[6]

strX.isS() / strX.isD()

check string type (DDS / SDS)

Returns

True / False

Description

isS() returns true if strX is a SDS, and false if it is a DDS.
isD() returns true if strX is a DDS, and false if it is a SDS.

Examples

put _s("abc").isS()
-- 1

Notes

[7]

strD.forceS()

treat strD as strS

Returns

Description

Forces the Xtra to treat strD as strS. The original object is returned (as strS) to the command line.

Examples

str=_d("ab")
put str.cList(#hex)
-- [0061, 0062]
put str.forceS().cList(#hex)
-- [61, 00, 62, 00]

Notes

You can use this command to turn a strD into a strS object, so that you can access or modify its binary content more easily. The object can be converted back to strD using the strS.forceD() command.

[8]

strS.forceD()

treat strS as strD

Returns

Description

Forces the Xtra to treat strS as strD
If strS's length in bytes is not even, an <err> is returned.
Otherwise, the original object will be returned (as strD) to the command line.

Examples

str=_s("abcd")
put str.cList(#hex)
-- [61, 62, 63, 64]
put str.forceD().cList(#hex)
-- [6261, 6463]

put _s("abc").forceD() --odd number of bytes.
-- <xErr 66623 InvalidData>
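The byte-level reinterpretation behind forceS() and forceD() can be sketched in Python. The functions `force_narrow` and `force_wide` are hypothetical stand-ins, and the sketch assumes little-endian 16-bit units, which matches the example output above.

```python
import struct

def force_narrow(text):
    """View a wide string as its raw little-endian UTF-16 bytes."""
    return text.encode("utf-16-le")

def force_wide(data):
    """View raw bytes as little-endian 16-bit units; odd lengths are rejected."""
    if len(data) % 2:
        raise ValueError("byte length must be even")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

print(force_narrow("ab").hex(" "))                # 61 00 62 00
print([f"{u:04X}" for u in force_wide(b"abcd")])  # ['6261', '6463']
```

Nothing is converted in either direction: the same bytes are simply read with a different unit size, which is why an odd byte count cannot be treated as a wide string.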

Notes