George wrote:
On Mon, 2017-06-05 at 16:16 -0700, L A Walsh wrote:
George wrote:
On Mon, 2017-06-05 at 15:59 +0700, Peter & Kelly Passchier wrote:
On 05/06/2560 15:52, George wrote:
there's not a reliable mechanism in place to run a script in a
locale whose character encoding doesn't match that of the script
In my experience, running such scripts is no problem, but rendering
them correctly might depend on the client/editor.
It depends on the source and target encodings. For most pairs of
source and target encoding there is some case where reinterpreting a
string from the source encoding as a string in the target encoding
(without proper conversion) will result in an invalid string in the
target encoding. For instance, if a script were written in
ISO-8859-1, many possible sequences involving accented characters
would actually be invalid in UTF-8.
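As a quick illustration (assuming GNU iconv is available): 'é' is the single byte 0xE9 in ISO-8859-1, which is an incomplete multibyte sequence when reinterpreted as UTF-8:

```shell
# proper conversion from ISO-8859-1 succeeds:
printf 'caf\xe9' | iconv -f ISO-8859-1 -t UTF-8   # -> café
# but the same bytes read *as* UTF-8 are rejected, since 0xE9
# introduces a three-byte sequence that never completes:
printf 'caf\xe9' | iconv -f UTF-8 -t UTF-8        # iconv reports an error
```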
---
Um... I think you are answering a case that is different from the one
stated (i.e., the locale being the same as the one used in the script),
so no conversion should take place.
Eduardo's patch ... can only work correctly if the character set
configured in the locale is the same as the character set of the script.
----
Right. The 1st paragraph (written by you), above, mentions that.
Given the 1st paragraph (which no one is contesting), we are only
talking about the case where the run locale and script locale are the same.
The Passchiers wrote that regardless of such agreement, you can still find
editors that may ignore the locale or have no locale support at all,
and only display characters from the locale where the editor was written. While that is
also true, it can't really be helped: if your local editor only writes
in Chinese and the script is written in ASCII, you may be out of luck
in having it display properly.
Broadly speaking I think the approach taken in Eduardo's patch
(interpreting the byte sequence according to the rules of its
character encoding) is better than the approach taken in current
versions of Bash (letting 0x80-0xFF slide through the parser) - but
that approach only works if you know the correct character encoding to
use when processing the script. The information has to be provided in
the script somehow.
---
Not exactly -- the only variable-length encoding scheme that
Linux systems have had to worry about is UTF-8. So if you encounter
UTF-8 in the input, it is probable that you can use UTF-8 for the
whole script. Otherwise, use a binary decoding stream: let 0x80-0xFF
be treated either as the second half of a 256-byte charset, or as a
128-byte charset whose parity bit is not stripped but left "as-is".
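That heuristic can be sketched with iconv (assuming an iconv that rejects invalid input, as GNU iconv does): if the byte stream round-trips as UTF-8, treat the script as UTF-8; otherwise fall back to treating the bytes as-is. Note that pure ASCII also passes the check, which is harmless since ASCII is a subset of UTF-8:

```shell
guess_encoding() {
    # valid UTF-8 round-trips through iconv; anything else errors out
    if iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1; then
        echo UTF-8
    else
        echo binary/8-bit
    fi
}
printf 'echo caf\xc3\xa9\n' > /tmp/u.sh; guess_encoding /tmp/u.sh   # -> UTF-8
printf 'echo caf\xe9\n'     > /tmp/l.sh; guess_encoding /tmp/l.sh   # -> binary/8-bit
```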
The "file" utility is one example of a tool that can usually tell
the encoding of a text file -- at least telling the difference between
UTF-8, ASCII and some 8-bit local charset.
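For example, with GNU file's --mime-encoding option (availability and exact labels vary by version, so the second label is only typical):

```shell
# file(1) guesses the encoding from byte patterns in the content
printf 'echo caf\xc3\xa9\n' > /tmp/utf8.sh     # UTF-8 bytes for 'é'
printf 'echo caf\xe9\n'     > /tmp/latin1.sh   # ISO-8859-1 byte for 'é'
file --mime-encoding -b /tmp/utf8.sh     # -> utf-8
file --mime-encoding -b /tmp/latin1.sh   # typically iso-8859-1
```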
While such methods may not be 100% accurate, they are usually good
enough
for most usages where one isn't running (we hope) random scripts of unknown
origin off the web.
FWIW, I think we are in agreement, though it may not be clear! ;-)