Re: Corrupted multibyte characters in command substitutions fixes may be worse than problem.
On Mon, Feb 7, 2022 at 7:45 AM Lawrence Velázquez wrote:
>
> On Mon, Feb 7, 2022, at 1:26 AM, Alex fxmbsw7 Ratchev wrote:
> > well i saw now, printf a char of "\0" results in 0 bytes out to wc -c
>
> % /usr/bin/printf '\0' | wc -c
> 1
>
> > however my solution still stays
> > you just use memory locations instead of c strings
> > and those entries in memory are of course of known length, before setting
> > and all is fine
>
> "Your" solution is decades old. Everyone knows how Pascal-style
> strings work. This is not cutting-edge computer science.

i dunno what pascal strings are, sorry

> > of course this means to not use these faulty 'c strings', but a self
> > coded solution
>
> As Greg already mentioned, such a system requires converting back
> to C strings for system calls and other external APIs. It's not
> insurmountable, but it's more involved than just swapping all your
> char * to my_string or whatever

hard work this way i see
sorry, thanks.

> I repeat:
>
> >> It's so simple that you should have no problem converting the entire
> >> bash codebase to Pascal-style strings yourself. We'll wait.
>
> --
> vq
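To illustrate the point about converting back to C strings at an API boundary, here is a minimal sketch in Python (not from the thread, and not how bash does anything). It assumes only the standard `ctypes` module; the names `data`, `buf` and `as_c_string` are made up for the example. A buffer of known length holds an embedded NUL without trouble, but as soon as the same memory is read as a C string, everything from the first NUL onwards is lost.

```python
import ctypes

# Five bytes of payload with an embedded NUL in the middle.
data = b"ab\0cd"

# A buffer with a known size keeps every byte, including the NUL.
# (create_string_buffer also appends one terminating NUL of its own.)
buf = ctypes.create_string_buffer(data)
stored = buf.raw[:len(data)]

# Reading the same memory as a NUL-terminated C string stops at the
# first NUL, which is what any char * based API would see.
as_c_string = buf.value

print(len(stored), len(as_c_string))
```

So a known-length representation does not remove the problem; it moves it to every place where the data has to cross into a `char *` interface.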
Re: Corrupted multibyte characters in command substitutions fixes may be worse than problem.
> In the case of bash with environment having LC_CTYPE: C.UTF-8 or
> en_US.UTF-8
> read:
> 0xC3 (len=1) i.e. Ã ('A' w/tilde in a legacy 8-bit latin-compatible
> charset),
> but invalid if bash processes the environment setting of en_US.UTF-8.
>
> Should bash process it as legacy input or invalid UTF8?
> Either way, what should it return? a UTF-8 char
> (hex 0xc3 0x83) transcoded from the latin value of A-tilde, or
> keep the binary value the same (return 0x83),
> should it return a warning message? If it does, should
> it return NUL for the returned value because the input was erroneous?

Assuming Latin-1 when nothing in the environment points to it seems
questionable. It might just as well be a Cyrillic character in
ISO-8859-5 or whatever.

Email filters were mentioned. Emails may use charsets different from
the current environment -- even several different ones within a mail
(I've sent such mails myself). So if bash were to "fix" input depending
on the environment, even writing a pass-through filter would require
parsing the Content-Type headers and changing the environment
accordingly (or else, use an 8-bit clean charset throughout).

So I don't think bash should change the input (unintentionally as with
the original bug or intentionally as discussed here) unless and until
it needs to do charset-dependent operations.
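A small sketch of the ambiguity, in Python rather than bash and using only the built-in codecs (the variable names are made up for the example). The single byte 0xC3 is not valid UTF-8 on its own, and which character it "really" is depends entirely on a charset that the byte itself does not carry.

```python
raw = b"\xc3"  # the lone byte from the example above

# On its own, 0xC3 is a UTF-8 lead byte with no continuation byte.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# The same byte read under two legacy 8-bit charsets.
as_latin1 = raw.decode("iso-8859-1")    # LATIN CAPITAL LETTER A WITH TILDE
as_cyrillic = raw.decode("iso-8859-5")  # CYRILLIC CAPITAL LETTER U

# "Fixing" the input by guessing Latin-1 and transcoding changes the bytes.
transcoded = as_latin1.encode("utf-8")

# A pass-through that never decodes leaves the input untouched.
passed_through = bytes(raw)

print(valid_utf8, as_latin1, as_cyrillic, transcoded, passed_through)
```

Only the pass-through is correct regardless of which charset the sender had in mind.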
Re: Corrupted multibyte characters in command substitutions fixes may be worse than problem.
On 2022-02-07 at 11:55 +0100, Alex fxmbsw7 Ratchev wrote:
> > > however my solution still stays
> > > you just use memory locations instead of c strings
> > > and those entries in memory are of course of known length, before
> > > setting and all is fine
> >
> > "Your" solution is decades old. Everyone knows how Pascal-style
> > strings work. This is not cutting-edge computer science.
>
> i dunno what pascal strings are, sorry

Pascal strings refers to strings prefixed with their length:
https://en.wikipedia.org/wiki/String_(computer_science)#Length-prefixed

Basically, what you were proposing.

And as Velázquez said, it's ingenuous to propose a solution nobody else
asked for, expecting others to spend the effort of actually
implementing it (plus the criticism of their result, such as a limitation
on the string length, or of wasted memory for every pointer).
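For anyone who has not met them, a length-prefixed string can be sketched in a few lines. This is an illustration only, assuming a one-byte length prefix as in classic Pascal; `pack_pstring` and `unpack_pstring` are hypothetical helper names, not anything from bash or the C library.

```python
import struct

MAX_LEN = 255  # a one-byte prefix cannot describe anything longer


def pack_pstring(payload: bytes) -> bytes:
    """Return payload preceded by a single length byte."""
    if len(payload) > MAX_LEN:
        raise ValueError("payload too long for a one-byte length prefix")
    return struct.pack("B", len(payload)) + payload


def unpack_pstring(buf: bytes) -> bytes:
    """Read the length byte, then exactly that many payload bytes."""
    (length,) = struct.unpack_from("B", buf, 0)
    if len(buf) < 1 + length:
        raise ValueError("buffer shorter than its declared length")
    return buf[1:1 + length]


# NUL is just another byte here; nothing terminates the string early.
packed = pack_pstring(b"a\0b")
restored = unpack_pstring(packed)
print(packed, restored)
```

The two drawbacks mentioned above show up directly: the prefix width caps the string length (255 here), and widening the prefix to lift that cap costs extra bytes on every string, however short.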
Re: Corrupted multibyte characters in command substitutions fixes may be worse than problem.
On Tue, Feb 8, 2022 at 12:09 AM Ángel wrote:
>
> On 2022-02-07 at 11:55 +0100, Alex fxmbsw7 Ratchev wrote:
> > > > however my solution still stays
> > > > you just use memory locations instead of c strings
> > > > and those entries in memory are of course of known length, before
> > > > setting and all is fine
> > >
> > > "Your" solution is decades old. Everyone knows how Pascal-style
> > > strings work. This is not cutting-edge computer science.
> >
> > i dunno what pascal strings are, sorry
>
> Pascal strings refers to strings prefixed with their length:
> https://en.wikipedia.org/wiki/String_(computer_science)#Length-prefixed
>
> Basically, what you were proposing.

i see, thank you for good explanation ( in your words not url )

> And as Velázquez said, it's ingenuous to propose a solution nobody else
> asked for, expecting others to spend the effort of actually
> implementing it (plus the criticism of their result, such as a limitation
> on the string length, or of wasted memory for every pointer).

well as im an outsider i agree
else, i can just say, rather keep the nulls you kept the \1'en and \xff :))
( yeah not you, the c library language or whatever )

greets