On Sat, Oct 15, 2022 at 11:42:17PM -0300, Lucas de Sena wrote:
> Hi,
>
> After trying to split a string into fields delimited with colons and
> spaces, I found this bug in how ksh(1) does substitution. The actual
> behavior contradicts what other shells like bash and mksh do and also
> contradicts its own manual.
>
> Running the following on other shells (say, bash) prints "/foo/bar/".
> This command splits the string " foo : bar " into two fields: "foo"
> and "bar", considering colon and space as delimiters.
>
> echo " foo : bar " | {
> IFS=": "
> read -r a b
> printf -- "/%s/%s/\n" "$a" "$b"
> }
>
> However, running the same command in OpenBSD ksh(1) (or sh(1)) splits
> the string into "foo" and ": bar".
This is because the last parameter (b) is a concatenation of two fields.
Parsing is done properly if you add c to the read command:
+ echo foo : bar
+ IFS=:
+ read -r a b c
+ printf -- /%s/%s/%s/\n foo bar
/foo//bar/
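
The "last variable takes the remainder" rule can be seen without any
whitespace in IFS, where I would expect all shells to agree. A minimal
sketch; the helper names split_two and split_three are made up for this
illustration and are not part of any shell:

```sh
# Hypothetical helper: read one line into two variables, splitting on ":".
split_two() {
    printf '%s\n' "$1" | {
        # IFS is set only for the duration of the read command.
        IFS=: read -r first rest
        printf '%s|%s\n' "$first" "$rest"
    }
}

# Hypothetical helper: same input, but with one variable per field.
split_three() {
    printf '%s\n' "$1" | {
        IFS=: read -r first second third
        printf '%s|%s|%s\n' "$first" "$second" "$third"
    }
}

split_two 'a:b:c'      # last variable gets the unsplit remainder: a|b:c
split_three 'a:b:c'    # enough variables for every field: a|b|c
```

With only two variables, the second one receives the rest of the line
including its delimiter, which is the same effect as b ending up with
": bar" above.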
>
> The manual ksh(1) provides the following, similar example:
>
> > Example: If IFS is set to “<space>:”, and VAR is set to
> > “<space>A<space>:<space><space>B::D”, the substitution for $VAR
> > results in four fields: ‘A’, ‘B’, ‘’ (an empty field), and ‘D’.
> > Note that if the IFS parameter is set to the NULL string, no field
> > splitting is done; if the parameter is unset, the default value of
> > space, tab, and newline is used.
>
> Let's try it:
>
> echo " A : B::D" | {
> IFS=" :"
> read -r arg1 arg2 arg3 arg4
> printf -- '1st: "%s"\n' "$arg1"
> printf -- '2nd: "%s"\n' "$arg2"
> printf -- '3rd: "%s"\n' "$arg3"
> printf -- '4th: "%s"\n' "$arg4"
> }
>
> bash(1) splits the line into the following fields:
>
> 1st: "A"
> 2nd: "B"
> 3rd: ""
> 4th: "D"
>
> This is actually the expected output, as described in the manual.
>
> However, running the same command in OpenBSD ksh, prints this:
>
> 1st: "A"
> 2nd: ""
> 3rd: "B"
> 4th: ":D"
>
> A completely different thing.
> The same occurs with OpenBSD sh(1).
What you observe is the result of the next paragraph in the man page
after the example you quoted:

    Also, note that the field splitting applies only to the immediate
    result of the substitution. Using the previous example, the
    substitution for $VAR:E results in the fields: `A', `B', `', and
    `D:E', not `A', `B', `', `D', and `E'. This behavior is POSIX
    compliant, but incompatible with some other shell implementations
    which do field splitting on the word which contained the
    substitution or use IFS as a general whitespace delimiter.
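
The man page example can be tried directly with an unquoted expansion
instead of read. This is only a sketch: show_fields and demo are
hypothetical helper names, and the outputs in the comments are what the
man page text describes, not something taken from a ksh session:

```sh
# Hypothetical helper: print every argument in brackets, so that empty
# fields stay visible.
show_fields() {
    for field in "$@"; do
        printf '[%s]' "$field"
    done
    printf '\n'
}

# The function body is a subshell, so IFS and set -f do not leak out.
demo() (
    VAR=' A :  B::D'     # <space>A<space>:<space><space>B::D
    IFS=' :'
    set -f               # no pathname expansion on the split fields
    show_fields $VAR     # four fields: [A][B][][D]
    show_fields $VAR:E   # ":E" is literal text, not split: [A][B][][D:E]
)

demo
```

The literal ":E" is not the result of a substitution, so it is appended
to the last field rather than being split off as a field of its own.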
>
> I could not understand how OpenBSD does the splitting, but the way it
> does is clearly a bug: it not only contradicts its own manual,
> but also differs from other implementations.
I do not see contradictions in the man page. It does say that this
behavior is incompatible with some other shells.
>
> Thank you,
> Lucas de Sena.
>