tl;dr new diff below

On 3/15/20 11:21 AM, 0xef967...@gmail.com wrote:
> On Sun, Mar 15, 2020 at 09:30:22AM +0100, Martijn van Duren wrote:
>>> --- lib/libc/stdlib/getopt_long.c~
>>> +++ lib/libc/stdlib/getopt_long.c
>>> @@ -418,15 +418,8 @@
>>>     }
>>>  
>>>     if ((optchar = (int)*place++) == (int)':' ||
>>> -       (optchar == (int)'-' && *place != '\0') ||
>>> -       (oli = strchr(options, optchar)) == NULL) {
>>> -           /*
>>> -            * If the user specified "-" and  '-' isn't listed in
>>> -            * options, return -1 (non-option) as per POSIX.
>>> -            * Otherwise, it is an unknown option character (or ':').
>>> -            */
>>> -           if (optchar == (int)'-' && *place == '\0')
>>> -                   return (-1);
>>> +       ((oli = strchr(options, optchar)) == NULL ||
>>> +        (optchar == (int)'-' && oli == options))) {
>>>             if (!*place)
>>>                     ++optind;
>>>             if (PRINT_ERROR)
>>>
>> From my research you're close, but not quite.
>> This particular case is for handling the actual "-" string.
> 
> Nope. That's already handled above:
> 
> start:
>       if (optreset || !*place) {              /* update scanning pointer */
>               ...
>               if (*(place = nargv[optind]) != '-' ||
>                   (place[1] == '\0' && strchr(options, '-') == NULL)) {
> 
> Notice that OpenBSD's getopt(3) explicitly allows to use "-" as an
> option character, and there may be a /separate/ bug related to that,
> but my patch doesn't change or affect it in any way.
> 
> But in the case I'm WRONG, could you prove your point with a test?

Sure.
> 
> OpenBSD's getopt_long.c is self-contained. Copy it into some
> other directory, apply my patch, and try with the simple getopt-test.c
> test-case program at the end of this message:
> 
I actually prefer using LD_PRELOAD. :-)

So lets see what we have in the current code.
I'll try to dumb it down/stretch it out as much as possible because
there's a lot of moving parts and I want to retain any shred of sanity I
have left.

1) initialization: Here we setup global and local variables like optind,
   optreset and flags. Note that on line 317 we increment the pointer on
   options. This means that the "oli == options" check in your code is
   not needed and even introduces a bug when someone would be stupid
   enough to create an optstring which starts with "--", where a "-"
   option stop to being recognized:
$ make -j2 && OPTS=--p /tmp/getopt-test -p -                              
{/tmp/getopt-test} <p> <-> |
$ make -j2 && LD_PRELOAD=./obj/libc.so.96.0 OPTS=--p /tmp/getopt-test -p -
getopt-test: unknown option -- -
{/tmp/getopt-test} <p> #U- |

2) place is EMSG by default on first enter, so we fall into block at
   line 323.
2a) Here run into the check where 'nargv[optind] == "-"', but
    where '-' is not part of optstring. So here we jump out of the loop.
    I agree with you that this case is handled there and I must admit I
    overlooked this in my initial reply. Note that this is also the
    location where place is set to nargv[optind] including the leading
    '-' required to continue in the code.
2b) The last check that's important is:
    "if (place[1] != '\0' && *++place == '-' && place[1] == '\0') {"
    So what does this do?
    1) If we have "-" we break the if-statement early and nothing
       happens.
    2) We increment the place pointer by one.
    3) We check if the new place pointer contains the string "-", which
       results in an early return from the function with optind
       incremented.
3) We skip over the first big check after that, since long_options is
   not set, because we're checking getopt(3). (line 404)
4) And now we come at the part that's being questioned here.
4a) We assign place[0] to optchar. Keep situation 2b in mind here.
4b) We increment place by 1.
4c) We check if optchar equals to ':', because ':' can't be a valid
    character as an optstring character.
4d) we check if optchar is '-' and the new incremented place is not
    '\0', because according to the manual a '-' is only valid as a
    standalone string.
4e) we check if the optchar is part of optstring, because of spec
    reasons.

So now lets drill down in section 4 a little further with your
original example:
opstring: "q"
argv: "-q-"
First time we enter getopt_internal we arrive at point 4 with:
place = "q-"
optchar = 'q'
it's not a ':', not a '-' and it's part of optstring, so we pass.

Second time we enter getopt_internal we arrive at point 4 with:
place: "-"
optchar: '-'
It's not a ':', it is a '-', but the next character is '\0', so
we don't match there either, but it's not in optstring. This means
that we're entering the block based on optstring.
Since we match based on optstring we should always return BADCH,
but because we just check optchar + *place we get kicked out early,
with a -1 value and an optind that doesn't make a whole lot of sence.

Now I've already shown where your diff goes wrong in point 1, but what
if we did what you said in your original mail and remove the
"oli == option" check.
scenario:
optstring: "q-"
argv: "-q-"
place: "-"
optchar: '-'
It's not a ':' and it is in optstring, so we skip the check.
But this is obviously wrong, because we're back at your original bug,
because the context of place is not checked. Even worse, now we can do
an argv of "-q-q", which is currently not possible.

But what can we deduce from the above?
- An argv of "-" can't reach the lower check if '-' is not in optstring.
- An argv containing an '-' can if we don't have "--" as an argument, or
  it's not in the second position when we use getopt_long.

So what do we want based on the above:
- If '-' is not in optstring the lower check must always return BADCH.
- If '-' is in optstring it must match argv as "-" or else return
  BADCH.

So my initial patch might not be as correct as I originally thought,
because it already assumed some conditions that might also not
always be correct, but here's a second attempt based on above
reasoning/blabbering.

On 3/15/20 11:21 AM, 0xef967...@gmail.com wrote:
> Notice that OpenBSD's getopt(3) explicitly allows to use "-" as an
> option character, and there may be a /separate/ bug related to that,
> but my patch doesn't change or affect it in any way.

This is in line with getopt(3) and getopt_long(3) expands on that
saying it's (at least) for su(1). So if we want to remove this
(mis)feature a lot more scrutiny needs to be done and I'm not feeling
like going down that rab(bit|ies)-hole any time soon.

On 3/15/20 12:32 PM, 0xef967...@gmail.com wrote:
> A "side-effect" of my patch is that it prevents the only case where
> OpenBSD's getopt(3) was clinging to the stale state after having
> finished parsing the arguments and returned -1.

I'm not a 100% sure, but on first glance it looks like this is
actually unspecified section of POSIX[0]:
When an element of argv[] contains multiple option characters, it is
unspecified how getopt() determines which options have already been
processed.
So from this it looks like we're actually not wrong per spec, but it
might be a bit unintuitive.

I did some pointer dumping inside your test-application and inside
getopt_internal and found that the OPTS environment variable is 1 byte
from the last place pointer. When you overwrite every byte of the
original avs with ':' (including the terminating NUL) you expand the
string length of the place variable inside getopt_internal and it starts
to parse the "garbage" you see in your example.

Since our manpage explicitly states that you need to set optreset I
don't know if it's worth adding a "fix" for, if such a thing is at
all possible. Maybe it could be done with adding a check for if place is
inside the range for nargv[optind], but it would be additional overhead
and it might not suffice. I'm pretty much on the fence for this one.

> That used to work in the old getopt(3) from *BSD, and in the (original?)
> SysV version, but was broken in the glibc version of getopt(3) and,
> in a corner case, in the getopt_long implementation from OpenBSD.

Your diff didn't fix the issue, but merely worked around this particular
case. In the current code the return -1 is triggered before optind can
be incremented.
If you were to reset mid-parsing it will still first continue on place.

martijn@

[0] https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html

Index: stdlib/getopt_long.c
===================================================================
RCS file: /cvs/src/lib/libc/stdlib/getopt_long.c,v
retrieving revision 1.30
diff -u -p -r1.30 getopt_long.c
--- stdlib/getopt_long.c        25 Jan 2019 00:19:25 -0000      1.30
+++ stdlib/getopt_long.c        15 Mar 2020 17:18:55 -0000
@@ -340,6 +340,10 @@ start:
                        nonopt_start = nonopt_end = -1;
                        return (-1);
                }
+               /*
+                * If we have an nargv of "-" only treat it as an option if it's
+                * in the options string.
+                */
                if (*(place = nargv[optind]) != '-' ||
                    (place[1] == '\0' && strchr(options, '-') == NULL)) {
                        place = EMSG;           /* found non-option */
@@ -417,16 +421,16 @@ start:
                }
        }
 
+       /*
+        * ':' can never be a valid option.
+        *
+        * Only treat '-' as a valid option if it's in the options string
+        * AND it's the entire nargv string
+        */
        if ((optchar = (int)*place++) == (int)':' ||
-           (optchar == (int)'-' && *place != '\0') ||
-           (oli = strchr(options, optchar)) == NULL) {
-               /*
-                * If the user specified "-" and  '-' isn't listed in
-                * options, return -1 (non-option) as per POSIX.
-                * Otherwise, it is an unknown option character (or ':').
-                */
-               if (optchar == (int)'-' && *place == '\0')
-                       return (-1);
+           (oli = strchr(options, optchar)) == NULL ||
+           (optchar == '-' &&
+           (oli == NULL || strcmp(nargv[optind], "-") != 0))) {
                if (!*place)
                        ++optind;
                if (PRINT_ERROR)
@@ -478,8 +482,6 @@ start:
 /*
  * getopt --
  *     Parse argc/argv argument vector.
- *
- * [eventually this will replace the BSD getopt]
  */
 int
 getopt(int nargc, char * const *nargv, const char *options)

Reply via email to