2022年11月17日(木) 6:47 Chet Ramey <chet.ra...@case.edu>:
> fnmatch with FNM_PATHNAME only has to avoid matching the slash with a
> bracket expression. The shell has an additional constraint: a slash that
> appears in a bracket expression renders the bracket expression void and
> requires the `[' to be matched explicitly. That's why there have to be
> tests for slash in BRACKMATCH. There are two bugs: the off-by-one error
> you note and matching the open bracket explicitly.

Thank you for the explanation.  I did not know the special rule for
the slash in the bracket expression.  I assumed that that part of the
code in BRACKMATCH is related to the behavior of the bracket
expression never matching a slash so just removed it, which was wrong.

> I attached the patch I applied. I didn't include your fix to issue 1 above.

Thank you for the patch.  I applied it locally and tried it.  I attach
a test script that I used: [bracket-slash.sh].  Now I switched to a
loadable builtin that directly calls strmatch instead of hacking
GLOBIGNORE.  Here are the results:

  ---Tests for a slash in bracket expressions---
  #1: pat=ab[/]ef          str=ab[/]ef          yes/yes
  #2: pat=ab[/]ef          str=ab/ef            no/no
  #3: pat=ab[c/d]ef        str=ab[c/d]ef        yes/yes
  #4: pat=ab[c/d]ef        str=abcef            yes/no
  #5: pat=ab[.-/]ef        str=ab[.-/]ef        no/yes
  #6: pat=ab[.-/]ef        str=ab.ef            yes/no
  #7: pat=ab[[=/=]]ef      str=ab[[=/=]]ef      yes/yes
  #8: pat=ab[[=/=]]ef      str=ab/ef            no/no
  #9: pat=ab[[=c=]/]ef     str=ab[=/]ef         yes/yes
  #10: pat=ab[[=c=]/]ef     str=abcef            yes/no
  #11: pat=ab[[:alpha:]/]ef str=ab[:/]ef         yes/yes
  #12: pat=ab[[:alpha:]/]ef str=abxef            yes/no
  #13: pat=ab[/[abc]]ef     str=ab[/c]ef         yes/yes
  #14: pat=ab[/[abc]]ef     str=abc]ef           no/no
  #15: pat=ab[c[=/=]]ef     str=ab[c[=/=]]ef     yes/yes
  #16: pat=ab[c[=/=]]ef     str=abc[=/=]ef       no/no
  #17: pat=ab[c[=/=]]ef     str=abcef            yes/no
  ---Tests for incomplete bracket expressions---
  #18: pat=ab[c             str=ab[c             yes/yes
  #19: pat=ab[c             str=abc              no/no
  #20: pat=ab[c[=d=         str=ab[c[=d=         yes/yes
  #21: pat=ab[c[=d=         str=abc              no/no
  #22: pat=ab[c[.d          str=ab[c[.d          yes/yes
  #23: pat=ab[c[.d          str=abc              no/no
  #24: pat=ab[c[:alpha:     str=ab[c[:alpha:     yes/yes
  #25: pat=ab[c[:alpha:     str=abc              no/no
  #26: pat=ab[c-            str=ab[c-            no/yes
  #27: pat=ab[c-            str=abc              no/no
  #28: pat=ab[c\            str=ab[c\            no/yes
  #29: pat=ab[c\            str=abc              no/no
  #30: pat=ab[[\            str=ab[[\            no/yes
  #31: pat=ab[[\            str=ab[              no/no

"yes" and "no" on the left of / in the fourth column are the results
after applying the patch you provided. "yes" and "no" on the right of
/ are the results that *I* expect (see below paragraphs).

The new treatment seems to only handle a slash that directly appears
as an independent character in bracket expressions (cases
#1,#2,#13,#14), but if I literally read the standard you quoted, I
feel we should also handle other slashes (#3..#6,#9..#12,#15..#17).
Cases #7 and #8 seem to be already processed in the current devel.  I
think we can understand the rule of the standard by a mental
processing model to first split the pattern with `/' and then to match
each segment with the corresponding segment of the target string.
Thinking in that way, we can actually just replace the NUL test for
the string end with a similar test for NUL or a slash.  I attach a
possible patch for the additional fix
[r0037.brackmatch4.slash-terminated.patch.txt], which applies after
your patch [bracket-slash.patch].

Also, I have a suggestion of changes for more consistent handling of
incomplete bracket expressions.  Currently, incomplete bracket
expressions sometimes fail unconditionally (#26..#31) and sometimes
fall back to a literal `[' plus remaining pattern (#18..#25).  I
attach another patch for this:
[r0037.brackmatch5.incomplete-bracket.patch.txt].

--
Koichi
From cbee6f5a782e9ce1b431704f33a3e03de85c4460 Mon Sep 17 00:00:00 2001
From: Koichi Murase <myoga.mur...@gmail.com>
Date: Thu, 17 Nov 2022 18:44:19 +0900
Subject: [PATCH 1/3] fix(BRACKMATCH): terminate bracket expression by any
 slash

---
 lib/glob/sm_loop.c | 45 ++++++++++++++++++++++++---------------------
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/lib/glob/sm_loop.c b/lib/glob/sm_loop.c
index a448c155..f4d8d1f2 100644
--- a/lib/glob/sm_loop.c
+++ b/lib/glob/sm_loop.c
@@ -439,6 +439,16 @@ BRACKMATCH (p, test, flags)
   if (not = (*p == L('!') || *p == L('^')))
     ++p;
 
+  /* POSIX.2 2.13.3 says: `If a <slash> character is found following an
+     unescaped <left-square-bracket> character before a corresponding
+     <right-square-bracket> is found, the open bracket shall be treated as an
+     ordinary character.' If we find a slash in a bracket expression and the
+     flags indicate we're supposed to be treating the string like a pathname,
+     we have to treat the `[' as just a character to be matched. In this
+     implementation, we treat `/' the same as the end of the pattern for the
+     bracket expression. */
+#define ISBRACKETEOF(c) ((c) == L('\0') || (flags & FNM_PATHNAME) && (c) == 
L('/'))
+
   c = *p++;
   for (;;)
     {
@@ -472,7 +482,7 @@ BRACKMATCH (p, test, flags)
          else
            {
              c = *p++;
-             if (c == L('\0'))
+             if (ISBRACKETEOF (c))
                return ((test == L('[')) ? savep : (CHAR *)0); /*]*/
              c = FOLD (c);
              continue;
@@ -486,11 +496,11 @@ BRACKMATCH (p, test, flags)
 
          pc = 0;       /* make sure invalid char classes don't match. */
          /* Find end of character class name */
-         for (close = p + 1; *close != '\0'; close++)
+         for (close = p + 1; !ISBRACKETEOF (*close); close++)
            if (*close == L(':') && *(close+1) == L(']'))
              break;
 
-         if (*close != L('\0'))
+         if (!ISBRACKETEOF (*close))
            {
              ccname = (CHAR *)malloc ((close - p) * sizeof (CHAR));
              if (ccname == 0)
@@ -535,7 +545,7 @@ BRACKMATCH (p, test, flags)
              /* continue the loop here, since this expression can't be
                 the first part of a range expression. */
              c = *p++;
-             if (c == L('\0'))
+             if (ISBRACKETEOF (c))
                return ((test == L('[')) ? savep : (CHAR *)0);
              else if (c == L(']'))
                break;
@@ -560,7 +570,7 @@ BRACKMATCH (p, test, flags)
 
       if (!(flags & FNM_NOESCAPE) && c == L('\\'))
        {
-         if (*p == '\0')
+         if (ISBRACKETEOF (*p))
            return (CHAR *)0;
          cstart = cend = *p++;
        }
@@ -573,23 +583,13 @@ BRACKMATCH (p, test, flags)
         expression produces undefined results.'  This implementation
         treats the `[' as just a character to be matched if there is
         not a closing `]'. */
-      if (c == L('\0'))
-       return ((test == L('[')) ? savep : (CHAR *)0);
-
-      /* POSIX.2 2.13.3 says: `If a <slash> character is found following an
-         unescaped <left-square-bracket> character before a corresponding
-         <right-square-bracket> is found, the open bracket shall be treated
-         as an ordinary character.' If we find a slash in a bracket
-         expression and the flags indicate we're supposed to be treating the
-         string like a pathname, we have to treat the `[' as just a character
-         to be matched. */
-      if ((flags & FNM_PATHNAME) && c == L('/'))
+      if (ISBRACKETEOF (c))
        return ((test == L('[')) ? savep : (CHAR *)0);
 
       c = *p++;
       c = FOLD (c);
 
-      if (c == L('\0'))
+      if (ISBRACKETEOF (c))
        return ((test == L('[')) ? savep : (CHAR *)0);
 
       /* This introduces a range, unless the `-' is the last
@@ -600,7 +600,7 @@ BRACKMATCH (p, test, flags)
          cend = *p++;
          if (!(flags & FNM_NOESCAPE) && cend == L('\\'))
            cend = *p++;
-         if (cend == L('\0'))
+         if (ISBRACKETEOF (cend))
            return (CHAR *)0;
          if (cend == L('[') && *p == L('.'))
            {
@@ -651,7 +651,7 @@ matched:
       int oc;
 
       /* A `[' without a matching `]' is just another character to match. */
-      if (c == L('\0'))
+      if (ISBRACKETEOF (c))
        return ((test == L('[')) ? savep : (CHAR *)0);
 
       oc = c;
@@ -660,7 +660,8 @@ matched:
        {
          brcnt++;
          brchrp = p++;         /* skip over the char after the left bracket */
-         if ((c = *p) == L('\0'))
+         c = *p;
+         if (ISBRACKETEOF (c))
            return ((test == L('[')) ? savep : (CHAR *)0);
          /* If *brchrp == ':' we should check that the rest of the characters
             form a valid character class name. We don't do that yet, but we
@@ -681,13 +682,15 @@ matched:
        brcnt = 0;
       else if (!(flags & FNM_NOESCAPE) && c == L('\\'))
        {
-         if (*p == '\0')
+         if (ISBRACKETEOF (*p))
            return (CHAR *)0;
          /* XXX 1003.2d11 is unclear if this is right. */
          ++p;
        }
     }
   return (not ? (CHAR *)0 : p);
+
+#undef ISBRACKETEOF
 }
 
 #if defined (EXTENDED_GLOB)
-- 
2.37.2

#!/usr/bin/env bash

LC_COLLATE=C

gcc -O2 -xc -o ./fnmatch - <<-EOF
	#include <fnmatch.h>
	#include <stdlib.h>
	#include <stdio.h>

	int main(int argc, char **argv) {
	  if (2 >= argc) {
	    fprintf(stderr, "usage: fnmatch string pattern\n");
	    exit(2);
	  }

	  int flags = FNM_PATHNAME | FNM_PERIOD | FNM_EXTMATCH;
	  if (fnmatch(argv[2], argv[1], flags) == 0)
	    return 0;
	  return 1;
	}
EOF

gcc -O2 -shared -xc -o ./strmatch.so - <<-EOF
	#define BUILTIN_ENABLED 0x01
	struct word_desc { char* word; int flags; };
	struct word_list { struct word_list* next; struct word_desc* word; };
	struct builtin {
	  const char* name;
	  int (*function)(struct word_list*);
	  int flags;
	  const char** long_doc;
	  const char* short_doc;
	  char* handle;
	};

	/*#include <glob/strmatch.h>*/
	int strmatch(char *pattern, char *string, int flags);
	#define FNM_PATHNAME    (1 << 0)
	#define FNM_NOESCAPE    (1 << 1)
	#define FNM_PERIOD      (1 << 2)
	#define FNM_LEADING_DIR (1 << 3)
	#define FNM_CASEFOLD    (1 << 4)
	#define FNM_EXTMATCH    (1 << 5)
	#define FNM_FIRSTCHAR   (1 << 6)
	#define FNM_DOTDOT      (1 << 7)

	static int strmatch_builtin(struct word_list* list) {
	  char *str, *pat;
	  if (!list || !list->word) return 2;
	  str = list->word->word;
	  if (!list->next || !list->next->word) return 2;
	  pat = list->next->word->word;

	  if (strmatch (pat, str, FNM_PATHNAME | FNM_PERIOD | FNM_EXTMATCH) == 0)
	    return 0;
	  return 1;
	}
	static const char* strmatch_doc[] = { "This is a builtin to test the behavior of strmatch", 0 };
	struct builtin strmatch_struct = { "strmatch", strmatch_builtin, BUILTIN_ENABLED, strmatch_doc, "strmatch string pattern", 0, };
EOF

enable -f ./strmatch.so strmatch

check_count=1
yes=$'\e[32myes\e[m'
no=$'\e[31mno\e[m'

function check {
  # bash impl
  if strmatch "$2" "$1"; then
    local strmatch=$yes
  else
    local strmatch=$no
  fi

  # fnmatch
  local expect=${3-}
  if [[ ! $expect ]]; then
    if ./fnmatch "$2" "$1"; then
      expect=$yes
    else
      expect=$no
    fi
  fi
  printf '#%d: pat=%-16s str=%-16s %s/%s\n' "$((check_count++))" "$1" "$2" "$strmatch" "$expect"
}

echo '---Tests for a slash in bracket expressions---'
check 'ab[/]ef'          'ab[/]ef'     "$yes"
check 'ab[/]ef'          'ab/ef'       "$no"
check 'ab[c/d]ef'        'ab[c/d]ef'   "$yes"
check 'ab[c/d]ef'        'abcef'       "$no"
check 'ab[.-/]ef'        'ab[.-/]ef'   "$yes"
check 'ab[.-/]ef'        'ab.ef'       "$no"
check 'ab[[=/=]]ef'      'ab[[=/=]]ef' "$yes"
check 'ab[[=/=]]ef'      'ab/ef'       "$no"
check 'ab[[=c=]/]ef'     'ab[=/]ef'    "$yes"
check 'ab[[=c=]/]ef'     'abcef'       "$no"
check 'ab[[:alpha:]/]ef' 'ab[:/]ef'    "$yes"
check 'ab[[:alpha:]/]ef' 'abxef'       "$no"
check 'ab[/[abc]]ef'     'ab[/c]ef'    "$yes"
check 'ab[/[abc]]ef'     'abc]ef'      "$no"
check 'ab[c[=/=]]ef'     'ab[c[=/=]]ef' "$yes"
check 'ab[c[=/=]]ef'     'abc[=/=]ef'   "$no"
check 'ab[c[=/=]]ef'     'abcef'        "$no"

echo '---Tests for incomplete bracket expressions---'
check 'ab[c'             'ab[c'         "$yes"
check 'ab[c'             'abc'          "$no"
check 'ab[c[=d='         'ab[c[=d='     "$yes"
check 'ab[c[=d='         'abc'          "$no"
check 'ab[c[.d'          'ab[c[.d'      "$yes"
check 'ab[c[.d'          'abc'          "$no"
check 'ab[c[:alpha:'     'ab[c[:alpha:' "$yes"
check 'ab[c[:alpha:'     'abc'          "$no"
check 'ab[c-'            'ab[c-'        "$yes"
check 'ab[c-'            'abc'          "$no"
check 'ab[c\'            'ab[c\'        "$yes"
check 'ab[c\'            'abc'          "$no"
check 'ab[[\'            'ab[[\'        "$yes"
check 'ab[[\'            'ab['          "$no"
From 1bb9d9e7258c915b3db1ed60722a6d93b3b3875e Mon Sep 17 00:00:00 2001
From: Koichi Murase <myoga.mur...@gmail.com>
Date: Thu, 17 Nov 2022 18:55:38 +0900
Subject: [PATCH 2/3] fix(BRACKMATCH): match immature bracket expression with
 literal [

---
 lib/glob/sm_loop.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/glob/sm_loop.c b/lib/glob/sm_loop.c
index f4d8d1f2..2c15cae1 100644
--- a/lib/glob/sm_loop.c
+++ b/lib/glob/sm_loop.c
@@ -571,7 +571,7 @@ BRACKMATCH (p, test, flags)
       if (!(flags & FNM_NOESCAPE) && c == L('\\'))
        {
          if (ISBRACKETEOF (*p))
-           return (CHAR *)0;
+           return ((test == L('[')) ? savep : (CHAR *)0);
          cstart = cend = *p++;
        }
 
@@ -601,7 +601,7 @@ BRACKMATCH (p, test, flags)
          if (!(flags & FNM_NOESCAPE) && cend == L('\\'))
            cend = *p++;
          if (ISBRACKETEOF (cend))
-           return (CHAR *)0;
+           return ((test == L('[')) ? savep : (CHAR *)0);
          if (cend == L('[') && *p == L('.'))
            {
              p = PARSE_COLLSYM (p, &pc);
@@ -683,7 +683,7 @@ matched:
       else if (!(flags & FNM_NOESCAPE) && c == L('\\'))
        {
          if (ISBRACKETEOF (*p))
-           return (CHAR *)0;
+           return ((test == L('[')) ? savep : (CHAR *)0);
          /* XXX 1003.2d11 is unclear if this is right. */
          ++p;
        }
-- 
2.37.2

Reply via email to