On Fri, Nov 18, 2022 at 2:11 Chet Ramey <chet.ra...@case.edu> wrote:
> "If a pattern ends with an unescaped <backslash>, it is unspecified whether
> the pattern does not match anything or the pattern is treated as invalid."
>
> Bash uses the former interpretation. If "the pattern is treated as invalid"
> means trying to literally match the open bracket and going on from there,
> your interpretation is valid as well. The standard doesn't use that
> language in other places it specifies to treat the bracket as an ordinary
> character to be matched literally, however.

Some issues still remain.  I am fine with Bash choosing the former,
``the pattern does not match anything'', for a backslash followed by
NUL, but the following cases (see the attached [reduced3.sh]) with a
backslash followed by a slash should still be fixed:

  #1: pat=a[b\/c]          str=a[b/c]           no/yes
  #2: pat=a[b\/c]          str=ab               no/no
  #3: pat=a[b\/c]          str=ac               yes/no
  [...]

Here the fourth column <xxx/yyy> shows the result of the current devel
407d9afc with FNM_PATHNAME (xxx) and the result I expect (yyy).  "yes"
means the pattern matches the string, and "no" means the pattern does
not match.

* I expect "yes" for #1 because the bracket expression contains a
  slash before its closing right bracket `]' and thus the beginning
  `[' should be matched literally.  However, the actual behavior is
  "no".

* I expect "no" for both #2 and #3 because the beginning bracket `['
  should be matched literally.  Even if an escaped slash were allowed
  in the bracket expression, so that [b\/c] formed a complete bracket
  expression, the actual results "no" for #2 and "yes" for #3 would
  be inconsistent with each other.

  This difference arises because the slash after the backslash is
  checked only after a matching character is found
  (lib/glob/sm_loop.c:703).  The same check should also be applied
  before a matching character is found (lib/glob/sm_loop.c:573).  I
  attach a patch for this [r0037.brackmatch6.remaining-slash.patch].
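
As a rough illustration of the expectation for #1 to #3, the rule "a
slash before the closing `]' makes the `[' literal under FNM_PATHNAME"
can be modelled in a few lines of shell.  This is only a toy sketch,
not Bash's BRACKMATCH: bracket_kind is a hypothetical helper, and it
assumes the first `]' closes the bracket, so a leading `]', an escaped
`\]', and classes such as [:alpha:] are not handled.

```shell
# Toy model only, NOT Bash's BRACKMATCH.  bracket_kind is a hypothetical
# helper.  $1 is pattern text starting at '['.  It prints how the '['
# would be treated under FNM_PATHNAME according to the expectation above:
#   literal - the '[' is an ordinary character
#   set     - '[...]' forms a bracket expression
bracket_kind() {
  body=${1#?}                       # text after the opening '['
  case $body in
    *']'*) body=${body%%']'*} ;;    # keep what precedes the first ']'
    *) echo literal; return ;;      # no closing bracket at all
  esac
  case $body in
    */*) echo literal ;;            # a slash precedes the closing ']'
    *)   echo set ;;
  esac
}

bracket_kind '[b\/c]'   # prints: literal
bracket_kind '[bc]'     # prints: set
```

With the `[' of a[b\/c] classified as literal, the rest of the pattern
is ordinary text, which is why only the string a[b/c] is expected to
match and both ab and ac are expected not to.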

----------------------------------------------------------------------

There is another related inconsistency.  I modified my new extglob
engine to follow Bash's choice described above, but its behavior then
differed from that of the actual Bash implementation in the current
devel.

> "If a pattern ends with an unescaped <backslash>, it is unspecified whether
> the pattern does not match anything or the pattern is treated as invalid."
>
> Bash uses the former interpretation.

The corresponding sentence in the POSIX standard describes unescaped
backslashes in the general context of a pattern, not specifically
inside a bracket expression, so I applied it to the whole pattern in
the new extglob engine.  However, ``the former interpretation'' that
Bash adopts turned out to apply only to unescaped backslashes *inside
a bracket expression*.  This is the remaining part of the output of
the attached [example3.sh] with the current devel 407d9afc:

  [...]
  #4: pat=a\               str=a\               yes/???

So a pattern terminated by an unescaped backslash actually matches a
string; the trailing backslash is treated as a literal backslash.
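
The trailing-backslash case can also be probed without the loadable
builtin, in the style of tests/glob2.sub.  The sketch below is only an
illustration: probe is a hypothetical helper, and `case' matching does
not go through FNM_PATHNAME as the strmatch builtin in the attached
script does, so it only approximates #4.  No expected result is given
for the first call because that is exactly the behavior under
discussion.

```shell
# probe is a hypothetical helper: $1 is a pattern, $2 is a string.
# $1 is expanded unquoted so that the shell treats it as a pattern.
probe() {
  case $2 in
    $1) echo yes ;;
    *)  echo no ;;
  esac
}

probe 'a\'  'a\'    # like #4: the pattern ends with an unescaped backslash
probe 'a\\' 'a\'    # escaped backslash, for comparison
```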

a. Is this difference between the outside and the inside of bracket
  expressions intentional?  That is, the former interpretation, "the
  pattern does not match anything", seems to apply only inside
  bracket expressions.

b. If the current behavior for unescaped backslashes outside bracket
  expressions is intentional, would it be possible to change the
  treatment of unescaped backslashes inside a bracket expression to
  match it, so that the bracket `[' matches literally (as expected in
  cases #28..#31 of my previous reply [1])?  The attached
  [r0037.brackmatch7.unescaped-backslash-option-b.patch] is the
  corresponding patch.

  [1] https://lists.gnu.org/archive/html/bug-bash/2022-11/msg00070.html

c. If the behavior of an unescaped backslash outside bracket
  expressions should instead be changed to follow the former
  interpretation, "the pattern does not match anything", the patch
  for that is [r0037.brackmatch7.unescaped-backslash-option-c.patch].
  However, the current behavior outside bracket expressions seems to
  be explicitly required by the tests at tests/glob2.sub:32 and
  tests/glob2.sub:41.

I prefer option b, which keeps the behavior required by
tests/glob2.sub and is consistent between the inside and the outside
of bracket expressions.  It is also consistent with the handling of
the end of the string inside a bracket expression.

--
Koichi
From 828b93de72263785d93f86a285d919fdc5be156d Mon Sep 17 00:00:00 2001
From: Koichi Murase <myoga.mur...@gmail.com>
Date: Sun, 20 Nov 2022 16:09:09 +0900
Subject: [PATCH 1/2] fix(BRACKMATCH): fix remaining slash check

---
 lib/glob/sm_loop.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/glob/sm_loop.c b/lib/glob/sm_loop.c
index fa350daa..151d10bd 100644
--- a/lib/glob/sm_loop.c
+++ b/lib/glob/sm_loop.c
@@ -570,6 +570,8 @@ BRACKMATCH (p, test, flags)
        {
          if (*p == '\0')
            return (CHAR *)0;
+         else if (*p == L('/') && (flags & FNM_PATHNAME))
+           return ((test == L('[')) ? savep : (CHAR *)0);
          cstart = cend = *p++;
        }
 
-- 
2.37.2

#!/usr/bin/env bash

LC_COLLATE=C

gcc -O2 -xc -o ./fnmatch - <<-EOF
	#define _GNU_SOURCE /* FNM_EXTMATCH is a GNU extension */
	#include <fnmatch.h>
	#include <stdlib.h>
	#include <stdio.h>

	int main(int argc, char **argv) {
	  if (2 >= argc) {
	    fprintf(stderr, "usage: fnmatch string pattern\n");
	    exit(2);
	  }

	  int flags = FNM_PATHNAME | FNM_PERIOD | FNM_EXTMATCH;
	  if (fnmatch(argv[2], argv[1], flags) == 0)
	    return 0;
	  return 1;
	}
EOF

gcc -O2 -shared -xc -o ./strmatch.so - <<-EOF
	#define BUILTIN_ENABLED 0x01
	struct word_desc { char* word; int flags; };
	struct word_list { struct word_list* next; struct word_desc* word; };
	struct builtin {
	  const char* name;
	  int (*function)(struct word_list*);
	  int flags;
	  const char** long_doc;
	  const char* short_doc;
	  char* handle;
	};

	/*#include <glob/strmatch.h>*/
	int strmatch(char *pattern, char *string, int flags);
	#define FNM_PATHNAME    (1 << 0)
	#define FNM_NOESCAPE    (1 << 1)
	#define FNM_PERIOD      (1 << 2)
	#define FNM_LEADING_DIR (1 << 3)
	#define FNM_CASEFOLD    (1 << 4)
	#define FNM_EXTMATCH    (1 << 5)
	#define FNM_FIRSTCHAR   (1 << 6)
	#define FNM_DOTDOT      (1 << 7)

	static int strmatch_builtin(struct word_list* list) {
	  char *str, *pat;
	  if (!list || !list->word) return 2;
	  str = list->word->word;
	  if (!list->next || !list->next->word) return 2;
	  pat = list->next->word->word;

	  if (strmatch (pat, str, FNM_PATHNAME | FNM_PERIOD | FNM_EXTMATCH) == 0)
	    return 0;
	  return 1;
	}
	static const char* strmatch_doc[] = { "This is a builtin to test the behavior of strmatch", 0 };
	struct builtin strmatch_struct = { "strmatch", strmatch_builtin, BUILTIN_ENABLED, strmatch_doc, "strmatch string pattern", 0, };
EOF

enable -f ./strmatch.so strmatch

check_count=1
yes=$'\e[32myes\e[m'
no=$'\e[31mno\e[m'

function check {
  # bash impl
  if strmatch "$2" "$1"; then
    local strmatch=$yes
  else
    local strmatch=$no
  fi

  # fnmatch
  local expect=${3-}
  if [[ ! $expect ]]; then
    if ./fnmatch "$2" "$1"; then
      expect=$yes
    else
      expect=$no
    fi
  fi
  printf '#%d: pat=%-16s str=%-16s %s/%s\n' "$((check_count++))" "$1" "$2" "$strmatch" "$expect"
}

check 'a[b\/c]'          'a[b/c]'      "$yes"
check 'a[b\/c]'          'ab'          "$no"
check 'a[b\/c]'          'ac'          "$no"
check 'a\'               'a\'          '???'
From 69b70f4581bae1ec5a74ca2f38fc95b7a29d1e2c Mon Sep 17 00:00:00 2001
From: Koichi Murase <myoga.mur...@gmail.com>
Date: Sun, 20 Nov 2022 18:55:54 +0900
Subject: [PATCH 2/2] fix(BRACKMATCH): option (b) for inconsistent unescaped
 backslash

---
 lib/glob/sm_loop.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/glob/sm_loop.c b/lib/glob/sm_loop.c
index 151d10bd..c46f8231 100644
--- a/lib/glob/sm_loop.c
+++ b/lib/glob/sm_loop.c
@@ -569,7 +569,7 @@ BRACKMATCH (p, test, flags)
       if (!(flags & FNM_NOESCAPE) && c == L('\\'))
        {
          if (*p == '\0')
-           return (CHAR *)0;
+           return ((test == L('[')) ? savep : (CHAR *)0);
          else if (*p == L('/') && (flags & FNM_PATHNAME))
            return ((test == L('[')) ? savep : (CHAR *)0);
          cstart = cend = *p++;
@@ -701,7 +701,7 @@ matched:
       else if (!(flags & FNM_NOESCAPE) && c == L('\\'))
        {
          if (*p == '\0')
-           return (CHAR *)0;
+           return ((test == L('[')) ? savep : (CHAR *)0);
           /* We don't allow backslash to quote slash if we're matching pathnames */
          else if (*p == L('/') && (flags & FNM_PATHNAME))
            return ((test == L('[')) ? savep : (CHAR *)0);
-- 
2.37.2

From 610d10fc663fd172ae69f8eac8e18a0b426103fe Mon Sep 17 00:00:00 2001
From: Koichi Murase <myoga.mur...@gmail.com>
Date: Sun, 20 Nov 2022 18:46:42 +0900
Subject: [PATCH] fix(GMATCH): option (c) for inconsistent unescaped backslash

---
 lib/glob/sm_loop.c | 3 ---
 tests/glob2.sub    | 6 +++---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/lib/glob/sm_loop.c b/lib/glob/sm_loop.c
index fa350daa..e058b0e8 100644
--- a/lib/glob/sm_loop.c
+++ b/lib/glob/sm_loop.c
@@ -126,9 +126,6 @@ fprintf(stderr, "gmatch: pattern = %s; pe = %s\n", pattern, pe);
          break;
 
        case L('\\'):           /* backslash escape removes special meaning */
-         if (p == pe && sc == '\\' && (n+1 == se))
-           break;
-
          if (p == pe)
            return FNM_NOMATCH;
 
diff --git a/tests/glob2.sub b/tests/glob2.sub
index 09cb6d51..0569bcc8 100644
--- a/tests/glob2.sub
+++ b/tests/glob2.sub
@@ -29,8 +29,8 @@ ab\\) echo ok 1;;
 esac
 
 case $var in
-$var)  echo ok 2;;
-*)     echo bad 2;;
+$var)  echo bad 2;;
+*)     echo ok 2;;
 esac
 
 case $var in
@@ -38,7 +38,7 @@ case $var in
 *)     echo bad 3;;
 esac
 
-[[ $var = $var ]] && echo ok 4
+[[ $var = $var ]] || echo ok 4
 [[ $var = $'ab\134' ]] && echo ok 5
 
 LC_ALL=zh_HK.big5hkscs
-- 
2.37.2
