Hi. Please see the thread starting at
https://lists.gnu.org/archive/html/bug-gawk/2021-07/msg00026.html

The regexp used there, ".^", to my mind should be treated as invalid.
Mawk does so, reading the entire file as one record. Gawk matches a
newline for it:

    $ cat data
    a.^b
    a.^b
    $ cat x.awk
    BEGIN { RS = ".^" }
    {
        gsub(/.^/, ">&<")
        print NR, $0
        print "RT=<" RT ">"
    }
    $ mawk -f x.awk data
    1 a.^b
    a.^b

    RT=<>
    $ ./gawk -f x.awk data
    1 a.^b
    RT=<
    >
    2 a.^b
    RT=<
    >

To make debugging easier, there is a test program in the gawk git repo
that just does regexp matching the way gawk does, called testdfa. To
use it:

    git clone git://git.savannah.gnu.org/gawk.git
    cd gawk
    ./bootstrap && ./configure
    ## edit Makefile and support/Makefile to remove -O, add -g
    make -j
    cd helpers
    gcc -g -I.. -I../support testdfa.c ../support/libsupport.a -o testdfa

When run:

    $ cd helpers
    $ ./testdfa -b '.^' < ../data
    Ignorecase: false
    Syntax: RE_BACKSLASH_ESCAPE_IN_LISTS|RE_CHAR_CLASSES|RE_CONTEXT_INDEP_ANCHORS|RE_DOT_NEWLINE|RE_INTERVALS|RE_NO_BK_BRACES|RE_NO_BK_PARENS|RE_NO_BK_VBAR|RE_NO_EMPTY_RANGES|RE_UNMATCHED_RIGHT_PAREN_ORD|RE_INVALID_INTERVAL_ORD
    Pattern: /.^/, len = 2
    After setup_pattern(), len = 2
    MB_CUR_MAX = 6
    Calling dfacomp(.^, 2, 0x55e9d56a5600, true)
    re_search returned position 4 (true)
    dfaexec returned 5 (a.^)

If this is supposed to match a newline, I'd like to understand why.
If it's not, I'd like to get a fix for regexp and dfa. Or if
RE_SYNTAX_GNU_AWK needs more or fewer syntax bits[1], I'd like to know
which, and why.

Please cc me on any and all replies, as I'm not subscribed to this list.

Thanks,

Arnold

[1] I hate the syntax bits. I have hated them for decades. Sigh.