Note that I said EREs, which don't have to provide backreferences.
Arnold
Paul Eggert wrote:
> On 3/20/24 01:40, arn...@skeeve.com wrote:
> > It's possible to write a POSIX compliant matcher for EREs that doesn't
> > have such problems; I know someone doing
> Bruno
>
> [1] https://blog.cloudflare.com/cloudflare-outage/
> [2] https://en.wikipedia.org/wiki/RE2_(software)
It's possible to write a POSIX compliant matcher for EREs that doesn't
have such problems; I know someone doing it. In any case, users
get what they ask for; it's up to them to understand what they're doing.
Thanks,
Arnold
>
Hello.
Please see this report sent to the gawk list concerning regcomp.c.
I have attached his "POCFILE".
Thanks,
Arnold
> From: ttfish
> Date: Tue, 19 Mar 2024 21:48:34 +0800
> Subject: Segmentation Fault via recursive loop in Gawk
> To: bug-g...@gnu.org
> Cc: sec
Hi.
Bruno Haible wrote:
> Arnold: I have added '#if GAWK' conditionals, knowing that gawk's build system
> does not use gnulib-tool and you therefore pull from gnulib manually. This
> means the improvements will not land in gawk, since dfa in gawk will continue
> t
Hi.
Thanks for all this. I will review the changes and integrate
them as works for me.
I appreciate the help.
Arnold
Paul Eggert wrote:
> On 2023-04-30 11:28, Aharon Robbins wrote:
> > This would seem to be due to the expansion of the INT_MULTIPLY_WRAPV
> > macro. I trie
Thanks for the update.
Paul, let's leave dfa.c as is, with the modified code.
It's much easier to read anyway.
Thanks,
Arnold
Arsen Arsenović wrote:
> Hi,
>
> Paul Eggert writes:
>
> > This is a serious bug in Clang: it generates incorrect machine code.
> >
/html/bug-gawk/2022-12/msg00010.html.
This still smells like "compiler bug" to me, but even if not,
the GNULIB folks need to look at it.
I will take a look at testdfa; it's been a while since I've had to
use it, so maybe something has gotten out of sync.
Thanks,
Arnold
Sam
dern linux
./make-tmp.sh # build and install under /tmp/pcc
export PATH=/tmp/pcc/bin:$PATH
Next, clone the gawk repo and then
cd gawk
./bootstrap.sh && ./configure CC=pcc
make
I get:
| $ make
| make all-recursive
| make[1]: Entering directory
einfo.c, a static_assert or two are used instead of the previous
verify() macro. Gawk sticks to c99 compatibility to support VMS which
only has a C99 front end on the compiler.
Can all of the above be reverted in gnulib please?
Thanks,
Arnold
t. I'll add that to my (long) list of things to do.
OK - I agree that getting this into glibc is higher priority.
Thanks,
Arnold
Hi Paul.
Thanks for this. The patch looks good. I will (eventually) merge it
into gawk instead of my change.
I plan to add a test to gawk; perhaps grep would benefit from one as well?
Thanks,
Arnold
Paul Eggert wrote:
> On 4/21/22 00:57, Arnold Robbins wrote:
>
> > As far a
it's a realistic worry.
My two cents of course. I have already pushed this change in gawk's
copy of regex.
Thanks,
Arnold
Greetings.
Way back in May of 2015, Nelson Beebe submitted the following
bug report for gawk:
> Date: Mon, 25 May 2015 14:21:04 -0600 (MDT)
> From: "Nelson H. F. Beebe"
> To: "Arnold Robbins"
> Cc: be...@math.utah.edu
> Subject: gawk-4.1.3 regexp error
>
has rounding issues on
> 'float' but that doesn't mean we need to worry about +'s rounding issues
> on 'int'.
I don't care for the diff. It's a lot more work than changing & to &&,
but as I have my own copy of dfa.c I won't worry about it.
Thanks,
Arnold
tests; the short-circuit nature of && is less important.
In addition, & for a logical test can be dangerous since any non-zero
value can be true. Even though you're using bool functions, &&
guarantees a logical true/false instead of an accidental one.
Thanks,
Arnold
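To make the & versus && point concrete, here is a minimal sketch in C. The two predicates are hypothetical and are not taken from dfa.c; they return plain int "truth" values (0 or some non-zero bit), which is exactly the case where & can give an accidental answer. With functions that really return bool the two forms agree, so this only illustrates the general hazard.

```c
#include <assert.h>

/* Hypothetical predicates (not from dfa.c): each returns a raw int
   truth value, i.e. 0 or whatever bit happened to be set.  */
int has_flag_two (int x) { return x & 2; }    /* yields 0 or 2 */
int has_flag_four (int x) { return x & 4; }   /* yields 0 or 4 */

/* Bitwise AND of the raw results: 2 & 4 is 0, so this reports
   "false" even when both flags are set.  */
int both_bitwise (int x) { return has_flag_two (x) & has_flag_four (x); }

/* Logical AND: each operand is reduced to true/false first,
   and the result is always 0 or 1.  */
int both_logical (int x) { return has_flag_two (x) && has_flag_four (x); }

/* Small self-check; 6 has both bit 2 and bit 4 set.  */
void check_and_forms (void)
{
  assert (both_bitwise (6) == 0);   /* the accidental "false" */
  assert (both_logical (6) == 1);   /* the intended "true" */
  assert (both_logical (2) == 0);   /* only one flag set */
}
```

Calling check_and_forms() from a test program should pass silently; the only difference between the two functions is the operator.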
nuth said,
"Premature optimization is the root of all evil." I think that
applies here.
I (at least) request that you make the change in dfa.c.
Thanks,
Arnold
Hi.
Thanks for the report. I am cc'ing the GNULIB guys, as they
are the upstream for dfa.c. In the meantime, I will make this
change in gawk.
Thanks!
Arnold
David Binderman wrote:
> Hello there,
>
> I just tried to compile gawk-5.1.1 with new clang-14. It said
>
> df
Paul Eggert wrote:
> On 8/29/21 8:55 AM, arn...@skeeve.com wrote:
> > Hi.
> >
> > I had to make the change below to dfa.h to get things to compile
> > in gawk. Please apply this.
>
> Sorry about my typo, and thanks for the fix; I installed it.
No problem. Much thanks.
Arnold
Hi.
I had to make the change below to dfa.h to get things to compile
in gawk. Please apply this.
Thanks,
Arnold
-
--- /usr/local/src/Gnu/gnulib/lib/dfa.h 2021-08-27 16:50:39.579581132 +0300
+++ support/dfa.h 2021-08-29 18:30:25.101719167 +0300
@@ -50,6
ectly but breaks the gawk test suite.
In other words, I think the bug is somewhere in this area, but I
don't understand the regex internals enough to fix it. dfa will also
need looking at.
Thanks,
Arnold
Bruno Haible wrote:
> Hi Arnold,
>
> > Dot matching newline isn't the issue here.
> >
> > It's ^ matching in the middle of a string. For my purposes, ^ should
> > only match at the beginning of a *string* (as $ should only match at
> > the end of a
Hi.
Paul Eggert wrote:
> On 7/15/21 1:48 PM, Arnold Robbins wrote:
> > The regexp used there, ".^", to my mind should be treated as invalid.
>
> No, that regular expression is valid because "." matches newline in
> POSIX EREs. So the "." matches
turned 5 (a.^)
If this is supposed to match a newline, I'd like to understand why.
If it's not, I'd like to get a fix for regexp and dfa. Or if
RE_SYNTAX_GNU_AWK needs more or fewer syntax bits[1], I'd like to
know which, and why.
Please cc me on any and all replies, as I'm not subscribed to
this list.
Thanks,
Arnold
[1] I hate the syntax bits. I have hated them for decades. Sigh.
worry about the older clang and gcc versions
> that complain about {0} as an initializer. We can either let them die
> off noisily, or use the appropriate -Wno-whatever option when using them
> to compile.
I've decided to just not worry about it. It's impossible to compile
without warnings on every single C compiler in the world.
Thanks,
Arnold
Hi.
I got the below from one of my testers. If y'all feel like updating the
relevant files in GNULIB, that'd be great. If instead you feel like,
well, to heck with that, that's also OK. :-)
Thanks,
Arnold
> From: Pat Rankin
> Date: Mon, 10 May 2021 18:13:33 -0700
>
Paul Eggert wrote:
> On 5/6/21 11:23 PM, arn...@skeeve.com wrote:
> > I'd prefer to see it fixed upstream...
>
> It was fixed upstream a couple of weeks ago. You should be able to fix
> the Gawk issue by syncing Gawk from Gnulib.
Thanks, will do.
Arnold
| sort | uniq
>
> Then, for each file in the list:
>
> sed -e 's/__nonnull ((1))//g' \
> -e 's/__nonnull ((1, 2))//g' \
> "${file}" > "${file}.fixed"
> mv "${file}.fixed" "${file}"
>
> (You will need it for more than Awk. Wget and a couple of others need
> the treatment, too).
>
> Jeff
Thanks. I can do this, but I'd prefer to see it fixed upstream...
Arnold
ulib is updated?
Much thanks,
Arnold
> Date: Thu, 6 May 2021 15:15:08 -0600
> From: "Nelson H. F. Beebe"
> Subject: Re: Release spiral start
>
> I got successful builds and installations of gawk-5.1.1a on several
> systems, including CentOS 5/6/7/8, Ubuntu 20.04, Debian
I have pushed fixes for this. Let me know if there are still issues.
Thanks,
Arnold
arn...@skeeve.com wrote:
> "Dmitry V. Levin" wrote:
>
> > On Sat, Apr 17, 2021 at 01:43:58PM -0600, arn...@skeeve.com wrote:
> > > "Dmitry V. Levin" wrote:
> >
e problem is that __libc_dynarray_resize is actually not linked
into gawk, but the executable runs because the local libc happens to
supply the symbol. But since it's "private" to GLIBC, that symbol
being there can't be relied upon.
OK --- I will work on this.
Thanks,
Arnold
ay_resize@GLIBC_PRIVATE
> This makes it unusable at least in GNU/Linux distributions.
Can you explain how this makes it unusable? I see this on Ubuntu
but the gawk executables run just fine.
What, really, is the problem here? I don't understand.
Thanks,
Arnold
ion, e.g.
> gnulib-tool script, like many other gnulib users do, that would make
> updating gnulib modules a relatively straightforward task.
Sorry to disappoint you, but I prefer to keep my project such
that the support infrastructure doesn't overwhelm the actual
project code.
Arnol
s issue too. So the files
are back in sync. Whew!
Thanks,
Arnold
.
That helped a lot.
I still have to have the following change, otherwise I get a linkage
error on the gl_dynarray_* routines. :-(
So, at least for the nonce, my copy and Gnulib's will be out of sync.
Oh well.
Thanks,
Arnold
-
--- /usr/local/src/Gnu/gnulib/lib/regex_
eally needed?
Running gnulib-tool on gawk isn't the direction I want to go, either...
Thanks,
Arnold
Thanks.
A few times a week I 'git pull' to see if anything has changed
that affects gawk.
Arnold
Bruno Haible wrote:
> Hi Arnold,
>
> > Please revert this, as it breaks compilation in gawk.
>
> This patch should do it (keeping the optimized variant of the 3-way
x > q->index;
| + return _GL_CMP (p->index, q->index);
| }
Please revert this, as it breaks compilation in gawk.
Thanks,
Arnold
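For readers who have not met the three-way comparison idiom being discussed, here is a small sketch in C. The names cmp3, struct pos and sort_positions are hypothetical, and this is not a claim about how _GL_CMP is defined in Gnulib; it only shows the common "(a > b) - (a < b)" form, which yields -1, 0 or 1 and, unlike "a - b", cannot overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical element type with an index field.  */
struct pos { long index; };

/* Three-way compare: -1 if a < b, 0 if equal, 1 if a > b.
   No subtraction of the operands, so no overflow.  */
int cmp3 (long a, long b) { return (a > b) - (a < b); }

/* qsort-style comparison function built on cmp3.  */
int pos_compare (const void *pa, const void *pb)
{
  const struct pos *p = pa;
  const struct pos *q = pb;
  return cmp3 (p->index, q->index);
}

/* Sort an array of positions by ascending index.  */
void sort_positions (struct pos *v, size_t n)
{
  qsort (v, n, sizeof *v, pos_compare);
}

/* Self-check with extreme values, where "a - b" would overflow.  */
void check_cmp3 (void)
{
  struct pos v[3] = { { 5 }, { LONG_MAX }, { LONG_MIN } };
  sort_positions (v, 3);
  assert (v[0].index == LONG_MIN);
  assert (v[1].index == 5);
  assert (v[2].index == LONG_MAX);
}
```

The comparison function can be handed to qsort() or bsearch() as-is; whether a macro or a function is used for the comparison is a separate portability question.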
Just FYI, gawk's dfa.c is now in sync w/Gnulib's.
There are still some problems on Vax/VMS. I suspect it's environmental
but will let you know if not.
Thanks!
Arnold
arn...@skeeve.com wrote:
> Paul,
>
> Thanks for this. I will work on reducing the differences betwee
Paul,
Thanks for this. I will work on reducing the differences between
what's in Gnulib and what's in gawk.
Vax/VMS is dead as a commercial system, true. But it remains alive as
a hobbyist system, especially as it's very easy to run in simulation
under SIMH.
Thanks!
Arnold
Pau
may not be a blocker, but even if not, disabling use of dfa.c for
regular expression matching means that gawk will run slower on that
system.
Can dfa.c be made 32-bit compatible in a happy fashion?
Thanks,
Arnold
-h module. This second patch
> shouldn't affect Awk.
Much thanks for the fix. I have pulled it into gawk and we'll see
what my testers report.
Thanks,
Arnold
this fails spectacularly. :-(
Can you revert to the original code or to something else that
will compile on systems where ULONG_WIDTH is not defined?
Much thanks,
Arnold
HANK YOU for the quick turnaround time on the fix. I appreciate it.
Arnold
e?
Thanks,
Arnold
other places. This is
> why I used the more-generic name "idx_t" internally in dfa.c.
I give up. Leave it ptrdiff_t. I may submit comment changes for dfa.h
later.
Arnold
Paul Eggert wrote:
> On 12/15/19 10:43 AM, Arnold Robbins wrote:
> > To reproduce:
> >
> > 1. Checkout the gawk repo
> > 2. Copy gnulib/lib/dfa.[ch] into gawk/support/.
> > 3. Apply the minimal patch below
>
> I looked into that, and the problem was no
le to merge from gnulib until the dfa crashes are
dealt with, though. IMHO those are very high priority.
Thanks,
Arnold
Paul Eggert wrote:
> On 12/15/19 12:14 AM, arn...@skeeve.com wrote:
>
> > int64_t is just as standard as ptrdiff_t and just as clear.
>
> Actually, int64_t is optional (as even C18 and POSIX-2018 do not require it),
> whereas ptrdiff_t has been required since C89. More importantly, int64_t would
You'll see lots of the tests blowing up spectacularly.
Please repair things.
Thanks,
Arnold
diff --git a/lib/dfa.c b/lib/dfa.c
index 8c88c9d..818f58f 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -890,6 +890,23 @@ char_context (struct dfa const *dfa, u
ld rather
see an additional `char *locale_name' parameter added to dfa_syntax.
That way the caller can get the value and pass it in, and the
dfa code becomes mt-safe at next to no cost.
Thanks,
Arnold
pull in more libraries, like libpthread
or whatever, since that would considerably complicate things for me,
for no actual gain w.r.t. gawk.
I'm curious what is the use case for multithreaded dfa?
Thanks,
Arnold
.
Thanks,
Arnold
Paul Eggert wrote:
> >> I see that Paul has made the change to the API over my objections.
>
> I made the change while responding to Bruno's objections, but before
> seeing yours. Ooops. Sorry about that. However, I hope the followup
> emails have a
arn...@skeeve.com wrote:
> But I really don't want ptrdiff_t in the API.
I see that Paul has made the change to the API over my objections.
Jim --- do you have an opinion on this?
Thanks,
Arnold
ing ssize_t.
In any case, as I said, I can live with ptrdiff_t in the implementation,
even though I don't like it that much. (A nice block comment at the
top of dfa.c explaining why ptrdiff_t is used would be appropriate.)
But I really don't want ptrdiff_t in the API.
Thanks,
Arnold
Thanks,
Arnold
arn...@skeeve.com wrote:
> Other than this, I think internally too, I'd prefer that you
>
> 1,$s/ptrdiff_t/ssize_t/g
I did this, just to see. gawk passes its test suite, both in
64- and 32-bit mode.
FWIW.
Thanks,
Arnold
the API.
Thanks!
Arnold
then I could
live with ssize_t (as returned by read(2), for example), but I
would find ptrdiff_t to be ugly and unintuitive.
> PS. Arnold, the above discusses all the changes I know about for dfa.c
> and dfa.h. The proposed API change (size_t->ptrdiff_t) could be
> installed eit
hy not?
The more environments that I can easily support out of the box, the
better for my users.
My two cents,
Arnold
Hi Paul.
Much thanks! I have pulled in the changes to gawk and pushed to git.
I await news from the original reporter (Hi Peter!) as to whether that
does the trick on msys.
W.R.T. your question about config.guess, perhaps Alexey can answer.
Thanks!
Arnold
Paul Eggert wrote:
> On 11/9/19
e small enough that you don't need paperwork.
I'm cc-ing Alexey Pawlow, the author of the changes.
Thanks!
Arnold
Hi.
Norihiro Tanaka wrote:
> Missing a patch for dfa. Re-send correct patch file.
Paul - is this going to be merged into GNULIB? If so, I'll put it into
gawk now; I want to make a release soon.
Thanks,
Arnold
[
w.
> I didn't mention the required NEWS update either.
Indeed, I knew I was requesting the obvious, but sometimes people forget ...
In any case, much thanks!
Arnold
PCRE, a different matcher), or
> - don't use the C locale, but rather use a multi-byte locale like the
> one you chose, which inhibits use of the DFA matcher, because \b's
> definition requires multi-byte aware machinery not present in the DFA
> matcher.
>
> I expect to revert the mentioned gnulib commits, and then to
> make new releases of both grep and sed.
Please add a test case ...
Thanks,
Arnold
de will need to be changed
> accordingly, and if so I can volunteer to coordinate that with glibc
> (we're close to a freeze in Glibc, but we can install into Gnulib first).
>
I assume you'll make parallel changes in dfa.c at the same time?
Thanks,
Arnold
> the code unmercifully.)
>
> I agree. I don't think they make much performance difference nowadays. I
> plan to time them and see if we're right; if so, let's get rid of them
> (in glibc regex, Gnulib, and in Gawk).
So, let's wait until the results of all this. Once you update regex
in Gnulib I will sync with it.
Thanks,
Arnold
ctly in
regex_internal.h.
Thanks,
Arnold
really a gnulib
issue and thus I'm reporting it there.
I will apply this patch, probably later this week, unless the GNULIB
guys, with your help, can patch regex directly.
Thanks,
Arnold
> Date: Sat, 29 Sep 2018 16:05:35 -0600
> From: "Nelson H. F. Beebe"
> To:
Paul Eggert wrote:
> Thanks, I installed that into Gnulib and into glibc.
Most excellent! Thanks.
Hi.
I have applied the following patch to my copy of regex_internal.h; it's
needed for compilation in the POSIX environment on z/OS.
Thanks,
Arnold
--
--- /usr/local/src/Gnu/gnulib/lib/regex_internal.h 2018-07-18 21:16:31.670542200 +0300
+++ su
> time
> Gawk syncs mktime.c from Gnulib it'll fix the problem then.
I have pulled the latest into gawk-4.2-stable; waiting to hear from
my porting team.
Thanks.
Arnold
Hi.
Thanks for the note. The file under discussion came from GNULIB (I
believe) so I'm adding bug-gnulib and will let that team comment on
this.
Given that it's not an issue on commonly used CPUs, I don't see this
as a high priority issue either way.
Thanks,
Arnold
Samy
9, down from 0.09% a year ago (and
> down from 0.7% in 2010, so this is a factor-of-10 decline in 8 years).
I totally agree that it's not worth worrying about. It's too small a
tail to be wagging such a big dog.
Thanks,
Arnold
l. I'd be surprised if RRI were fully implemented even in
> the !_LIBC part of the code.
The only FIXMEs I see are both in the _LIBC part of the code, and
there are only two: one in regexec.c and one in regcomp.c.
Thanks,
Arnold
rom gawk, I
am pretty sure that the ! _LIBC part of the code does get it right.
Or at least did in my version.
Thanks,
Arnold
Hi.
Paul Eggert wrote:
> Rather than spend much time worrying about this little comment, it'd
> probably be more helpful to document the intended behavior of rational
> ranges. As I understand it, Arnold wants them to use byte values in
> unibyte locales and wide character va
likely merge that into the gawk mainline this week.
Arnold
Hi Paul.
Paul Eggert wrote:
> On 12/08/2017 01:16 AM in
> <https://sourceware.org/ml/libc-alpha/2017-12/msg00242.html> Arnold
> Robbins wrote:
> > + /* some malloc()-checkers don't like zero allocations */
>
> Which checkers are these?
Lord only knows. That
Thanks, I have merged this into gawk's version.
Paul --- I think that you have permission to push the patches you approve
to glibc. Please do so.
Thanks,
Arnold
Paul Eggert wrote:
> On 12/08/2017 01:16 AM, Arnold Robbins wrote:
> > This patch changes several calls to ma
he original email is here:
> > https://sourceware.org/ml/libc-alpha/2017-12/msg00243.html
>
> Done. I've pushed these for Arnold.
>
> commit 5069ff32842c60c55f8b573ee66fe43f9ec364af
> Author: Arnold Robbins
> Date: Tue Dec 19 19:26:08 2017 -0800
>
> regex: Fix spelling in
Hello.
Thanks for cluing me into the discussion.
> As I understand it, Arnold's patches are against glibc. Arnold, would it
> be too much trouble to rebase them against gnulib instead?
Absolutely too much trouble. Sorry.
I think that most or all of the changes are in gnulib'
Paul Eggert wrote:
> On 12/13/2016 12:26 PM, Arnold Robbins wrote:
> > - dfa->syntax.case_fold = (dfaopts & DFA_CASE_FOLD) != 0;
> > + dfa->syntax.case_fold = (bits & RE_ICASE) != 0
>
> I'm afraid that didn't work, due to a missing semicolon
Bruno Haible wrote:
> Finally, code this formula into the 'grep' program.
I'm sure that Paul and Jim would welcome patches.
Arnold