Anyway, thanks a bunch for responding; I was worried that no one would.
*From:* Philip Race <philip.r...@oracle.com>
*Sent:* Wednesday, May 22, 2024 11:54 AM
*To:* Yagnatinsky, Mark : IT (NYK) <mark.yagnatin...@barclays.com>;
core-libs-dev@openjdk.org
*Subject:* Re: stack overflow in regex engine
P4 is the default JBS priority, so sometimes it just means no one
figured out the true priority.
But in general P4 bugs could be open for years, or even never get fixed.
The priority is also partially an assessment of where it falls as a
priority for the JDK developers.
A user of JDK may have an entirely different perspective.
And that's why there are vendors who provide support for JDK. They can
also arrange the backports you need.
But that's not done here. Here is where you come to participate and
contribute fixes, not ask for fixes.
So my suggestion is to raise it via your support channel to your
particular vendor who provided your binary.
-phil
On 5/21/24 8:46 PM, mark.yagnatin...@barclays.com wrote:
(Sorry about my previous “do I need to subscribe?” email; in
retrospect that was needless noise.)
The purpose of this email is twofold: first, to inquire about the
status of a ticket filed a few years ago, and second, to point out
some non-obvious reasons why it might be slightly more serious
than it seems.
The ticket is this one https://bugs.openjdk.org/browse/JDK-8260866
(stack overflow in regex matching quantified alternation)
The priority is listed as P4, which I guess means something like
“medium” (more important than P5, but less than P3).
It also has a specific person assigned, which seems vaguely
encouraging, but no updates at all in the years since it was
created, which seems less encouraging.
It was seemingly never once discussed on this mailing list, not
even when it was first filed.
As an outsider, I’m not quite sure how to interpret all these
various omens and turn them into guesses about its eventual fate.
Will it remain unfixed for another decade or two? Will it be
fixed in a few months, but then never backported to old versions?
Something else? No one knows?
That concludes the status inquiry. Now on to the advocacy. Some
bugs are annoying, but once you hit them, you can work around them
by changing your code so it does not trigger the bug.
Note the phrase “your code” above. This is much more awkward to
do if the bug is triggered by third-party code you got from Maven
Central or something.
At that point your options are to either ask the third party
library to work around it, or else fork the dependency (which is
not well supported by mainstream build tools (or maybe I’m just
using them wrong)).
In this case, regular expressions are so ubiquitous that the bug
is quite plausibly more likely to be triggered by some third party
dependency than by code you own.
That was the case for me today: after spending hours trying to
track down a stack overflow error I found the offending regex in a
third party library.
The good news is that for the kinds of inputs we need to handle,
it is indeed easy to substitute a much simpler regex that would
avoid the issue.
The bad news is that it’s not my code, so I can’t. I could
petition the maintainers of the library, but this is not great
because:
First, maybe the version I’m on is no longer even supported, and
newer versions are not compatible.
Second, it may take them a while to fix it. Third, it is
wasteful (and inelegant) to have workarounds slowly percolate
throughout the Java ecosystem instead of fixing the problem at the
root.
The other annoying thing here is that even when you have “enough”
stack space to avoid crashing, using it may not be quite “free”.
For instance, Project Loom’s foundational premise seems to be that
“most threads have oversized stacks; we can have more threads if
we start off with small stacks and grow them only when needed”.
This would be false when the thread in question uses a regex with
quantified alternation.
(Since many Loom threads will be based on the same Runnable, it’s
a pretty safe bet that if one of them uses this feature, many
will, so you can’t assume it will “average out”.)
There are other reasons besides Loom to be low on stack space;
maybe you’re using some crazy framework(s) that like(s) to have
call stacks that are crazy deep.
Or maybe you’re running with -Xss set pretty low. Or you passed a
small value for stack space to the Thread constructor.
Or maybe none of these things are true, but in most operating
systems a thread stack costs “real” memory in proportion to its
high-watermark, so even a SINGLE heavy regex in the lifetime of a
thread is tantamount to a memory leak of hundreds of kilobytes.
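To make the stack-budget point concrete, here is a minimal sketch of running a task on a thread whose stack size is requested through the Thread constructor. The class and method names (StackBudget, completesWithin, recurseForever) are made up for illustration, and the requested size is only a hint that the JVM may round up or ignore, so the exact depth at which a given task overflows will vary by platform.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class StackBudget {
    /**
     * Runs the task on a fresh thread created with the requested stack size.
     * Returns true if the task finished, false if it hit StackOverflowError.
     * The stack size is a hint; the JVM is free to adjust it.
     */
    static boolean completesWithin(long stackBytes, Runnable task) {
        AtomicBoolean overflowed = new AtomicBoolean(false);
        Thread worker = new Thread(null, () -> {
            try {
                task.run();
            } catch (StackOverflowError e) {
                overflowed.set(true);
            }
        }, "stack-budget", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return !overflowed.get();
    }

    /** Deliberately unbounded recursion, used only to show the overflow path. */
    static int recurseForever(int depth) {
        return recurseForever(depth + 1) + 1;
    }

    public static void main(String[] args) {
        // A trivial task fits in any stack.
        System.out.println(completesWithin(256 * 1024, () -> {}));
        // Unbounded recursion overflows regardless of the stack size granted.
        System.out.println(completesWithin(256 * 1024, () -> recurseForever(0)));
    }
}
```

A helper like this could wrap a regex match to check whether it fits in a given budget, though the threshold found on one machine should not be assumed to hold on another.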
Practicalities aside, I don’t like it when code consumes
“surprising” types of resources, or surprising amounts of them.
For instance, you wouldn’t expect a sorting function to spawn
threads behind your back, unless it was called “parallel sort” or
something like that.
You wouldn’t expect it to allocate multi-gigabyte arrays, nor to
perform I/O.
Similarly, most functions need only O(1) stack space, so this
tends to be the default assumption unless the docs explicitly call
out “this thing might throw stack overflows at you so make sure
you have plenty of stack space”.
Some need a bit more… for instance, I would not be surprised if a
regex needed stack space in proportion to the depth of the parse
tree of the regex.
But stack space in proportion to the length of the string being
matched is the kind of thing that I’d hope gets called out in
those @implNotes thingies, or better yet fixed.
Even people who know that regex matching can sometimes take
exponential time may naively assume that regex matching would not
consume O(n) stack space, where n is the input length.
What’s worse, not only does it indeed consume stack space linear
in the length of the input, but the constant hidden by the O()
notation is itself pretty scary.
For instance, consider the regex that caused my troubles today:
https://github.com/apache/camel/blob/main/core/camel-support/src/main/java/org/apache/camel/support/ObjectHelper.java#L63
After getting rid of extra escaping and also double-escaping
caused by Java not having “raw” strings, we’re left with this:
,(?!(?:[^(,]|[^)],[^)])+\))
(I find the above hard to read; the regex I would have replaced it
with, if it had been “our” code, is simply a single comma.)
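For readers who also find the regex hard to read, the following sketch compares it with a plain comma on two short inputs. The class name and the sample strings are invented for illustration; only the regex itself comes from the message above. On input without parentheses the two delimiters split identically, while on input with a parenthesized group the lookahead keeps the inner comma from acting as a separator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DelimiterComparison {
    // The regex quoted above, as a Java string literal (the one backslash is doubled).
    static final Pattern LOOKAHEAD_COMMA =
            Pattern.compile(",(?!(?:[^(,]|[^)],[^)])+\\))");
    // The simpler replacement suggested in the message.
    static final Pattern PLAIN_COMMA = Pattern.compile(",");

    static List<String> split(Pattern delimiter, String input) {
        return Arrays.asList(delimiter.split(input));
    }

    public static void main(String[] args) {
        String flat = "a,b,c,d,e";     // hypothetical input without parentheses
        String nested = "a,f(b,c),d";  // hypothetical input with a parenthesized group

        System.out.println(split(LOOKAHEAD_COMMA, flat));
        System.out.println(split(PLAIN_COMMA, flat));
        System.out.println(split(LOOKAHEAD_COMMA, nested));
        System.out.println(split(PLAIN_COMMA, nested));
    }
}
```

On these short inputs the first two lines both print [a, b, c, d, e], the third prints [a, f(b,c), d], and the fourth prints [a, f(b, c), d]. So the plain comma is only an equivalent substitute when the input is known to contain no parentheses.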
Anyway, I tried creating a Scanner with the delimiter above and
looping through all the tokens in the string that originally
caused the crash.
I thought that perhaps it would work, since I had a simple example
that does everything in main, but it also crashed.
Then I decided to play an alternating game where I trimmed the
string until it stopped crashing, then lowered Xss by 64k and
repeated.
Eventually, I got it crashing with a call stack well over 500
calls deep on a string fewer than 128 characters long.
(The string was not hand-crafted; it was simply a prefix of the
original string that caused the first crash I tracked down.)
The string in question had a mere five tokens, which is to say
that it had just four commas.
It had no open or close parentheses, so the entire negative
lookahead assertion served as a giant no-op, at least when it
wasn’t crashing.
(Technically, the stack usage is linear in the length of the input
AFTER the first comma, but the first comma was pretty early.)
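The experiment described above can be approximated with a sketch like the one below. It is not the original test program or the original string: the class name, the sampleInput helper, and the filler input are assumptions made for illustration. It only mirrors the setup, a Scanner using the regex as its delimiter on parenthesis-free input with an early first comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerStackProbe {
    // Same regex as in the message, written as a Java string literal.
    static final Pattern DELIMITER =
            Pattern.compile(",(?!(?:[^(,]|[^)],[^)])+\\))");

    /** Collects every token a Scanner produces with the regex as delimiter. */
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(DELIMITER);
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    /** Hypothetical input: an early comma followed by tailLength filler characters. */
    static String sampleInput(int tailLength) {
        return "a," + "x".repeat(tailLength);
    }

    public static void main(String[] args) {
        // Tail length is taken from the command line so it can be varied per run.
        int tail = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        try {
            int count = tokens(sampleInput(tail)).size();
            System.out.println(count + " tokens, no overflow at tail length " + tail);
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError at tail length " + tail);
        }
    }
}
```

Compiling this first and then running it with a reduced -Xss while varying the tail length is one way to look for the crossover point. The point at which it overflows, if it does at all, depends on the JDK build and platform, so no particular threshold is claimed here.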
Sorry if this email is poorly organized; I’ve already spent way
too many hours on it (not even counting the debugging that
prompted it) and I need to get some sleep now.
If anyone actually reads all or most of this, thank you.
Mark.
P.S. if anyone actually responds, thank you even more.
This message is for information purposes only. It is not a
recommendation, advice, offer or solicitation to buy or sell a
product or service, nor an official confirmation of any
transaction. It is directed at persons who are professionals and
is intended for the recipient(s) only. It is not directed at
retail customers. This message is subject to the terms at:
https://www.ib.barclays/disclosures/web-and-email-disclaimer.html.
For important disclosures, please see:
https://www.ib.barclays/disclosures/sales-and-trading-disclaimer.html
regarding marketing commentary from Barclays Sales and/or Trading
desks, who are active market participants;
https://www.ib.barclays/disclosures/barclays-global-markets-disclosures.html
regarding our standard terms for Barclays Investment Bank where we
trade with you in principal-to-principal wholesale markets
transactions; and in respect to Barclays Research, including
disclosures relating to specific issuers, see:
http://publicresearch.barclays.com.
__________________________________________________________________________________
If you are incorporated or operating in Australia, read these
important disclosures:
https://www.ib.barclays/disclosures/important-disclosures-asia-pacific.html.
__________________________________________________________________________________
For more details about how we use personal information, see our
privacy notice:
https://www.ib.barclays/disclosures/personal-information-use.html.
__________________________________________________________________________________