I wanted to clarify a few things first. After that, I think I agree that we
want runtime-issued SDWs not to be lost when backtracking.
An SDE, or schema definition error, is most commonly the Daffodil schema
compiler telling you your schema isn't meaningful, so parsing/unparsing cannot
even be started. We divide Daffodil processing into "compile time" (compiling
the schema) and runtime (parse/unparse time).
Some SDEs cannot be detected until runtime, but SDEs are always fatal. I.e.,
there is never any backtracking from them, because they mean there is something
wrong with your DFDL schema.
Processing errors (parse error or unparse error) are errors where your schema
is meaningful but the data doesn't match the schema. Some parse errors are a
normal part of parsing as they are suppressed by backtracking to try other
alternatives.
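As a rough illustration of that distinction, here is a minimal Scala sketch of a point of uncertainty that backtracks on a processing error but lets an SDE propagate. The names (ParseFailure, SchemaDefinitionError, Choice.firstSuccess) are hypothetical and are not Daffodil's actual classes, and the real runtime does not necessarily signal errors with exceptions like this.

```scala
// Hypothetical stand-ins for illustration only; not Daffodil's real types.
final class ParseFailure(msg: String) extends Exception(msg)
final class SchemaDefinitionError(msg: String) extends Exception(msg)

object Choice {
  // Try each alternative in order. A ParseFailure means "the data did not
  // match this branch", so we backtrack and try the next one. Anything else,
  // including SchemaDefinitionError, is not caught and ends the parse.
  @annotation.tailrec
  def firstSuccess[A](alternatives: List[() => A],
                      last: Option[ParseFailure] = None): A =
    alternatives match {
      case Nil =>
        // Every branch failed: report the most recent parse failure.
        throw last.getOrElse(new ParseFailure("no alternatives"))
      case alt :: rest =>
        val attempt: Either[ParseFailure, A] =
          try Right(alt())
          catch { case e: ParseFailure => Left(e) }
        attempt match {
          case Right(value) => value
          case Left(e)      => firstSuccess(rest, Some(e))
        }
    }
}
```

With this shape, a branch that throws ParseFailure is silently skipped in favor of a later branch, while a branch that throws SchemaDefinitionError stops everything regardless of what alternatives remain.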
Schema definition warnings (SDWs) are not parse errors; they are the warning
version of an SDE. I.e., they suggest a possible error in the schema. SDWs
detected at compile time are always output by the compiler. If an SDW is issued
at runtime, there is an interesting question: should it be suppressed by
backtracking?
I don't know of any runtime SDWs offhand. I searched the source for them and
found only one place where an SDW could be issued at runtime, which is this
code in DState.scala:
    private def isAnArray(): Boolean = {
      if (!currentNode.isInstanceOf[DIArray]) {
        Assert.invariant(errorOrWarn.isDefined)
        if (currentNode.isInstanceOf[DIElement]) {
          errorOrWarn.get.SDW(
            WarnID.PathNotToArray,
            "The specified path to element %s is not to an array. Suggest using fn:exists instead.",
            currentElement.name)
        } else {
          errorOrWarn.get.SDW(
            WarnID.PathNotToArray,
            "The specified path is not to an array. Suggest using fn:exists instead.")
        }
        false
      } else {
        true
      }
    }
This does get called at runtime. I just would expect path expressions to be
compiled and this to have been checked already at compilation time, which
should render this runtime check unnecessary, I think. I did not find a test
that produces this warning message.
I think an SDW that warns about an implementation limit being reached, such as
the regex match length limit, should not be suppressed by backtracking. As you
pointed out, such a warning could be telling you about the reason for the
backtracking, and suppressing the warning means you would not be able to
diagnose why the backtracking is occurring.
Calling these implementation limit hits "schema definition" warnings is ok with
me, because the schema goes along with the tunables like the max regex size
limit. Both are static things that the data must comply with for parse/unparse
to be successful.
I imagine that if you just add an SDW call at runtime, it will put the warning
onto the diagnostics in the PState, and they will be discarded on backtracking,
but probably that should not happen for runtime SDWs, only for parse errors.
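To make that concrete, here is a minimal Scala sketch of a state object that keeps runtime SDWs apart from the diagnostics that get rolled back. Everything in it (SketchState, mark, reset, the Diag types) is a hypothetical stand-in and not the real PState API; it only shows the separation being suggested.

```scala
// Hypothetical stand-ins for illustration only; not Daffodil's PState API.
sealed trait Diag { def msg: String }
final case class ParseErr(msg: String) extends Diag
final case class RuntimeSDW(msg: String) extends Diag

final class SketchState {
  // Rolled back when the parser backtracks to a point of uncertainty.
  private var backtrackable = Vector.empty[Diag]
  // Never rolled back, so these survive a "path not taken".
  private var sticky = Vector.empty[Diag]

  def parseError(msg: String): Unit =
    backtrackable = backtrackable :+ ParseErr(msg)
  def runtimeSDW(msg: String): Unit =
    sticky = sticky :+ RuntimeSDW(msg)

  // A mark is a snapshot of the backtrackable diagnostics only.
  def mark(): Vector[Diag] = backtrackable
  def reset(m: Vector[Diag]): Unit = backtrackable = m

  def diagnostics: Vector[Diag] = sticky ++ backtrackable
}

object Demo {
  def run(): Vector[Diag] = {
    val st = new SketchState
    val m = st.mark()
    st.runtimeSDW("regex match length limit reached")
    st.parseError("delimiter not found")
    st.reset(m) // backtrack: the parse error goes, the warning stays
    st.diagnostics
  }
}
```

After the reset in Demo.run, only the runtime SDW is left in the diagnostics, which is the behavior I would want for implementation-limit warnings.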
-mikeb
________________________________
From: Larry Barber <[email protected]>
Sent: Friday, December 18, 2020 2:59 PM
To: [email protected] <[email protected]>
Subject: RE: How to add warnings that are not lost due to backtracking
I actually ran into this problem when parsing a large JPEG file. I thought that
I had uncovered a bug because the file was not being parsed correctly. Once the
cause was pointed out to me, I solved the problem by changing the tunable to
increase the REGex search length, and the file then parsed as expected. The REGex search
failure caused (erroneous) backtracking, so I need to see the information about
the search failing.
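A minimal Scala sketch of the failure mode, for illustration: a search over a window capped at some limit, which separates "the data really does not match" from "the match ran off the end of the window". It relies on java.util.regex.Matcher.hitEnd(), which is a real JDK method; the result types, BoundedRegex.search, and the limit parameter are hypothetical and are not how the Daffodil io code is actually structured.

```scala
import java.util.regex.Pattern

// Hypothetical result type for illustration only; not part of Daffodil.
sealed trait SearchResult
final case class Found(text: String) extends SearchResult
case object NoMatch extends SearchResult      // the data genuinely does not match
case object LimitReached extends SearchResult // more data might have matched

object BoundedRegex {
  // `limit` stands in for a tunable such as the maximum regex match length.
  def search(pattern: Pattern, data: CharSequence, limit: Int): SearchResult = {
    val truncated = data.length > limit
    val window = data.subSequence(0, math.min(limit, data.length))
    val m = pattern.matcher(window)
    if (m.lookingAt()) Found(m.group())
    // hitEnd() is true when the regex engine ran into the end of the window,
    // so with truncated input a failure here may only be due to the limit.
    else if (truncated && m.hitEnd()) LimitReached
    else NoMatch
  }
}
```

With the pattern `[a-z]+;` and the data "abcdefgh;rest", a limit of 4 gives LimitReached while a limit of 64 gives Found("abcdefgh;"). The LimitReached case is the one that needs a warning that is not thrown away, since otherwise it looks like an ordinary parse failure.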
This is part of Daffodil-412, which required a two-part solution. The tunable
was implemented for the first part, but the second part, the warning message,
was not.
If SDW is not the way to go, I'd be happy to work with another suggestion.
From: Carlson, Ian [mailto:[email protected]]
Sent: Friday, December 18, 2020 2:37 PM
To: [email protected]
Subject: RE: How to add warnings that are not lost due to backtracking
I'm still new at this - but I've found a great way to learn is to invite people
to tell me I'm wrong, so here's my two cents.
SDE in particular is generally used to tell the parser that something has gone
wrong. This invites the parser to either back up to the most recent point of
uncertainty and try another path or fail completely if none exists. That's how
we select one branch over another in the cases where there are multiple
possible paths.
If we do select a path that turns out to be invalid, we generally don't want
those errors to propagate back up the chain, since they are for a "path not
taken" and failing in a way that leads us to the correct path is both expected
and desired behavior. By extension, warnings encountered on our "path not
taken" also get discarded. For instance, if we have a regex failure
looking for the length of a discriminator that ultimately doesn't exist because
this is an invalid path - that isn't really a failure at the top level.
So using SDW for a global "something weird you might want to examine" sort of
warning is somewhat at odds with the way SDE and SDW are usually used.
Our runtime does generate quite a bit of text - so simply printing to console
for a warning is likely to be missed. If we want to have a sort of global log
that doesn't get cleared, but also isn't mingled with the runtime console
output - we may need a new facility for that.
Side note - there are certain classes of diagnostics around choice branches
that don't get discarded currently, which may cause some warnings and errors to
escape even though we output a successful infoset. Ticket 2399 discusses this
issue, and a partial attempt at a fix is languishing WIP on
https://github.com/apache/incubator-daffodil/pull/444.
The short version being that I wouldn't want to rely on any information from
SDW or SDE escaping a "path not taken" once that fix is in place.
Ian Carlson | Software Engineer
From: Larry Barber <[email protected]>
Sent: Friday, December 18, 2020 12:49 PM
To: [email protected]
Subject: How to add warnings that are not lost due to backtracking
I'm hoping someone could give me some pointers on adding a warning message to the
Daffodil io code.
I'm looking at Daffodil-412 and want to generate a warning message when the
REGex search gets expanded and another if it exceeds the tunable for maximum
length.
I've located the code that does these expansions in
io/InputSourceDataInputStream.scala, but I'm unsure how to generate the warning
messages.
I don't see any other warning messages being generated in the io code. I've
seen several instances in core that just use SDW(...) and others in DSOM that
use context.SDW(...), but I'm confused about this - I'm afraid that this method
buffers warnings and throws them away in the case of backtracking. Since the
REGex search may be the cause of backtracking, I think these warnings need to
be presented always.
I'm just not sure of the proper way to access SDW in this situation and need to
make sure that the messages will not be discarded.