Hi Ken,
Thanks for bringing this up. I believe this topic warrants some further discussion.
My understanding of the intent of the current system is that it aims to provide
a consistent and predictable set of rules for comparisons between any
datatypes. Prior to 3.6, comparisons between different types in Gremlin
generally produced undefined behaviour (in practice this usually meant an
exception). The current system successfully resolved much of this issue,
although it has introduced certain semantic consistency issues (see
https://issues.apache.org/jira/browse/TINKERPOP-2940). Further, while the docs
(https://tinkerpop.apache.org/docs/3.7.0/dev/provider/#_ternary_boolean_logics)
are quite clear regarding the propagation/reduction behaviour in many cases, it
becomes muddier as you probe the edges.
Consider the following example. The docs quite clearly define the expected
behaviour of the first traversal, but the expected behaviour of the second is
not clear, because it goes beyond basic combinations of AND, OR, and NOT:
gremlin> g.inject(1).not(is(gt("one")))
// Produces no output
gremlin> g.inject(1).not(union(is(gt("one")), is(eq("zero"))))
==>1
// Error is reduced to false prior to Union Step, and thus not propagated
// into the Not Step.
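
The difference between the two traversals above comes down to where ERROR is reduced to false relative to the NOT. The following is a minimal Python sketch of that ordering, not TinkerPop code. It assumes the rules described in the provider docs linked above (NOT of ERROR stays ERROR, and ERROR becomes false at a filter boundary); the names `Ternary`, `ternary_not`, and `reduce_to_binary` are hypothetical.

```python
from enum import Enum


class Ternary(Enum):
    TRUE = "true"
    FALSE = "false"
    ERROR = "error"


def ternary_not(value):
    # Assumed rule: NOT leaves ERROR untouched and flips TRUE/FALSE.
    if value is Ternary.ERROR:
        return Ternary.ERROR
    return Ternary.FALSE if value is Ternary.TRUE else Ternary.TRUE


def reduce_to_binary(value):
    # Assumed rule: at a filter boundary only TRUE lets the traverser pass.
    return value is Ternary.TRUE


# 1 > "one" is a cross-type comparison, modelled here as ERROR.
comparison = Ternary.ERROR

# First traversal: ERROR propagates through NOT and is reduced afterwards.
propagated = reduce_to_binary(ternary_not(comparison))

# Second traversal: ERROR is reduced inside the child first, so NOT only
# ever sees an ordinary false.
reduced_early = not reduce_to_binary(comparison)

print(propagated)     # traverser is filtered out
print(reduced_early)  # traverser passes
```

The same comparison yields opposite results depending only on whether the reduction happens before or after the negation.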
This is a good example of how we are currently in a bit of a weird place, where
some of the language semantics are formally defined in documentation, while the
rest are defined by the implementation. It cannot currently be determined
whether the behaviour above is expected or a bug. I believe it is important that
we resolve this, either by expanding our formally defined semantics or by
changing the implementation (when a breaking change is permissible).
As for the short-term question posed by ANY and ALL, my only concern with your
suggestion is that it would be subject to the following inconsistency, although
as shown above there is current precedent for this sort of thing:
gremlin> g.inject(1).not(is(lt("one")))
// Produces no output
gremlin> g.inject([1]).not(any(is(lt("one"))))
==>[1]
In my opinion the most neutral direction would be for ANY to behave the same as
a chain of ORs and for ALL to act as a chain of ANDs. However, it makes sense
for this short-term decision to align with our long-term direction regarding
comparability semantics. I wouldn’t be opposed to your proposed implementation
if the long-term plan is to move all steps towards this immediate reduction
behaviour.
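
To make the two options concrete, here is a rough Python sketch (not TinkerPop code) that contrasts ANY/ALL modelled as chains of ternary OR/AND with an ANY that reduces ERROR immediately. The AND/OR tables are assumed from the provider docs linked earlier; every function name is hypothetical, and `lt` treats any type mismatch as ERROR, which is a simplification of the real comparability rules.

```python
from enum import Enum
from functools import reduce


class Ternary(Enum):
    TRUE = "true"
    FALSE = "false"
    ERROR = "error"


def ternary_or(a, b):
    # Assumed table: TRUE dominates, then ERROR, otherwise FALSE.
    if Ternary.TRUE in (a, b):
        return Ternary.TRUE
    if Ternary.ERROR in (a, b):
        return Ternary.ERROR
    return Ternary.FALSE


def ternary_and(a, b):
    # Assumed table: FALSE dominates, then ERROR, otherwise TRUE.
    if Ternary.FALSE in (a, b):
        return Ternary.FALSE
    if Ternary.ERROR in (a, b):
        return Ternary.ERROR
    return Ternary.TRUE


def ternary_not(value):
    if value is Ternary.ERROR:
        return Ternary.ERROR
    return Ternary.FALSE if value is Ternary.TRUE else Ternary.TRUE


def lt(a, b):
    # Simplified stand-in for a comparison: mismatched types give ERROR.
    if type(a) is not type(b):
        return Ternary.ERROR
    return Ternary.TRUE if a < b else Ternary.FALSE


def any_as_or_chain(items, predicate):
    return reduce(ternary_or, (predicate(i) for i in items), Ternary.FALSE)


def all_as_and_chain(items, predicate):
    return reduce(ternary_and, (predicate(i) for i in items), Ternary.TRUE)


def any_immediate(items, predicate):
    # ERROR is reduced to false per element, so the result is never ERROR.
    hit = any(predicate(i) is Ternary.TRUE for i in items)
    return Ternary.TRUE if hit else Ternary.FALSE


def passes(value):
    # Final reduction at the enclosing filter: only TRUE passes.
    return value is Ternary.TRUE


def lt_one(x):
    return lt(x, "one")


# not(any(is(lt("one")))) applied to the list [1]
chained = passes(ternary_not(any_as_or_chain([1], lt_one)))
immediate = passes(ternary_not(any_immediate([1], lt_one)))
print(chained, immediate)
```

Under these assumptions the OR-chain version filters the list out, matching `g.inject(1).not(is(lt("one")))`, while the immediate-reduction version lets `[1]` through, matching the second traversal above.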
Thanks,
Cole Greer
From: Ken Hu <[email protected]>
Date: Monday, September 11, 2023 at 4:16 PM
To: [email protected] <[email protected]>
Subject: [DISCUSS] Ternary Boolean Handling in New Steps
Hi All,
Starting in version 3.6, the ternary boolean system was introduced to
handle comparison/equality tests within Gremlin. Recently, I've been
implementing some list functions from Proposal 3 which make heavy use of
the GremlinValueComparator to determine if values satisfy a specific
condition. However, I'm finding it a bit tricky to understand how I should
handle the GremlinTypeErrorException. For any() and all(), it seems like it
would make sense to immediately reduce any ERROR state to false as they are
filter steps. In the case of all(), if a GremlinTypeErrorException is
caught, it would mean there was a comparison error so the traverser should
be removed from the stream. However, doing this seemingly clashes with the
original intention of ternary boolean, which is to allow a provider-specific
response on how to handle an ERROR state.
My current thoughts are that we should rework the ternary boolean system in
the future to make it easier to incorporate it into new steps. One of the
trickiest parts is that it uses unchecked exceptions as a means to
implement the ERROR state, which can easily be missed or accidentally
leaked to the user (which has happened before). For now, I'm planning to go
ahead and immediately reduce ERROR states as I think that is what makes the
most sense for list functions.
Does anyone have any thoughts about this?
Thanks,
Ken