[ 
https://issues.apache.org/jira/browse/LUCENE-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352537#comment-17352537
 ] 

Michael Gibney commented on LUCENE-9204:
----------------------------------------

[~romseygeek], [~mikemccand]: on the topic of comparing spans and intervals ...

{quote}[intervals] have more predictable behaviour (and we know that they 
behave 'correctly', up to a mathematical definition of correct)
{quote}

Although I gather that this statement is technically correct, I think it's 
easily misinterpreted, and could be misleading.

To the extent that intervals behave "correctly" and spans don't, I'm fairly 
certain the differences are largely due to defining the problem away; see the 
"interval minimization" section of the Elasticsearch docs: 
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-intervals-query.html#interval-minimization

Regarding the "surprising results" mentioned there: I appreciate that interval 
query behavior is consistent, but "surprising results" are still "surprising 
results" from a user's perspective, and consistency is likely to be cold comfort.

The {{"the big bad wolf"}} example (in ES docs above) is particularly 
illustrative, because it essentially acknowledges that intervals suffer the 
same problem as spans under LUCENE-7398; but the solution for intervals is 
"don't do that" -- which of course would "work" equally well for spans.

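To make the mechanics concrete, here is a toy model of that example. It is a 
sketch under stated assumptions, not Lucene's or Elasticsearch's actual 
implementation: the class {{ToyIntervals}} and all of its methods are 
hypothetical, intervals are plain inclusive token-position pairs, and 
"minimization" is modeled simply as dropping any candidate interval that 
contains another candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical names, not the Lucene intervals API.
public class ToyIntervals {
    // Each interval is {start, end}, inclusive token positions.

    /** Intervals where the words occur as a consecutive phrase. */
    public static List<int[]> phrase(List<String> doc, String... words) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i + words.length <= doc.size(); i++) {
            if (doc.subList(i, i + words.length).equals(Arrays.asList(words))) {
                out.add(new int[] {i, i + words.length - 1});
            }
        }
        return out;
    }

    /** Union of alternatives, keeping only intervals that contain no other candidate. */
    @SafeVarargs
    public static List<int[]> anyOfMinimized(List<int[]>... alternatives) {
        List<int[]> all = new ArrayList<>();
        for (List<int[]> alt : alternatives) all.addAll(alt);
        List<int[]> out = new ArrayList<>();
        for (int[] a : all) {
            boolean containsOther = false;
            for (int[] b : all) {
                boolean bInsideA = a[0] <= b[0] && b[1] <= a[1];
                boolean different = a[0] != b[0] || a[1] != b[1];
                if (bInsideA && different) containsOther = true;
            }
            if (!containsOther) out.add(a);
        }
        return out;
    }

    /** True if some a, b, c are immediately adjacent, in order (no gaps allowed). */
    public static boolean orderedNoGaps(List<int[]> as, List<int[]> bs, List<int[]> cs) {
        for (int[] a : as)
            for (int[] b : bs)
                for (int[] c : cs)
                    if (a[1] + 1 == b[0] && b[1] + 1 == c[0]) return true;
        return false;
    }

    /** "the", then any_of("big", "big bad"), then "wolf", with no gaps. */
    public static boolean nestedQuery(List<String> doc) {
        List<int[]> middle = anyOfMinimized(phrase(doc, "big"), phrase(doc, "big", "bad"));
        return orderedNoGaps(phrase(doc, "the"), middle, phrase(doc, "wolf"));
    }

    /** The same options laid out explicitly at the top level. */
    public static boolean flattenedQuery(List<String> doc) {
        return !anyOfMinimized(
                phrase(doc, "the", "big", "wolf"),
                phrase(doc, "the", "big", "bad", "wolf")).isEmpty();
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("the", "big", "bad", "wolf");
        System.out.println("nested:    " + nestedQuery(doc));    // false
        System.out.println("flattened: " + flattenedQuery(doc)); // true
    }
}
```

In this model the nested form misses {{"the big bad wolf"}} because the 
{{"big bad"}} interval is discarded in favor of the shorter {{"big"}} interval 
before the outer no-gaps check runs; laying the alternatives out at the top 
level matches. That workaround does not depend on anything specific to 
intervals.
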
I don't consider myself a "span query partisan"; but I'm surprised that the 
move towards intervals is often vaguely framed in terms of "span bugs" that are 
in fact "positional query bugs". IIUC, there are other reasons to prefer 
interval queries -- although I'm not the best person to speak to those other 
reasons.

If there _are_ differences in functional behavior (wrt core positional query 
functionality), are there tests that illustrate those differences? In 
particular, are there cases where span queries are not "broken", yet behave 
differently from the analogous interval queries? My sense is that a 
concerted effort to port the edge case tests around LUCENE-7398 from spans to 
intervals would turn up many of the same intuitive problems in intervals; but 
I'm not sure that would even be possible, because the same/similar "surprising" 
behavior in intervals is by definition considered to be "mathematically 
correct".


> Move span queries to the queries module
> ---------------------------------------
>
>                 Key: LUCENE-9204
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9204
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: main (9.0)
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> We have a slightly odd situation currently, with two parallel query 
> structures for building complex positional queries: the long-standing span 
> queries, in core; and interval queries, in the queries module.  Given that 
> interval queries solve at least some of the problems we've had with Spans, I 
> think we should be pushing users more towards these implementations.  It's 
> counter-intuitive to do that when Spans are in core though.  I've opened this 
> issue to discuss moving the spans package as a whole to the queries module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

