[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179959#comment-17179959
 ] 

Michael McCandless commented on LUCENE-4312:
--------------------------------------------

[~mgibney] I think we should try to find a way forward here?

I think what [~rcmuir] briefly suggested above would be a good way to break 
the chicken-and-egg problem.  I do not think we can work out up front what the 
"bar" would be for promoting this approach to Lucene's core, but that should 
not stop us from getting an initial version working in {{sandbox}}.

Store the position length as a payload (a simple {{TokenFilter}} can do that), 
then create custom span queries that load that payload, decode it back to 
position length, and 100% correctly match positional queries that contain 
multi-token index-time synonyms.  That correctness achievement alone will be 
incredible and help many users suffering with this longstanding issue.  I don't 
think you would need any changes to {{DefaultIndexingChain}}, {{PostingsEnum}}, 
etc. for this implementation?
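
To make the idea concrete, here is a minimal, Lucene-free sketch in plain Java. It is only an illustration under stated assumptions, not the proposed implementation: the class {{PositionLengthPayloadSketch}}, its {{Posting}} type, and the vInt-style byte encoding are all hypothetical names and choices. In real Lucene the filter would copy {{PositionLengthAttribute}} into {{PayloadAttribute}}, and the matching would live in custom span queries, neither of which is modeled here. The sketch shows only the core logic: encode the position length into payload bytes, decode it at query time, and require each phrase term to start where the previous one ends.

```java
import java.util.Arrays;
import java.util.List;

/** Lucene-free sketch (hypothetical names): position length carried as payload bytes. */
final class PositionLengthPayloadSketch {

    /** One indexed token: term, start position, and the payload bytes stored with it. */
    static final class Posting {
        final String term;
        final int position;
        final byte[] payload;

        Posting(String term, int position, int positionLength) {
            this.term = term;
            this.position = position;
            this.payload = encode(positionLength);
        }
    }

    /** Encode a position length (>= 1) as a variable-length int, 7 bits per byte. */
    static byte[] encode(int positionLength) {
        if (positionLength < 1) {
            throw new IllegalArgumentException("positionLength must be >= 1");
        }
        byte[] buf = new byte[5];
        int n = 0;
        int v = positionLength;
        while ((v & ~0x7F) != 0) {
            buf[n++] = (byte) ((v & 0x7F) | 0x80); // low 7 bits plus continuation flag
            v >>>= 7;
        }
        buf[n++] = (byte) v;
        return Arrays.copyOf(buf, n);
    }

    /** Decode payload bytes back to a position length; a missing payload means 1. */
    static int decode(byte[] payload) {
        if (payload == null || payload.length == 0) {
            return 1;
        }
        int value = 0;
        int shift = 0;
        for (byte b : payload) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return value;
    }

    /** True if the phrase terms chain: each term starts at previous position + position length. */
    static boolean matchesPhrase(List<Posting> doc, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        for (Posting p : doc) {
            if (p.term.equals(phrase[0]) && matchesFrom(doc, p, phrase, 1)) {
                return true;
            }
        }
        return false;
    }

    private static boolean matchesFrom(List<Posting> doc, Posting prev, String[] phrase, int i) {
        if (i == phrase.length) {
            return true;
        }
        int next = prev.position + decode(prev.payload); // where the next term must start
        for (Posting p : doc) {
            if (p.position == next && p.term.equals(phrase[i]) && matchesFrom(doc, p, phrase, i + 1)) {
                return true;
            }
        }
        return false;
    }

    /** "domain name system is fragile" with the synonym "dns" spanning positions 0..2. */
    static List<Posting> exampleDoc() {
        return List.of(
            new Posting("domain", 0, 1),
            new Posting("dns", 0, 3),
            new Posting("name", 1, 1),
            new Posting("system", 2, 1),
            new Posting("is", 3, 1),
            new Posting("fragile", 4, 1));
    }

    public static void main(String[] args) {
        List<Posting> doc = exampleDoc();
        System.out.println(matchesPhrase(doc, "dns", "is"));   // true
        System.out.println(matchesPhrase(doc, "dns", "name")); // false
    }
}
```

With position length ignored, "dns" would appear to end at position 1, so the phrase "dns name" would match incorrectly and "dns is" would be missed. Decoding the payload fixes both cases.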

We encourage usage of that approach, run benchmarks, and iterate to improve 
performance, and that may eventually give us the currency to make API changes 
in Lucene's core to more directly support position length in the index.  Or 
maybe the payload implementation is perfectly fine forever.

I think we should open a new issue for this effort (not reuse this one, or 
LUCENE-7398, or LUCENE-8776).  Yes, this might seem like Jira cancer 
metastasis, but I think the specifics of this implementation plan warrant a 
dedicated issue.  The issue's purpose is to get your working payload solution 
available in Lucene's sandbox.

> Index format to store position length per position
> --------------------------------------------------
>
>                 Key: LUCENE-4312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4312
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 6.0
>            Reporter: Gang Luo
>            Priority: Minor
>              Labels: Suggestion
>         Attachments: positionLength-postings.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike McCandless said: TokenStreams are actually graphs.
> The indexer ignores PositionLengthAttribute. We need to change the index 
> format (and Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
