[ https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179959#comment-17179959 ]
Michael McCandless commented on LUCENE-4312:
--------------------------------------------

[~mgibney] I think we should try to find a way forward here?

I think what [~rcmuir] briefly suggested above would be a good approach to break the chicken/egg? I do not think we can work out up front what the "bar" would be to promote this approach to Lucene's core, but that should not stop us from getting an initial version working in {{sandbox}}.

Store the position length as a payload (a simple {{TokenFilter}} can do that), then create custom span queries that load that payload, decode it back to position length, and 100% correctly match positional queries that contain multi-token index-time synonyms. That correctness achievement alone will be incredible and will help many users suffering with this longstanding issue. I don't think you would need any changes to {{DefaultIndexingChain}}, {{PostingsEnum}}, etc. for this implementation?

We encourage usage of that approach, we run benchmarks, we iterate to improve performance, etc., and that may eventually give us the currency to make API changes in Lucene's core to more directly support position length in the index. Or, maybe the payload implementation is perfectly fine forever.

I think we should open a new issue for this effort (not reuse this one, or LUCENE-7398, or LUCENE-8776). Yes, this might seem like Jira cancer metastasis, but I think the specifics of this implementation plan warrant a dedicated issue. The issue's purpose is to get your working payload solution available in Lucene's sandbox.
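To make the payload idea concrete, here is a minimal plain-Java sketch of the encode/decode/match cycle described above. It is only a model under stated assumptions: {{Token}}, {{Posting}}, {{index}} and {{matchesPhrase}} are hypothetical names invented for illustration, not Lucene classes or methods, and the variable-length byte encoding is just one plausible choice for the payload. A real version would presumably be a {{TokenFilter}} plus custom span queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the payload idea; none of these types are Lucene classes. */
final class PositionLengthPayloadSketch {

    /** Analysis-time token: term, start position, and how many positions it spans. */
    static final class Token {
        final String term; final int position; final int positionLength;
        Token(String term, int position, int positionLength) {
            this.term = term; this.position = position; this.positionLength = positionLength;
        }
    }

    /** Indexed form: the position length survives only as opaque payload bytes. */
    static final class Posting {
        final String term; final int position; final byte[] payload;
        Posting(String term, int position, byte[] payload) {
            this.term = term; this.position = position; this.payload = payload;
        }
    }

    /** Encode as a variable-length int: 7 data bits per byte, high bit marks continuation. */
    static byte[] encodePositionLength(int positionLength) {
        if (positionLength < 1) throw new IllegalArgumentException("positionLength must be >= 1");
        byte[] buf = new byte[5];
        int n = 0;
        int v = positionLength;
        while ((v & ~0x7F) != 0) {
            buf[n++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        buf[n++] = (byte) v;
        return Arrays.copyOf(buf, n);
    }

    /** Decode the payload; a missing payload is treated as position length 1 (an assumption). */
    static int decodePositionLength(byte[] payload) {
        if (payload == null || payload.length == 0) return 1;
        int value = 0;
        int shift = 0;
        for (byte b : payload) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return value;
    }

    /** Index-time step: copy each token's position length into its payload. */
    static List<Posting> index(List<Token> tokens) {
        List<Posting> postings = new ArrayList<>();
        for (Token t : tokens) {
            postings.add(new Posting(t.term, t.position, encodePositionLength(t.positionLength)));
        }
        return postings;
    }

    /** Query-time step: exact phrase match where each term must start where the previous one ends. */
    static boolean matchesPhrase(List<Posting> postings, String... terms) {
        if (terms.length == 0) return false;
        for (Posting p : postings) {
            if (p.term.equals(terms[0]) && matchesFrom(postings, p, terms, 1)) return true;
        }
        return false;
    }

    private static boolean matchesFrom(List<Posting> postings, Posting prev, String[] terms, int i) {
        if (i == terms.length) return true;
        // End of the previous term = its position plus the length decoded from its payload.
        int expectedStart = prev.position + decodePositionLength(prev.payload);
        for (Posting p : postings) {
            if (p.position == expectedStart && p.term.equals(terms[i])
                    && matchesFrom(postings, p, terms, i + 1)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, indexing "dns is down" with the index-time synonym "domain name service" gives "dns" at position 0 with position length 3, next to "domain", "name" and "service" at positions 0, 1 and 2. With the decoded length, the phrase "dns is down" matches and the spurious phrase "dns name" does not. That is the correctness property a payload-based span query would aim for.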
> Index format to store position length per position
> --------------------------------------------------
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: 6.0
> Reporter: Gang Luo
> Priority: Minor
> Labels: Suggestion
> Attachments: positionLength-postings.patch
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Mike Mccandless said: TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute. Need change the index format (and
> Codec APIs) to store an additional int position length per position.