[ 
https://issues.apache.org/jira/browse/LUCENE-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396253#comment-17396253
 ] 

ASF subversion and git services commented on LUCENE-9963:
---------------------------------------------------------

Commit 647255b4d29bb56ddcfdf44bdb6e7d5d0ca76a14 in lucene's branch 
refs/heads/main from Geoffrey Lawson
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=647255b ]

LUCENE-9963 Improve FlattenGraphFilter's robustness when handling incoming 
token graphs with holes (#157)

6 main improvements:
    1) Iterate through all output.InputNodes since dest gaps can exist.
    2) freeBefore the minimum input node instead of the first input node (which 
was usually, but not always, the minimum).
    3) Don't freeBefore from a hole source node. Bookkeeping may not be 
correct and could result in an early free.
    4) When adding an output node after hole recovery, calculate its new 
position increment instead of adding it to the end of the output graph.
    5) Nodes after holes that have edges to their source will do the output 
re-mapping that the deleted node would have done.
    6) If a disconnected input node swaps order with another node in the 
output, then map them to the same output node.
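
Item 2 above can be illustrated with a small standalone sketch. It does not use Lucene: the class FreeBeforeSketch and its methods are hypothetical names, and a plain list of integers stands in for the input nodes recorded on one output node. The only assumption is that freeing "before" a bound releases every node id below that bound.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the FlattenGraphFilter implementation.
final class FreeBeforeSketch {
    /** The earlier choice of bound: whichever input node was recorded first. */
    static int firstInputNode(List<Integer> inputNodes) {
        return inputNodes.get(0);
    }

    /** The bound described in item 2: the smallest recorded input node. */
    static int minInputNode(List<Integer> inputNodes) {
        return Collections.min(inputNodes);
    }

    /** Input nodes that are still referenced but lie below the bound, so they would be freed. */
    static List<Integer> freedTooEarly(List<Integer> inputNodes, int bound) {
        List<Integer> lost = new ArrayList<>();
        for (int node : inputNodes) {
            if (node < bound) {
                lost.add(node);
            }
        }
        return lost;
    }
}
```

With input nodes recorded in the order 5, 3, the first-node bound is 5 and would release node 3 while it is still mapped; the minimum-node bound is 3 and releases nothing that is still in use.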

Co-authored-by: Lawson <geof...@amazon.com>

> Flatten graph filter has errors when there are holes at beginning or end of 
> alternate paths
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9963
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9963
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 8.8
>            Reporter: Geoffrey Lawson
>            Priority: Major
>          Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> If asserts are enabled, gaps at the beginning or end of an alternate 
> path can result in assertion errors, for example:
>  
> {code:java}
> java.lang.AssertionError: 2
> at  
> org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
> {code}
>  
> Or
>  
> {code:java}
> java.lang.AssertionError
> at 
> org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:191)
> {code}
>  
>  
> If asserts are not enabled, the same conditions will result in either 
> an ArrayIndexOutOfBoundsException or dropped tokens.
>  
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index -2 out of bounds for length 8
> at org.apache.lucene.util.RollingBuffer.get(RollingBuffer.java:109)
> at 
> org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:325)
> {code}
>  
> These issues can be recreated with the following unit tests:
> {code:java}
> public void testAltPathFirstStepHole() throws IOException {
>   TokenStream in = new CannedTokenStream(0, 3, new Token[] {
>     token("abc", 1, 3, 0, 3),
>     token("b", 1, 1, 1, 2),
>     token("c", 1, 1, 2, 3)
>   });
>   TokenStream out = new FlattenGraphFilter(in);
>   assertTokenStreamContents(out,
>     new String[] {"abc", "b", "c"},
>     new int[] {0, 1, 2}, // start offsets
>     new int[] {3, 2, 3}, // end offsets
>     new int[] {1, 1, 1}, // position increments
>     new int[] {3, 1, 1}, // position lengths; token 0 may need to be len 1 after flattening
>     3); // final offset
> }{code}
> {code:java}
> public void testAltPathLastStepHole() throws IOException {
>   TokenStream in = new CannedTokenStream(0, 4, new Token[] {
>     token("abc", 1, 3, 0, 3),
>     token("a", 0, 1, 0, 1),
>     token("b", 1, 1, 1, 2),
>     token("d", 2, 1, 3, 4)
>   });
>   TokenStream out = new FlattenGraphFilter(in);
>   assertTokenStreamContents(out,
>     new String[] {"abc", "a", "b", "d"},
>     new int[] {0, 0, 1, 3}, // start offsets
>     new int[] {1, 1, 2, 4}, // end offsets
>     new int[] {1, 0, 1, 2}, // position increments
>     new int[] {3, 1, 1, 1}, // position lengths
>     4); // final offset
> }{code}
> {code:java}
> public void testAltPathLastStepHoleWithoutEndToken() throws IOException {
>   TokenStream in = new CannedTokenStream(0, 2, new Token[] {
>     token("abc", 1, 3, 0, 3),
>     token("a", 0, 1, 0, 1),
>     token("b", 1, 1, 1, 2)
>   });
>   TokenStream out = new FlattenGraphFilter(in);
>   assertTokenStreamContents(out,
>     new String[] {"abc", "a", "b"},
>     new int[] {0, 0, 1}, // start offsets
>     new int[] {1, 1, 2}, // end offsets
>     new int[] {1, 0, 1}, // position increments
>     new int[] {1, 1, 1}, // position lengths
>     2); // final offset
> }{code}
> I believe LUCENE-8723 is a related issue, as it looks like the last token in 
> an alternate path is being deleted.
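
To make the "holes" in the quoted tests concrete, here is a minimal standalone sketch that rebuilds the input token graph from position increments and position lengths and reports interior nodes with no arriving or no leaving token. It does not use Lucene: the class TokenGraphHoles and its methods are hypothetical names, and it assumes the second and third arguments of the test helper token(...) are the position increment and the position length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not part of FlattenGraphFilter.
final class TokenGraphHoles {
    /** Turns per-token (posInc, posLength) pairs into arcs of the form {fromNode, toNode}. */
    static List<int[]> toArcs(int[] posIncs, int[] posLengths) {
        List<int[]> arcs = new ArrayList<>();
        int pos = -1; // a first token with increment 1 starts at node 0
        for (int i = 0; i < posIncs.length; i++) {
            pos += posIncs[i];
            arcs.add(new int[] {pos, pos + posLengths[i]});
        }
        return arcs;
    }

    /** Interior nodes that no token arrives at, or that no token leaves from. */
    static TreeSet<Integer> holeNodes(List<int[]> arcs) {
        TreeSet<Integer> holes = new TreeSet<>();
        if (arcs.isEmpty()) {
            return holes;
        }
        TreeSet<Integer> sources = new TreeSet<>();
        TreeSet<Integer> dests = new TreeSet<>();
        for (int[] arc : arcs) {
            sources.add(arc[0]);
            dests.add(arc[1]);
        }
        int first = sources.first(); // graph start: needs no arriving token
        int last = dests.last();     // graph end: needs no leaving token
        for (int node : sources) {
            if (node != first && !dests.contains(node)) {
                holes.add(node); // a token leaves here, but none arrives
            }
        }
        for (int node : dests) {
            if (node != last && !sources.contains(node)) {
                holes.add(node); // a token arrives here, but none leaves
            }
        }
        return holes;
    }
}
```

Under that assumption, the input of testAltPathFirstStepHole (increments {1, 1, 1}, lengths {3, 1, 1}) yields node 1, since nothing spans node 0 to node 1 on the alternate path. The input of testAltPathLastStepHole (increments {1, 0, 1, 2}, lengths {3, 1, 1, 1}) yields node 2, since "b" ends there and nothing continues to node 3.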



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
