[ 
https://issues.apache.org/jira/browse/LUCENE-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347615#comment-17347615
 ] 

Geoffrey Lawson commented on LUCENE-9963:
-----------------------------------------

I see three issues that need resolution.

1) When there is a hole at the beginning of an alternate path, the long path 
doesn't have a node set up to end on after flattening. There already has to be 
some hole recovery during the alternate path, so we should be able to address 
the recovered output node correctly so the long path can find it when it 
flattens.

2) The last node in an alternate path is what triggers the long path to give up 
its pointer to the input from the output. If it's not there, tokens that start 
from the long path's output node in the input will try to start at its output 
node in the output. This can result in out-of-order tokens and errors. When the 
token after both paths gets added, I think it should start at the frontier. If 
it doesn't, it should release the edge that brought it to the current node. 
This one seems the trickiest to fix.

3) Similar to issue 2, but instead of another token coming in to trigger the 
hole resolution, the token stream ends. The output graph is mostly correct, but 
while releasing tokens the filter will expect tokens that don't exist and throw 
an error. We can identify these as holes and not output any tokens.
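
To make the three cases concrete, here is a minimal standalone sketch that 
turns position increments and position lengths into graph edges and checks 
whether the alternate path under a long token is missing its first or last 
step. It is not FlattenGraphFilter code and not the proposed fix. The class 
AltPathHoles, its nested types, and the methods toEdges, holeAtStart and 
holeAtEnd are hypothetical names used only for illustration, and the hole 
test is a simplified model of what the filter has to recover from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a simplified model of the token graph,
// not part of Lucene and not the fix discussed in this issue.
final class AltPathHoles {

    /** A token as the unit tests describe it: term, position increment, position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** An edge in the token graph running from node "from" to node "to". */
    static final class Edge {
        final String term;
        final int from;
        final int to;

        Edge(String term, int from, int to) {
            this.term = term;
            this.from = from;
            this.to = to;
        }
    }

    /** Converts position increments and lengths into graph edges. */
    static List<Edge> toEdges(List<Tok> toks) {
        List<Edge> edges = new ArrayList<>();
        int pos = -1; // a first increment of 1 lands on node 0
        for (Tok t : toks) {
            pos += t.posInc;
            edges.add(new Edge(t.term, pos, pos + t.posLen));
        }
        return edges;
    }

    /** True if no other edge within the long edge's span leaves its start node. */
    static boolean holeAtStart(List<Edge> edges, Edge longEdge) {
        for (Edge e : edges) {
            if (e != longEdge && e.from == longEdge.from && e.to <= longEdge.to) {
                return false;
            }
        }
        return true;
    }

    /** True if no other edge within the long edge's span arrives at its end node. */
    static boolean holeAtEnd(List<Edge> edges, Edge longEdge) {
        for (Edge e : edges) {
            if (e != longEdge && e.to == longEdge.to && e.from >= longEdge.from) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same increments and lengths as testAltPathLastStepHole in the issue.
        List<Edge> edges = toEdges(List.of(
            new Tok("abc", 1, 3), new Tok("a", 0, 1),
            new Tok("b", 1, 1), new Tok("d", 2, 1)));
        Edge abc = edges.get(0);
        System.out.println("hole at start: " + holeAtStart(edges, abc));
        System.out.println("hole at end:   " + holeAtEnd(edges, abc));
    }
}
```

In this model issue 1 is the case where holeAtStart is true for the long 
token, and issues 2 and 3 are the case where holeAtEnd is true, with and 
without a following token.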

I've got a change that addresses these problems. I'm not thrilled with the fix 
for issue 2, and I want to add more unit tests to verify it's working as 
intended. I'll post a separate PR for the fix so we can get these tests in 
first.

> Flatten graph filter has errors when there are holes at beginning or end of 
> alternate paths
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9963
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9963
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 8.8
>            Reporter: Geoffrey Lawson
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If asserts are enabled, having gaps at the beginning or end of an alternate 
> path can result in assertion errors, e.g.:
>  
> {code:java}
> java.lang.AssertionError: 2
> at  
> org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
> {code}
>  
> Or
>  
> {code:java}
> java.lang.AssertionError
> at 
> org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:191)
> {code}
>  
>  
> If asserts are not enabled, the same conditions will result in either 
> IndexOutOfBounds exceptions or dropped tokens.
>  
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index -2 out of bounds for length 8
> at org.apache.lucene.util.RollingBuffer.get(RollingBuffer.java:109)
> at 
> org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:325)
> {code}
>  
> These issues can be recreated with the following unit tests
> {code:java}
> public void testAltPathFirstStepHole() throws IOException {
>   // token(term, posInc, posLength, startOffset, endOffset)
>   TokenStream in = new CannedTokenStream(0, 3, new Token[] {
>     token("abc", 1, 3, 0, 3),
>     token("b", 1, 1, 1, 2),
>     token("c", 1, 1, 2, 3)
>   });
>   TokenStream out = new FlattenGraphFilter(in);
>   assertTokenStreamContents(out,
>     new String[] {"abc", "b", "c"},
>     new int[] {0, 1, 2}, // start offsets
>     new int[] {3, 2, 3}, // end offsets
>     new int[] {1, 1, 1}, // position increments
>     new int[] {3, 1, 1}, // position lengths; token 0 may need to be len 1 after flattening
>     3);                  // final offset
> }{code}
> {code:java}
> public void testAltPathLastStepHole() throws IOException {
>   // token(term, posInc, posLength, startOffset, endOffset)
>   TokenStream in = new CannedTokenStream(0, 4, new Token[] {
>     token("abc", 1, 3, 0, 3),
>     token("a", 0, 1, 0, 1),
>     token("b", 1, 1, 1, 2),
>     token("d", 2, 1, 3, 4)
>   });
>   TokenStream out = new FlattenGraphFilter(in);
>   assertTokenStreamContents(out,
>     new String[] {"abc", "a", "b", "d"},
>     new int[] {0, 0, 1, 3}, // start offsets
>     new int[] {1, 1, 2, 4}, // end offsets
>     new int[] {1, 0, 1, 2}, // position increments
>     new int[] {3, 1, 1, 1}, // position lengths
>     4);                     // final offset
> }{code}
> {code:java}
> public void testAltPathLastStepHoleWithoutEndToken() throws IOException {
>   // token(term, posInc, posLength, startOffset, endOffset)
>   TokenStream in = new CannedTokenStream(0, 2, new Token[] {
>     token("abc", 1, 3, 0, 3),
>     token("a", 0, 1, 0, 1),
>     token("b", 1, 1, 1, 2)
>   });
>   TokenStream out = new FlattenGraphFilter(in);
>   assertTokenStreamContents(out,
>     new String[] {"abc", "a", "b"},
>     new int[] {0, 0, 1}, // start offsets
>     new int[] {1, 1, 2}, // end offsets
>     new int[] {1, 0, 1}, // position increments
>     new int[] {1, 1, 1}, // position lengths
>     2);                  // final offset
> }{code}
> I believe LUCENE-8723 is a related issue, as it looks like the last token in 
> an alternate path is being deleted.


