zhaih commented on a change in pull request #157: URL: https://github.com/apache/lucene/pull/157#discussion_r644165757
##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java
##########
@@ -362,6 +394,48 @@ public boolean incrementToken() throws IOException {
     }
   }
 
+  private OutputNode recoverFromHole(InputNode src, int startOffset) {
+    // This means the "from" node of this token was never seen as a "to" node,
+    // which should only happen if we just crossed a hole. This is a challenging
+    // case for us because we normally rely on the full dependencies expressed
+    // by the arcs to assign outgoing node IDs. It would be better if tokens
+    // were never dropped but instead just marked deleted with a new
+    // TermDeletedAttribute (boolean valued) ... but until that future, we have
+    // a hack here to forcefully jump the output node ID:
+    assert src.outputNode == -1;
+    src.node = inputFrom;
+
+    int maxOutIndex = outputNodes.getMaxPos();
+    OutputNode outSrc = outputNodes.get(maxOutIndex);
+    // There are two types of holes, neighbor holes and consumed holes. A neighbor hole is between
+    // two tokens, it looks like a->*hole*->b.
+    // A consumed hole is between the start a long token and the next token that is "under" the path
+    // of the long token.
+    // It looks like :  ___abc__
+    //                 |        |
+    //                 |        V
+    //                 *hole*->b->c
+    // A consumed hole should have the outputsrc node of the short token after the hole be the out
+    // dest
+    // of the long token as that's how we'd resolve it if the missing token were present.
+    // neighbor holes should start a new output node and continue as if the hole didn't
+    // exist.
+    // Related tests testAltPathLastStepHoleFollowedByHole, testAltPathFirstStepHole,

Review comment:
Thank you for linking the tests here!
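
For readers following the quoted comment, the difference between a neighbor hole and a consumed hole can be sketched outside the filter. This is a minimal illustration and not the code from the pull request: the HoleSketch class, the Tok model, the HoleKind enum and the classify method are all hypothetical names, and tokens are reduced to a position increment and a position length. The sketch only labels each token; it does not assign output nodes the way recoverFromHole does. It assumes node 0 is the start of the graph and that a token's "from" node is a hole when it was never reached by an earlier arc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Lucene.
final class HoleSketch {
  enum HoleKind { NONE, NEIGHBOR, CONSUMED }

  /** Minimal token model: term text plus the two attributes that define an arc. */
  static final class Tok {
    final String term;
    final int posInc;
    final int posLen;

    Tok(String term, int posInc, int posLen) {
      this.term = term;
      this.posInc = posInc;
      this.posLen = posLen;
    }
  }

  /** Labels each token by whether its "from" node sits right after a hole. */
  static List<HoleKind> classify(List<Tok> tokens) {
    List<HoleKind> kinds = new ArrayList<>();
    Set<Integer> known = new HashSet<>();
    known.add(0); // assumption: node 0 is the start of the graph
    int pos = -1; // position before the first token
    int maxTo = 0; // furthest "to" node reached by any earlier arc
    for (Tok t : tokens) {
      pos += t.posInc;
      int from = pos;
      int to = from + t.posLen;
      if (known.contains(from)) {
        kinds.add(HoleKind.NONE); // "from" was already reached: no hole crossed
      } else if (maxTo > from) {
        kinds.add(HoleKind.CONSUMED); // an earlier, longer arc passes over this node
      } else {
        kinds.add(HoleKind.NEIGHBOR); // nothing spans the gap: a->*hole*->b
      }
      known.add(from);
      known.add(to);
      maxTo = Math.max(maxTo, to);
    }
    return kinds;
  }

  public static void main(String[] args) {
    // a -> *hole* -> b
    System.out.println(classify(Arrays.asList(new Tok("a", 1, 1), new Tok("b", 2, 1))));
    // long token "abc" over *hole* -> b -> c
    System.out.println(
        classify(Arrays.asList(new Tok("abc", 1, 3), new Tok("b", 1, 1), new Tok("c", 1, 1))));
  }
}
```

In the second input the short token that would have started at node 0 is missing, so "b" starts at a node no arc has reached while "abc" still spans it, which is the consumed case described in the comment.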