gf2121 commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1848878401
> I don't think this assumption is valid @gf2121? Because that floor data first contains the file pointer of the on-disk block that this prefix points to (in MSB order as of 9.9, where lots of prefix sharing should happen), so, internal arcs before the final arc are in fact expected to output shared prefix bytes?

I thought the 'assumption' here means that we assert the floor data is all stored in the last arc. The whole FST output is encoded as `[ MSBVLong | floordata ]`. We may share prefixes within the `MSBVLong`, but no two outputs can have the same `MSBVLong`, so `floordata` will never be split across more than one arc. Did I misunderstand something?

As @benwtrent pointed out, we should accumulate from the `outputPrefix` instead of `arc.output`. I raised https://github.com/apache/lucene/pull/12900 for this. This patch seems to fix the exception when searching `WildcardQuery(new Term("body", "*fo*"))` on `Wikibig1m`. I'll try `Wikibigall` as well.
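
To make the `[ MSBVLong | floordata ]` argument concrete, here is a minimal, self-contained sketch. It is not Lucene's code: the class `MsbVLongSketch`, its helper methods, and the floor data bytes are all made up for illustration, and the encoder is only assumed to behave like an MSB-first variable-length long with a continuation bit on every byte except the last. Under that assumption, two distinct values can never share their whole encoding, so the common prefix of two outputs always ends inside the `MSBVLong` header.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Lucene.
class MsbVLongSketch {

  // Encode a non-negative long as 7-bit groups, most significant group first.
  // Every byte except the last has its high bit set (continuation flag).
  static byte[] encodeMsbVLong(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("value must be >= 0");
    }
    int bits = Long.SIZE - Long.numberOfLeadingZeros(value);
    int bytesNeeded = Math.max(1, (bits + 6) / 7);
    byte[] out = new byte[bytesNeeded];
    for (int i = 0; i < bytesNeeded; i++) {
      int shift = 7 * (bytesNeeded - 1 - i);
      int group = (int) ((value >>> shift) & 0x7F);
      out[i] = (byte) (i < bytesNeeded - 1 ? (group | 0x80) : group);
    }
    return out;
  }

  // Number of leading bytes that two arrays have in common.
  static int commonPrefixLength(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  // Build one output as [ header | tail ].
  static byte[] concat(byte[] head, byte[] tail) {
    byte[] out = Arrays.copyOf(head, head.length + tail.length);
    System.arraycopy(tail, 0, out, head.length, tail.length);
    return out;
  }

  public static void main(String[] args) {
    byte[] floorData = {0x02, 0x61, 0x10}; // made-up floor data bytes
    byte[] headerA = encodeMsbVLong(0x7FFFL);
    byte[] headerB = encodeMsbVLong(0x7FFEL);
    byte[] outputA = concat(headerA, floorData);
    byte[] outputB = concat(headerB, floorData);

    // The shared prefix stops before the end of either header,
    // so none of the floor data bytes are part of it.
    int shared = commonPrefixLength(outputA, outputB);
    System.out.println("shared=" + shared
        + " headerA=" + headerA.length
        + " headerB=" + headerB.length);
  }
}
```

With these two sample values the headers are three bytes each and differ only in the last byte, so the shared prefix is two bytes long and the identical floor data that follows is never reached.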