gf2121 commented on issue #12895:
URL: https://github.com/apache/lucene/issues/12895#issuecomment-1848878401

   > I don't think this assumption is valid @gf2121? Because that floor data 
first contains the file pointer of the on-disk block that this prefix points to 
(in MSB order as of 9.9, where lots of prefix sharing should happen), so, 
internal arcs before the final arc are in fact expected to output shared prefix 
bytes?
   
   I thought the 'assumption' here means that we assert the floor data are all 
stored in the last arc. The whole FST output is encoded as 
`[ MSBVLong | floordata ]`. We may share prefixes within the MSBVLong, but no two 
outputs can have the same `MSBVLong`, so `floordata` will never be split across 
more than one arc. Did I misunderstand something?
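   To make the prefix-sharing argument concrete, here is a minimal sketch. It is not Lucene's code. It assumes the MSBVLong is written as 7-bit groups, most significant group first, with the high (continuation) bit set on every byte except the last. The helpers `msbVLong`, `output` and `commonPrefixLength` are hypothetical. Because such an encoding is prefix-free, two distinct values always diverge inside the MSBVLong bytes. The shared prefix therefore stops before any `floordata` byte.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class MsbVLongPrefixDemo {
  // Hypothetical helper: non-negative long as 7-bit groups, most significant first.
  // Every byte except the last carries the 0x80 continuation bit.
  static byte[] msbVLong(long value) {
    if (value < 0) throw new IllegalArgumentException("value must be non-negative");
    int bits = Long.SIZE - Long.numberOfLeadingZeros(value);
    int bytesNeeded = Math.max(1, (bits + 6) / 7);
    byte[] out = new byte[bytesNeeded];
    for (int i = 0; i < bytesNeeded; i++) {
      int shift = 7 * (bytesNeeded - 1 - i);
      int group = (int) ((value >>> shift) & 0x7F);
      out[i] = (byte) (i < bytesNeeded - 1 ? group | 0x80 : group);
    }
    return out;
  }

  // Hypothetical output layout: [ MSBVLong | floordata ].
  static byte[] output(long fp, byte[] floorData) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    bos.writeBytes(msbVLong(fp));
    bos.writeBytes(floorData);
    return bos.toByteArray();
  }

  // Number of leading bytes two outputs share (what an FST could push to earlier arcs).
  static int commonPrefixLength(byte[] a, byte[] b) {
    int mismatch = Arrays.mismatch(a, b);
    return mismatch == -1 ? a.length : mismatch;
  }

  public static void main(String[] args) {
    byte[] floor = {1, 2, 3};
    byte[] a = output(0x7FFF, floor); // MSBVLong bytes: 0x81 0xFF 0x7F
    byte[] b = output(0x7FFE, floor); // MSBVLong bytes: 0x81 0xFF 0x7E
    int shared = commonPrefixLength(a, b);
    // The shared prefix (2 bytes) ends inside the 3-byte MSBVLong, before the floor data.
    System.out.println(shared + " shared bytes; MSBVLong length = " + msbVLong(0x7FFF).length);
  }
}
```

   With equal floor data and file pointers that differ only in their last 7 bits, the outputs still share just the first two MSBVLong bytes.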
   
   As @benwtrent pointed out, we should accumulate from the `outputPrefix` 
instead of `arc.output`. I raised https://github.com/apache/lucene/pull/12900 
for this. This patch seems to fix the exception when searching 
`WildcardQuery(new Term("body", "*fo*"))` on `Wikibig1m`. I'll try `Wikibigall` 
as well.
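   A generic illustration of why accumulating matters follows. This is not the code from PR #12900. It shows what goes wrong if the MSBVLong is decoded from the last arc's output alone after its leading bytes were shared onto an earlier arc. The arc outputs and the helpers `readMsbVLong` and `accumulate` are hypothetical. The sketch assumes the same MSB-first 7-bit encoding with continuation bits, and that per-arc byte outputs combine by concatenation, roughly as `ByteSequenceOutputs.add` does.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

public class AccumulateOutputDemo {
  // Hypothetical decoder for an MSB-first VLong at offset 0; returns {value, bytesConsumed}.
  static long[] readMsbVLong(byte[] bytes) {
    long value = 0;
    int i = 0;
    while (true) {
      byte b = bytes[i++];
      value = (value << 7) | (b & 0x7F);
      if ((b & 0x80) == 0) {
        return new long[] {value, i};
      }
    }
  }

  // Concatenates the per-arc outputs along a path into the full output.
  static byte[] accumulate(List<byte[]> arcOutputs) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    for (byte[] o : arcOutputs) {
      bos.writeBytes(o);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) {
    // Hypothetical path: an earlier arc holds the shared MSBVLong byte 0x81; the last
    // arc holds the remaining MSBVLong bytes (0xFF 0x7F) followed by floor data {1, 2, 3}.
    List<byte[]> arcs = List.of(
        new byte[] {(byte) 0x81},
        new byte[] {(byte) 0xFF, 0x7F, 1, 2, 3});
    long fromLastArc = readMsbVLong(arcs.get(arcs.size() - 1))[0];
    long fromAccumulated = readMsbVLong(accumulate(arcs))[0];
    // Decoding only the last arc drops the shared leading byte and yields the wrong value.
    System.out.println(Long.toHexString(fromLastArc) + " vs " + Long.toHexString(fromAccumulated));
  }
}
```

   Running it prints `3fff vs 7fff`. The floor data still sits entirely in the last arc, but the value in front of it can only be read correctly from the accumulated prefix.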
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
