mikemccand commented on issue #12895:
URL: https://github.com/apache/lucene/issues/12895#issuecomment-1848934726

   > > I don't think this assumption is valid, @gf2121? That floor data 
first contains the file pointer of the on-disk block that this prefix points to 
(in MSB order as of 9.9, where lots of prefix sharing should happen), so 
internal arcs before the final arc are in fact expected to output shared prefix 
bytes?
   > 
   > I thought the 'assumption' here means that we assert the floor data is 
all stored on the last arc. The whole FST output is encoded as `[ MSBVLong | 
floordata ]`. We may share prefixes within the MSBVLong, but no two outputs can 
have the same `MSBVLong`, so `floordata` will never be split across more than 
one arc. Did I misunderstand something?
   
   Sorry @gf2121, that is indeed correct: except for the leading 
vLong-encoded "fp + 2 bits", the remainder of the floor data will always be on the 
last arc. But that leading vLong carries those important flags that we were losing 
in the LSB-encoded case.
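To make "fp + 2 bits" concrete, here is a minimal sketch: the block's file pointer is shifted left by two, and the two freed low bits hold flags (whether the block has terms, whether it is a floor block). The constant names and the assignment of each flag to a particular bit are assumptions for illustration. The authoritative definitions are in Lucene's block tree writer (`Lucene90BlockTreeTermsWriter`).

```java
public class FloorOutputCode {
  // Hypothetical flag layout for the two low bits of the leading vLong.
  static final int FLAG_IS_FLOOR = 0x1;
  static final int FLAG_HAS_TERMS = 0x2;
  static final int NUM_FLAG_BITS = 2;

  // Pack the block file pointer and its two flags into one long.
  static long encode(long fp, boolean hasTerms, boolean isFloor) {
    return (fp << NUM_FLAG_BITS)
        | (hasTerms ? FLAG_HAS_TERMS : 0)
        | (isFloor ? FLAG_IS_FLOOR : 0);
  }

  // Recover the file pointer by dropping the flag bits.
  static long fp(long code) {
    return code >>> NUM_FLAG_BITS;
  }

  static boolean hasTerms(long code) {
    return (code & FLAG_HAS_TERMS) != 0;
  }

  static boolean isFloor(long code) {
    return (code & FLAG_IS_FLOOR) != 0;
  }

  public static void main(String[] args) {
    long code = encode(1000, true, true);
    // (1000 << 2) | 0x2 | 0x1 == 4003
    System.out.println(code + " -> fp=" + fp(code)
        + " hasTerms=" + hasTerms(code) + " isFloor=" + isFloor(code));
  }
}
```

Because this packed value is the first thing in the output, any of its bytes that land on earlier arcs are just as essential to decoding as the bytes on the final arc.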
   
   > As @benwtrent pointed out, we should accumulate from the `outputPrefix` 
instead of `arc.output`. I raised #12900 for this. This patch seems to fix the 
exception when searching `WildcardQuery(new Term("body", "*fo*"))` on 
`Wikibig1m`. I'll try `Wikibigall` as well.
   
   +1, this is the right fix (so we don't lose any leading bytes of the FST's 
output in `IntersectTermsEnum`). I'll review the PR and open a follow-on issue to 
somehow expose the bug with a stronger BWC test.
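The following is a minimal sketch of why accumulating matters, assuming byte-sequence outputs where adding two outputs concatenates them (as Lucene's `ByteSequenceOutputs` does). The full output for a prefix is the concatenation of every arc output along the path. If code reads only the final arc's output, it can drop leading vLong bytes that the FST shared on earlier arcs. The list-of-arc-outputs model and the helper names are hypothetical simplifications, not `IntersectTermsEnum`'s actual code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

public class AccumulateOutputDemo {
  // Correct pattern: concatenate every arc output along the path.
  static byte[] accumulate(List<byte[]> arcOutputs) {
    ByteArrayOutputStream acc = new ByteArrayOutputStream();
    for (byte[] out : arcOutputs) {
      acc.write(out, 0, out.length);
    }
    return acc.toByteArray();
  }

  // Buggy pattern: look only at the last arc's output.
  static byte[] lastArcOnly(List<byte[]> arcOutputs) {
    byte[] last = arcOutputs.get(arcOutputs.size() - 1);
    return Arrays.copyOf(last, last.length);
  }

  // Decode an MSB-first vLong from the start of the given bytes.
  static long readMSBVLong(byte[] bytes) {
    long value = 0;
    for (byte b : bytes) {
      value = (value << 7) | (b & 0x7F);
      if ((b & 0x80) == 0) {
        return value;
      }
    }
    throw new IllegalArgumentException("truncated vLong");
  }

  public static void main(String[] args) {
    // Full output: MSB vLong of 4003 (0x9F 0x23) followed by floor data 5, 7.
    // The first vLong byte is assumed to sit on an earlier, shared arc.
    List<byte[]> path = List.of(
        new byte[] {(byte) 0x9F}, new byte[0], new byte[] {0x23, 5, 7});
    long good = readMSBVLong(accumulate(path)); // 4003 -> fp 1000
    long bad = readMSBVLong(lastArcOnly(path)); // 35 -> fp 8
    System.out.println("accumulated fp=" + (good >>> 2)
        + ", last-arc-only fp=" + (bad >>> 2));
  }
}
```

In this MSB example, the flag bits happen to survive in the last vLong byte but the file pointer is corrupted. With LSB-first encoding, the low-order byte holding the flags comes first, so that byte is the one at risk of being dropped.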


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

