gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765808689
Hi @jpountz, thanks a lot for the suggestion!

> another option could be to encode the number of supplementary bytes using unary coding (like UTF8).

This is a great idea that would probably make `readMSBVLong` faster!

FYI, the direction I'm considering is that the regression is caused not by "decoding the MSB VLong" but by "how the MSB VLong changes the FST structure":

* For LSB VLong output, most/all of the bytes are stored in a single arc.
* For MSB VLong output, the bytes are split across many arcs for prefix sharing.

So we need more `Outputs#read` and `Outputs#add` calls for `MSBVLong` to get the whole output. Here is a comparison of call counts between LSB VLong (before #12631) and MSB VLong (after #12631):

| | LSB VLong | MSB VLong | diff |
| -- | -- | -- | -- |
| `Outputs#read` calls | 116097 | 149803 | 29.03% |
| `Outputs#add` calls | 144 | 111568 | 77377.78% |

Unfortunately, `ByteSequenceOutputs#add` and `ByteSequenceOutputs#read` always need to construct new `BytesRef` objects, which is not efficient enough. This patch tries to speed up `ByteSequenceOutputs#add` a bit, which gives the tiny improvement [mentioned above](https://github.com/apache/lucene/pull/12661#issuecomment-1764814636). But we still see the regression because `add` is still needed, whereas the original code simply ignored the `NO_OUTPUT` arcs.

So I'm not sure that optimizing the decoding of the output can resolve the regression, as decoding does not look like the bottleneck to me, but I'd like to give it a try if you still think it is worthwhile :)
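To illustrate the prefix-sharing point, here is a minimal sketch, not the actual Lucene code. The class `VLongPrefixDemo` and its helper methods are hypothetical, and the MSB byte layout shown (7-bit groups, most significant first, high bit as continuation flag) is only an assumption for illustration; the encoding used in #12631 may differ. It encodes two nearby values both ways and counts how many leading bytes they share, which is roughly what lets the FST push the common bytes onto shared arcs.

```java
import java.io.ByteArrayOutputStream;

class VLongPrefixDemo {

  // LSB-first variable-length encoding: low 7 bits first,
  // high bit set on every byte except the last one.
  static byte[] encodeLsb(long v) {
    if (v < 0) throw new IllegalArgumentException("non-negative values only");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7FL) | 0x80L));
      v >>>= 7;
    }
    out.write((int) v);
    return out.toByteArray();
  }

  // Assumed MSB-first layout: the same 7-bit groups, emitted from the
  // most significant group down, high bit set on all but the last byte.
  static byte[] encodeMsb(long v) {
    if (v < 0) throw new IllegalArgumentException("non-negative values only");
    int groups = 1;
    while ((v >>> (7 * groups)) != 0) {
      groups++;
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int g = groups - 1; g >= 0; g--) {
      int b = (int) ((v >>> (7 * g)) & 0x7FL);
      out.write(g > 0 ? (b | 0x80) : b);
    }
    return out.toByteArray();
  }

  // Number of leading bytes that two encodings have in common.
  static int commonPrefixLength(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    long a = 1_000_000L;
    long b = 1_000_050L;
    System.out.println("LSB shared prefix bytes: "
        + commonPrefixLength(encodeLsb(a), encodeLsb(b)));
    System.out.println("MSB shared prefix bytes: "
        + commonPrefixLength(encodeMsb(a), encodeMsb(b)));
  }
}
```

For these two values the LSB encodings share no leading bytes, while the MSB encodings share the first two. With a shared prefix, the output of one term ends up spread over several arcs, so a lookup has to combine more partial outputs, which matches the higher `Outputs#add` count in the table.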
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org