[I] Specialize BytesRef for literal byte arrays to save 8 bytes [lucene]

via GitHub Tue, 16 Sep 2025 14:19:46 -0700


msfroh opened a new issue, #15191:
URL: https://github.com/apache/lucene/issues/15191


   ### Description
   
   I was reviewing a PR that dealt with some Protobuf today and saw that their 
`ByteString` class (which is pretty similar to Lucene's `BytesRef`) is abstract 
with a pair of concrete subclasses -- `LiteralByteString` and 
`BoundedByteString`. The `BoundedByteString` has length and offset members, 
while `LiteralByteString` is just a wrapper around `byte[]`. As a result, 
`LiteralByteString` is a whole 8 bytes smaller. Woohoo!
   
   This got me thinking -- how many `BytesRef` instances out there have 
`offset == 0` and `length == bytes.length`? 
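
To make that question concrete, here is a minimal sketch of the check in Java. `Ref` is a hypothetical stand-in that only mirrors the `bytes`, `offset`, and `length` fields of `BytesRef`, and `LiteralCheck` with its two methods is likewise invented for illustration; none of this is existing Lucene code.

```java
import java.util.List;

// Hypothetical stand-in with the same three fields as Lucene's BytesRef.
final class Ref {
  final byte[] bytes;
  final int offset;
  final int length;

  Ref(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }
}

// Hypothetical helper: answers "could this reference be stored as a bare byte[] wrapper?"
final class LiteralCheck {
  // True when the reference spans the entire backing array,
  // i.e. offset and length carry no extra information.
  static boolean coversWholeArray(Ref ref) {
    return ref.offset == 0 && ref.length == ref.bytes.length;
  }

  // Counts how many references in a sample would qualify.
  static long countWholeArray(List<Ref> refs) {
    return refs.stream().filter(LiteralCheck::coversWholeArray).count();
  }
}
```

Running a counter like this over a heap sample or an instrumented build would give a rough estimate of how often the specialization could apply.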
   
   Within a lot of "hot" Lucene code, I believe the answer is "not many", since 
we do a **very** good job of reusing `BytesRef` instances forever. That said, 
all of the `Term` constructors end up producing `BytesRef`s of known (fixed) 
length. So the potential benefit is clearly non-zero. (Maybe close to zero?)
   
   I'm thinking of trying to make `BytesRef` `abstract` and `sealed` with a 
pair of subclasses, similar to the Protobuf approach. Obviously, this means 
replacing direct field access with getters (and setters), but I think those can 
be bimorphically inlined.
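
A rough sketch of what that shape might look like, under stated assumptions: `SketchBytesRef`, `LiteralRef`, `BoundedRef`, and the `of` factory are hypothetical names, not Lucene or Protobuf API, and the sketch is immutable, so it leaves out the setters that a reusable `BytesRef` would still need. It requires Java 17 or later for `sealed`.

```java
import java.util.Objects;

// Hypothetical sealed base type: exactly two implementations, so call sites
// see at most two receiver classes (the "bimorphic" case).
abstract sealed class SketchBytesRef permits LiteralRef, BoundedRef {
  abstract byte[] bytes();
  abstract int offset();
  abstract int length();

  // Picks the smaller representation when the range spans the whole array.
  static SketchBytesRef of(byte[] bytes, int offset, int length) {
    Objects.checkFromIndexSize(offset, length, bytes.length);
    if (offset == 0 && length == bytes.length) {
      return new LiteralRef(bytes);
    }
    return new BoundedRef(bytes, offset, length);
  }
}

// Wraps the array only; offset and length are derived, not stored.
final class LiteralRef extends SketchBytesRef {
  private final byte[] bytes;

  LiteralRef(byte[] bytes) {
    this.bytes = bytes;
  }

  @Override byte[] bytes() { return bytes; }
  @Override int offset() { return 0; }
  @Override int length() { return bytes.length; }
}

// Stores an explicit sub-range; the two int fields are the extra 8 bytes.
final class BoundedRef extends SketchBytesRef {
  private final byte[] bytes;
  private final int offset;
  private final int length;

  BoundedRef(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  @Override byte[] bytes() { return bytes; }
  @Override int offset() { return offset; }
  @Override int length() { return length; }
}
```

The actual per-instance saving depends on the JVM's object layout and alignment, and whether the accessor calls inline as hoped would need to be confirmed with benchmarks.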


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


