Pigsy-Monk opened a new pull request, #3115:
URL: https://github.com/apache/fory/pull/3115
# What does this PR do?
This PR adds optional variable-length encoding for `long[]` arrays in
Java, which makes serialization more space-efficient for arrays containing
many small values.
## Changes:
- **Enhance `LongArraySerializer` with variable-length encoding support**:
Added `supportVarLenEncoding` parameter to `LongArraySerializer` constructor,
allowing it to optionally use variable-length encoding when enabled.
- **Add comprehensive test cases**:
  - `testVariableLengthLongArray()`: Tests serialization/deserialization of
    long arrays with various value ranges (empty, small, mixed, large, negative
    values)
  - `testVariableLengthEncodingEfficiencyForSmallValues()`: Demonstrates
    that variable-length encoding produces significantly smaller serialized data
    (50%+ reduction) for arrays containing many small values
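
To illustrate the general idea behind variable-length encoding of a `long[]`, here is a minimal, self-contained sketch. It is not the code in this PR: the class `VarLongArraySketch` and its methods are hypothetical, and it assumes a ZigZag transform followed by a 7-bits-per-byte varint, which may differ from the encoding `LongArraySerializer` actually uses.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of var-length long[] encoding; not the PR's serializer. */
final class VarLongArraySketch {
  // ZigZag maps values of small magnitude (positive or negative) to small
  // unsigned values, so that they need few varint bytes.
  static long zigZag(long v) {
    return (v << 1) ^ (v >> 63);
  }

  static long unZigZag(long v) {
    return (v >>> 1) ^ -(v & 1);
  }

  // Writes v as an unsigned varint: 7 payload bits per byte, high bit set
  // on every byte except the last.
  static void writeVarUint(ByteArrayOutputStream out, long v) {
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.write((int) v);
  }

  // Reads one unsigned varint starting at pos[0] and advances pos[0].
  static long readVarUint(byte[] bytes, int[] pos) {
    long result = 0;
    int shift = 0;
    while (true) {
      byte b = bytes[pos[0]++];
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
  }

  // Layout assumed here: varint element count, then one varint per element.
  static byte[] encode(long[] values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarUint(out, values.length);
    for (long v : values) {
      writeVarUint(out, zigZag(v));
    }
    return out.toByteArray();
  }

  static long[] decode(byte[] bytes) {
    int[] pos = {0};
    int length = (int) readVarUint(bytes, pos);
    long[] result = new long[length];
    for (int i = 0; i < length; i++) {
      result[i] = unZigZag(readVarUint(bytes, pos));
    }
    return result;
  }
}
```

Under these assumptions, values close to zero take one or two bytes, while extreme values such as `Long.MIN_VALUE` and `Long.MAX_VALUE` take ten bytes each, which is more than the fixed 8 bytes. That trade-off is one reason to keep the encoding opt-in.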
## Test Details:
- **testVariableLengthLongArray**:
  - Tests empty arrays
  - Tests arrays with small values (0-255)
  - Tests arrays with mixed small and large values (including
    `Long.MAX_VALUE` and `Long.MIN_VALUE`)
  - Tests arrays with negative values
  - Tests large arrays (1000 elements) with many small values
- **testVariableLengthEncodingEfficiencyForSmallValues**:
  - Compares serialization size between fixed-length encoding (8 bytes per
    long element) and variable-length encoding (1-2 bytes per small element)
  - Tests with arrays containing values 0-127 (optimal for variable-length
    encoding)
  - Tests with arrays containing values 0-1023 (still benefits from
    variable-length encoding)
  - Verifies at least 50% size reduction for small values
  - Outputs detailed efficiency metrics (bytes, bytes per element,
    percentage reduction)
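
The size comparison described above can be approximated without a serializer at all. The sketch below is a back-of-the-envelope estimate, not the PR's test: `VarLenSizeSketch` and its methods are hypothetical, it assumes a ZigZag-plus-varint encoding, and it counts only the element payload (no array-length prefix or type header).

```java
/** Hypothetical size estimate for fixed vs. var-length long[] payloads. */
final class VarLenSizeSketch {
  // Assumed ZigZag transform, so negative values of small magnitude stay small.
  static long zigZag(long v) {
    return (v << 1) ^ (v >> 63);
  }

  // Bytes needed for an unsigned varint: one byte per 7 significant bits,
  // with zero still taking one byte.
  static int varUintSize(long v) {
    int bits = 64 - Long.numberOfLeadingZeros(v | 1);
    return (bits + 6) / 7;
  }

  // Fixed-length payload: 8 bytes per element.
  static long fixedSize(long[] values) {
    return 8L * values.length;
  }

  // Variable-length payload: sum of per-element varint sizes.
  static long varLenSize(long[] values) {
    long total = 0;
    for (long v : values) {
      total += varUintSize(zigZag(v));
    }
    return total;
  }

  // Fraction of payload bytes saved relative to fixed-length encoding.
  static double reduction(long[] values) {
    return 1.0 - (double) varLenSize(values) / fixedSize(values);
  }

  // Test data: values cycling through 0 .. modulus-1.
  static long[] cyclic(int length, int modulus) {
    long[] values = new long[length];
    for (int i = 0; i < length; i++) {
      values[i] = i % modulus;
    }
    return values;
  }
}
```

Calling `VarLenSizeSketch.reduction(VarLenSizeSketch.cyclic(1000, 128))` and the same with a modulus of 1024 gives a rough idea of whether a 50% threshold is realistic for a given value range. The actual numbers for this PR depend on the encoding and headers it really uses.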
## Performance Benefits:
For arrays containing many small values:
- **Fixed-length encoding**: 8 bytes per `long` element + overhead
- **Variable-length encoding**: 1-2 bytes per small `long` element + overhead
- **Space savings**: Roughly 75% to 87.5% per element (1-2 bytes instead of
8), before overhead, for arrays with values in the 0-127 range
## Related Issues:
- Addresses the need for more efficient serialization of primitive arrays
with small values
- Enables space-optimized serialization for use cases like sparse arrays,
indices, counters, etc.
## Does this PR introduce any user-facing change?
**No**, this PR only adds an optional `supportVarLenEncoding` constructor
parameter to `LongArraySerializer` and new test cases. The default
serializers remain unchanged. Users can opt in to variable-length encoding by
constructing `LongArraySerializer` with `supportVarLenEncoding=true`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]