Pigsy-Monk opened a new pull request, #3115:
URL: https://github.com/apache/fory/pull/3115

   # What does this PR do?
   
   This PR adds variable-length encoding support for serializing `long[]` 
arrays in Java, which makes the serialized form more compact for arrays that 
contain many small values.
   
   ## Changes:
   
   - **Enhance `LongArraySerializer` with variable-length encoding support**: 
Added `supportVarLenEncoding` parameter to `LongArraySerializer` constructor, 
allowing it to optionally use variable-length encoding when enabled.
   
   - **Add comprehensive test cases**:
     - `testVariableLengthLongArray()`: Tests serialization/deserialization of 
long arrays with various value ranges (empty, small, mixed, large, negative 
values)
     - `testVariableLengthEncodingEfficiencyForSmallValues()`: Demonstrates 
that variable-length encoding produces significantly smaller serialized data 
(50%+ reduction) for arrays containing many small values
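
To illustrate the general idea behind variable-length encoding of a `long[]`, here is a minimal, self-contained sketch. It is not the code in this PR and does not use Fory's buffer API: the class `VarLongSketch` and all of its methods are hypothetical names, and the scheme shown (ZigZag mapping followed by a 7-bits-per-byte varint) is one common choice that may differ from what `LongArraySerializer` actually writes. The exact value ranges that fit in one or two bytes depend on the scheme used.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not Fory's implementation.
final class VarLongSketch {
    // ZigZag maps small-magnitude signed values to small unsigned ones
    // (0, -1, 1, -2 become 0, 1, 2, 3), so negatives stay short too.
    static long zigZag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    static long unZigZag(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Writes u as a varint: 7 payload bits per byte, high bit set on
    // every byte except the last.
    static void writeVarUnsigned(ByteArrayOutputStream out, long u) {
        while ((u & ~0x7FL) != 0) {
            out.write((int) ((u & 0x7F) | 0x80));
            u >>>= 7;
        }
        out.write((int) u);
    }

    // Reads one varint starting at pos[0] and advances pos[0] past it.
    static long readVarUnsigned(byte[] bytes, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = bytes[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Layout assumed for this sketch: varint element count, then one
    // ZigZag varint per element.
    static byte[] encode(long[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarUnsigned(out, values.length);
        for (long v : values) {
            writeVarUnsigned(out, zigZag(v));
        }
        return out.toByteArray();
    }

    static long[] decode(byte[] bytes) {
        int[] pos = {0};
        int n = (int) readVarUnsigned(bytes, pos);
        long[] values = new long[n];
        for (int i = 0; i < n; i++) {
            values[i] = unZigZag(readVarUnsigned(bytes, pos));
        }
        return values;
    }

    public static void main(String[] args) {
        long[] input = {0, 1, 63, -64, 1000, Long.MAX_VALUE, Long.MIN_VALUE};
        byte[] encoded = encode(input);
        System.out.println("round trip ok: " + Arrays.equals(input, decode(encoded)));
        System.out.println("encoded bytes: " + encoded.length
            + ", fixed-length bytes: " + input.length * Long.BYTES);
    }
}
```

Extreme values such as `Long.MAX_VALUE` and `Long.MIN_VALUE` take more than 8 bytes in this scheme, which is why variable-length encoding pays off only when most elements are small.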
   
   ## Test Details:
   
   - **testVariableLengthLongArray**: 
     - Tests empty arrays
     - Tests arrays with small values (0-255)
     - Tests arrays with mixed small and large values (including 
`Long.MAX_VALUE` and `Long.MIN_VALUE`)
     - Tests arrays with negative values
     - Tests large arrays (1000 elements) with many small values
   
   - **testVariableLengthEncodingEfficiencyForSmallValues**:
     - Compares serialization size between fixed-length encoding (8 bytes per 
long element) and variable-length encoding (1-2 bytes per small element)
     - Tests with arrays containing values 0-127 (optimal for variable-length 
encoding)
     - Tests with arrays containing values 0-1023 (still benefits from 
variable-length encoding)
     - Verifies at least 50% size reduction for small values
     - Outputs detailed efficiency metrics (bytes, bytes per element, 
percentage reduction)
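
The size comparison described above can be approximated without any serializer at all. The following is a rough sketch under stated assumptions: `VarLenSizeSketch` and its methods are hypothetical, values are treated as unsigned 7-bits-per-byte varints (so a negative value would cost 10 bytes here), and only element payload is counted, not the headers or length prefix a real serializer adds. The actual numbers reported by the PR's test may therefore differ.

```java
// Hypothetical payload-size estimate; not the PR's test code.
final class VarLenSizeSketch {
    // Bytes needed by an unsigned varint with 7 payload bits per byte.
    static int varintSize(long u) {
        int size = 1;
        while ((u >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    // Fixed-length payload: 8 bytes per element.
    static long fixedBytes(long[] values) {
        return (long) values.length * Long.BYTES;
    }

    // Variable-length payload: sum of per-element varint sizes.
    static long varLenBytes(long[] values) {
        long total = 0;
        for (long v : values) {
            total += varintSize(v);
        }
        return total;
    }

    // Fraction of payload bytes saved relative to fixed-length encoding.
    static double reduction(long[] values) {
        return 1.0 - (double) varLenBytes(values) / fixedBytes(values);
    }

    // Builds an array of n elements cycling through 0..modulus-1.
    static long[] cycle(int n, int modulus) {
        long[] values = new long[n];
        for (int i = 0; i < n; i++) {
            values[i] = i % modulus;
        }
        return values;
    }

    public static void main(String[] args) {
        long[] tiny = cycle(1000, 128);    // values 0-127
        long[] small = cycle(1024, 1024);  // values 0-1023
        System.out.printf("0-127:  %d vs %d bytes, reduction %.1f%%%n",
            varLenBytes(tiny), fixedBytes(tiny), 100 * reduction(tiny));
        System.out.printf("0-1023: %d vs %d bytes, reduction %.1f%%%n",
            varLenBytes(small), fixedBytes(small), 100 * reduction(small));
    }
}
```

Under these assumptions both arrays clear the 50% threshold that the test checks, with the 0-127 array saving more because every element fits in a single byte.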
   
   ## Performance Benefits:
   
   For arrays containing many small values:
   - **Fixed-length encoding**: 8 bytes per `long` element + overhead
   - **Variable-length encoding**: 1-2 bytes per small `long` element + overhead
   - **Space savings**: 75% or more for arrays with values in the 0-127 
range (going from 8 bytes to 1-2 bytes per element is a 75-87.5% reduction 
before overhead)
   
   ## Related Issues:
   
   - Addresses the need for more efficient serialization of primitive arrays 
with small values
   - Enables space-optimized serialization for use cases like sparse arrays, 
indices, counters, etc.
   
   ## Does this PR introduce any user-facing change?
   
   **No.** This PR only adds opt-in variable-length encoding support and test 
cases. The default serializers remain unchanged. Users can opt in to 
variable-length encoding by constructing `LongArraySerializer` with 
`supportVarLenEncoding=true`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

