Re: [I] Should we increase `BufferedChecksum`'s buffer from 1 KB -> 2 KB? [lucene]

via GitHub Thu, 29 Jan 2026 13:15:36 -0800


sgup432 commented on issue #15572:
URL: https://github.com/apache/lucene/issues/15572#issuecomment-3820378848


   There are some interesting observations. Our custom buffering logic 
appears to add overhead, so calling the checksum directly might be 
beneficial.
   
   Overall, I compared small updates (`update(int b)`) against bulk updates 
(`update(byte[] b, int off, int len)`).
   I also added benchmarks for direct CRC32/CRC32C, i.e. calling the checksum 
directly instead of going through our custom logic: 
`CRC32C checksum = new CRC32C(); checksum.update(data, 0, data.length);`
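
For readers less familiar with the two call shapes being benchmarked, here is a minimal sketch using only `java.util.zip`. The class and method names (`DirectVsIncremental`, `bulk`, `byteByByte`) are made up for illustration and are not part of the benchmark in the PR. Both paths must produce the same checksum; only the number of calls differs.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

// Hypothetical illustration class; not taken from the Lucene benchmark.
public class DirectVsIncremental {

  /** Bulk path: a single update(byte[], int, int) call over the whole array. */
  static long bulk(byte[] data) {
    Checksum checksum = new CRC32C();
    checksum.update(data, 0, data.length);
    return checksum.getValue();
  }

  /** Small-update path: one update(int) call per byte. */
  static long byteByByte(byte[] data) {
    Checksum checksum = new CRC32C();
    for (byte b : data) {
      checksum.update(b); // only the low eight bits of the int are used
    }
    return checksum.getValue();
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    // Both values should be identical; the paths differ only in call count.
    System.out.printf("bulk=%08x byteByByte=%08x%n", bulk(data), byteByByte(data));
  }
}
```

The per-byte path pays the call overhead once per byte, which is the cost that buffering is meant to amortize.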
   
   Full raw results are available here: 
https://github.com/apache/lucene/pull/15601#issuecomment-3820314708
   
   
   ### Summary (used AI tools to create this summary)
   **1. Direct CRC32/CRC32C Outperforms Buffered Implementation 
(dataSize=256B)**
   #### BulkUpdate Comparison (ops/s)
   
   | Data Size | Direct CRC32 | Buffered CRC32 (128B) | Buffered CRC32 (1024B) | Direct Speedup |
   |----------|-------------|----------------------|------------------------|----------------|
   | 1B       | 81,058,587  | 42,282,023           | 10,217,200             | 1.9× – 7.9×    |
   | 64B      | 169,090,583 | 58,830,152           | 9,863,842              | 2.9× – 17×     |
   | 256B     | 141,842,296 | 74,229,773           | 9,114,770              | 1.9× – 15×     |
   | 512B     | 95,582,735  | 66,226,546           | 9,191,298              | 1.4× – 10×     |
   | 1024B    | 61,771,492  | 51,949,919           | 9,868,042              | 1.2× – 6.3×    |
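
As a quick check on how the "Direct Speedup" column appears to be derived, each bound is the direct throughput divided by one of the buffered throughputs. A small sketch using the 1B row (the class name `SpeedupRange` and the helper `speedup` are hypothetical):

```java
import java.util.Locale;

// Hypothetical helper for reproducing one "Direct Speedup" cell.
public class SpeedupRange {

  /** Ratio of direct throughput to buffered throughput (both in ops/s). */
  static double speedup(long directOps, long bufferedOps) {
    return (double) directOps / bufferedOps;
  }

  public static void main(String[] args) {
    // Values copied from the 1B row of the bulk-update table.
    long direct = 81_058_587L;
    long buffered128 = 42_282_023L;
    long buffered1024 = 10_217_200L;
    // Prints "1.9x - 7.9x"
    System.out.println(String.format(Locale.ROOT, "%.1fx - %.1fx",
        speedup(direct, buffered128), speedup(direct, buffered1024)));
  }
}
```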
   
   
   **2. Small Updates: Buffering Helps Significantly from 64B Upward**
   
   SmallUpdates Comparison - Single Byte Writes (ops/s)
   
   #### CRC32 Update Comparison (ops/s)
   
   | Data Size | Direct CRC32 | Buffered CRC32 (128B) | Buffered Better? |
   |----------|-------------|----------------------|------------------|
   | 1B       | 481,402,835 | 49,014,861           | ❌ No (9.8× slower) |
   | 16B      | 43,715,543  | 40,886,584           | ❌ No (1.07× slower) |
   | 64B      | 6,533,099   | 32,348,655           | ✅ Yes (5× faster) |
   | 128B     | 3,197,091   | 22,991,835           | ✅ Yes (7× faster) |
   | 256B     | 1,515,682   | 15,001,528           | ✅ Yes (10× faster) |
   | 512B     | 723,852     | 1,316,811            | ✅ Yes (1.8× faster) |
   | 1024B    | 366,898     | 814,984              | ✅ Yes (2.2× faster) |
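
To make the trade-off concrete, here is a simplified sketch of a buffering wrapper. It is an assumption-laden illustration, not Lucene's actual `BufferedChecksum`: the class name `SimpleBufferedChecksum` and its flush policy are invented for this example. Single-byte updates land in a local array and reach the underlying checksum in batches, which would explain the gains from 64B upward, while every call still pays for the extra bounds check and copy.

```java
import java.util.zip.Checksum;

// Simplified sketch of a buffering wrapper; NOT Lucene's BufferedChecksum.
public class SimpleBufferedChecksum implements Checksum {
  private final Checksum delegate;
  private final byte[] buffer;
  private int upto; // number of buffered bytes not yet passed to the delegate

  public SimpleBufferedChecksum(Checksum delegate, int bufferSize) {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("bufferSize must be positive");
    }
    this.delegate = delegate;
    this.buffer = new byte[bufferSize];
  }

  @Override
  public void update(int b) {
    if (upto == buffer.length) {
      flush();
    }
    buffer[upto++] = (byte) b;
  }

  @Override
  public void update(byte[] b, int off, int len) {
    if (len >= buffer.length) {
      // Inputs at least as large as the buffer go straight to the delegate.
      flush();
      delegate.update(b, off, len);
    } else {
      if (upto + len > buffer.length) {
        flush();
      }
      System.arraycopy(b, off, buffer, upto, len);
      upto += len;
    }
  }

  @Override
  public long getValue() {
    flush();
    return delegate.getValue();
  }

  @Override
  public void reset() {
    upto = 0;
    delegate.reset();
  }

  /** Pushes any buffered bytes to the delegate in one bulk call. */
  private void flush() {
    if (upto > 0) {
      delegate.update(buffer, 0, upto);
      upto = 0;
    }
  }
}
```

Wrapping `new CRC32()` in this class with a 128-byte buffer would correspond roughly to the "Buffered CRC32 (128B)" column, under the assumption that the benchmarked implementation follows a similar policy.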
   
   
   **3. Direct CRC32C vs. Direct CRC32: CRC32C Wins for Small Updates, Mixed for Bulk**
   
   BulkUpdate Comparison (ops/s)
   
   #### Direct CRC32 vs CRC32C BulkUpdate Comparison (ops/s)
   
   | Data Size | Direct CRC32 | Direct CRC32C | CRC32C Speedup |
   |----------|-------------|--------------|----------------|
   | 1B       | 81,058,587  | 309,316,765  | ✅ 3.8× faster |
   | 16B      | 259,291,849 | 275,213,224  | ✅ 1.06× faster |
   | 64B      | 169,090,583 | 153,135,289  | ❌ 0.9× (CRC32 wins) |
   | 256B     | 141,842,296 | 82,146,394   | ❌ 0.6× (CRC32 wins) |
   | 1024B    | 61,771,492  | 61,861,324   | ➖ ~equal |
   
   
   SmallUpdates Comparison (ops/s)
   
   #### Direct CRC32 vs CRC32C SmallUpdates Comparison (ops/s)
   
   | Data Size | Direct CRC32 | Direct CRC32C | CRC32C Speedup |
   |----------|-------------|--------------|----------------|
   | 1B       | 481,402,835 | 510,229,680  | ✅ 1.06× faster |
   | 16B      | 43,715,543  | 66,404,075   | ✅ 1.52× faster |
   | 64B      | 6,533,099   | 10,876,177   | ✅ 1.67× faster |
   | 128B     | 3,197,091   | 4,535,172    | ✅ 1.42× faster |
   
   
   Overall recommendations below:
   
   - **Bulk updates (any size)**  
     - Better to use **Direct**  
     - About **1.2× – 17× faster** than the buffered variants in the table above.
   
   - **Small updates (dataSize < 64B)**  
     - Use **Direct**  
     - Buffering overhead dominates and hurts performance.
   
   - **Small updates (dataSize ≥ 64B)**  
     - Use **Buffered (128B)**  
     - Batching amortizes per-call overhead and improves throughput.
   
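   - **Algorithm choice**  
     - Prefer **CRC32C** over CRC32 for small, incremental updates  
     - About **1.06× – 1.67× faster** for single-byte updates, possibly due to 
hardware acceleration. Bulk results are mixed: 3.8× faster at 1B and roughly 
equal at 1024B, but CRC32 wins at 64B and 256B.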
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

