Re: [PR] feat: eliminate GenericDatum in Avro reader for performance [iceberg-cpp]

via GitHub Sat, 13 Dec 2025 06:38:47 -0800


shangxinli commented on PR #374:
URL: https://github.com/apache/iceberg-cpp/pull/374#issuecomment-3649495069


   > BTW, is it possible to add an executable named `avro_scan` under the 
`iceberg/avro` folder? We could use it to benchmark a local Avro file and 
configure whether to skip datum or not.
   
   Just created the avro_scan benchmark executable in src/iceberg/avro/.
   
     Features:
     - Configurable decoder: --skip-datum=true (direct decoder) or --skip-datum=false (GenericDatum)
     - Batch size control: --batch-size=N sets the batch size
     - Performance metrics: reports total rows, elapsed time, and throughput (rows/sec and MB/sec)
     - Schema inspection: Displays the schema of the Avro file
   
     Usage:
     # Build
     cmake --build build --target avro_scan
   
     # Run with direct decoder (default)
     ./build/src/iceberg/avro/avro_scan data.avro
   
     # Run with GenericDatum decoder
     ./build/src/iceberg/avro/avro_scan --skip-datum=false data.avro
   
     # Custom batch size
     ./build/src/iceberg/avro/avro_scan --batch-size=1000 --skip-datum=true data.avro
   
     # Help
     ./build/src/iceberg/avro/avro_scan --help
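
For readers who want to see how flags of this shape could be handled, here is a minimal sketch of parsing `--skip-datum=` and `--batch-size=` plus a positional file path. It is not the code in the PR: `ScanOptions` and `ParseArgs` are hypothetical names, the 4096 default is only assumed from the sample output, and `--help` is not handled.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical option holder; the field names are illustrative only.
struct ScanOptions {
  bool skip_datum = true;         // direct decoder is described as the default
  std::size_t batch_size = 4096;  // assumed default, taken from the sample output
  std::string path;               // positional argument: the Avro file to scan
};

// Parses arguments of the form shown in the usage examples.
// Returns std::nullopt for malformed values, unknown flags, or a missing path.
std::optional<ScanOptions> ParseArgs(const std::vector<std::string>& args) {
  constexpr std::string_view kSkip = "--skip-datum=";
  constexpr std::string_view kBatch = "--batch-size=";
  ScanOptions opts;
  for (const std::string& arg : args) {
    std::string_view a = arg;
    if (a.substr(0, kSkip.size()) == kSkip) {
      std::string_view v = a.substr(kSkip.size());
      if (v == "true") {
        opts.skip_datum = true;
      } else if (v == "false") {
        opts.skip_datum = false;
      } else {
        return std::nullopt;
      }
    } else if (a.substr(0, kBatch.size()) == kBatch) {
      std::string_view v = a.substr(kBatch.size());
      std::size_t n = 0;
      auto [ptr, ec] = std::from_chars(v.data(), v.data() + v.size(), n);
      // Reject non-numeric input, trailing characters, and a zero batch size.
      if (ec != std::errc{} || ptr != v.data() + v.size() || n == 0) {
        return std::nullopt;
      }
      opts.batch_size = n;
    } else if (a.substr(0, 2) == "--") {
      return std::nullopt;  // unknown flag
    } else {
      opts.path = arg;
    }
  }
  if (opts.path.empty()) return std::nullopt;
  return opts;
}
```

Calling `ParseArgs({"--batch-size=1000", "--skip-datum=true", "data.avro"})` would yield a batch size of 1000 with the direct decoder selected.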
   
     Output example:
     Scanning Avro file: data.avro
     Skip datum: true
     Batch size: 4096
     ------------------------------------------------------------
     File size: 12345678 bytes
     Schema: struct<id: int64, name: string, value: double>
     ------------------------------------------------------------
   
     Results:
       Total rows: 1000000
       Batches: 245
       Time: 1250 ms
       Throughput: 800000 rows/sec
       Throughput: 9.45 MB/sec
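
As a rough illustration of how the two throughput figures relate to the row count, file size, and elapsed time, here is a minimal sketch. `Throughput` and `ComputeThroughput` are hypothetical names, not taken from the PR. The comment does not say whether the tool divides by 10^6 or 2^20 bytes for MB, so the divisor is a parameter here.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical result type for the two reported throughput figures.
struct Throughput {
  double rows_per_sec;
  double mb_per_sec;
};

// Derives throughput from totals and elapsed wall-clock time.
// bytes_per_mb is an assumption: pass 1e6 for decimal MB or 1024.0 * 1024.0 for MiB.
Throughput ComputeThroughput(std::int64_t rows, std::int64_t bytes,
                             std::chrono::milliseconds elapsed,
                             double bytes_per_mb = 1024.0 * 1024.0) {
  const double seconds = std::chrono::duration<double>(elapsed).count();
  if (seconds <= 0.0) {
    return {0.0, 0.0};  // avoid division by zero for very fast or empty scans
  }
  return {static_cast<double>(rows) / seconds,
          static_cast<double>(bytes) / bytes_per_mb / seconds};
}
```

With the sample values above (1000000 rows in 1250 ms), the rows/sec figure works out to 800000.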
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

