shangxinli commented on PR #374:
URL: https://github.com/apache/iceberg-cpp/pull/374#issuecomment-3649495069
> BTW, is it possible to add an executable named `avro_scan` under the
> `iceberg/avro` folder? We could use it to benchmark a local Avro file and
> configure whether to skip the datum or not.
I've just added the `avro_scan` benchmark executable under `src/iceberg/avro/`.
Features:
- Configurable decoder: `--skip-datum=true` (direct decoder) or `--skip-datum=false` (GenericDatum)
- Batch size control: `--batch-size=N` sets the batch size
- Performance metrics: reports total rows, batch count, elapsed time, and throughput (rows/sec and MB/sec)
- Schema inspection: displays the schema of the Avro file
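For anyone reproducing this configuration surface elsewhere, here is a minimal C++17 sketch of how the two flags could be parsed. `ScanOptions`, `ConsumePrefix`, and `ParseArgs` are hypothetical names, not the actual `avro_scan` source; the sketch only assumes the `--flag=value` spellings above and the defaults shown in the sample output (skip datum on, batch size 4096).

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical options for an avro_scan-style tool. Defaults mirror the
// sample output: skip datum = true, batch size = 4096.
struct ScanOptions {
  bool skip_datum = true;         // true: direct decoder, false: GenericDatum
  std::size_t batch_size = 4096;  // batch size passed to the reader
  std::string path;               // Avro file to scan
};

// Returns true if `arg` begins with `prefix` and stores the rest in `value`.
bool ConsumePrefix(std::string_view arg, std::string_view prefix,
                   std::string_view& value) {
  if (arg.substr(0, prefix.size()) != prefix) return false;
  value = arg.substr(prefix.size());
  return true;
}

// Hypothetical parser. Returns std::nullopt for --help, unknown flags,
// malformed values, or a missing file path.
std::optional<ScanOptions> ParseArgs(int argc, const char* const argv[]) {
  assert(argc >= 1 && argv != nullptr);
  ScanOptions opts;
  for (int i = 1; i < argc; ++i) {
    std::string_view arg = argv[i];
    std::string_view value;
    if (ConsumePrefix(arg, "--skip-datum=", value)) {
      if (value == "true") {
        opts.skip_datum = true;
      } else if (value == "false") {
        opts.skip_datum = false;
      } else {
        return std::nullopt;
      }
    } else if (ConsumePrefix(arg, "--batch-size=", value)) {
      std::size_t n = 0;
      // from_chars rejects signs and non-digits; the whole value must parse.
      auto [ptr, ec] =
          std::from_chars(value.data(), value.data() + value.size(), n);
      if (ec != std::errc() || ptr != value.data() + value.size() || n == 0) {
        return std::nullopt;
      }
      opts.batch_size = n;
    } else if (!arg.empty() && arg[0] != '-') {
      opts.path = std::string(arg);
    } else {
      return std::nullopt;  // --help or an unrecognized flag
    }
  }
  if (opts.path.empty()) return std::nullopt;
  return opts;
}
```

A caller would pass `main`'s `argc`/`argv` straight through and print usage whenever the result is empty.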
Usage:

```bash
# Build
cmake --build build --target avro_scan

# Run with the direct decoder (default)
./build/src/iceberg/avro/avro_scan data.avro

# Run with the GenericDatum decoder
./build/src/iceberg/avro/avro_scan --skip-datum=false data.avro

# Custom batch size
./build/src/iceberg/avro/avro_scan --batch-size=1000 --skip-datum=true data.avro

# Help
./build/src/iceberg/avro/avro_scan --help
```
Example output:

```
Scanning Avro file: data.avro
Skip datum: true
Batch size: 4096
------------------------------------------------------------
File size: 12345678 bytes
Schema: struct<id: int64, name: string, value: double>
------------------------------------------------------------
Results:
Total rows: 1000000
Batches: 245
Time: 1250 ms
Throughput: 800000 rows/sec
Throughput: 9.45 MB/sec
```
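The summary figures are related by simple arithmetic. Below is a minimal C++ sketch of one way to derive them; `ScanStats` and the helper names are hypothetical, and both the byte count (on-disk file size) and the megabyte definition (2^20 bytes) are assumptions. With the sample's row count, batch size, and time it yields 245 batches and 800000 rows/sec. The MB/sec figure depends on which byte count and megabyte definition the real tool uses, so this sketch is not guaranteed to reproduce that line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-scan summary; field names are assumptions, not avro_scan's.
struct ScanStats {
  std::int64_t total_rows = 0;
  std::int64_t file_size_bytes = 0;
  std::int64_t elapsed_ms = 0;
  std::int64_t batch_size = 0;
};

// Batch count when every batch is full except possibly the last
// (ceiling division), e.g. 1000000 rows at 4096 rows per batch -> 245.
std::int64_t ExpectedBatches(const ScanStats& s) {
  assert(s.batch_size > 0);
  return (s.total_rows + s.batch_size - 1) / s.batch_size;
}

// Rows per second from a millisecond wall-clock measurement.
double RowsPerSecond(const ScanStats& s) {
  if (s.elapsed_ms <= 0) return 0.0;  // avoid dividing by zero on tiny files
  return static_cast<double>(s.total_rows) * 1000.0 /
         static_cast<double>(s.elapsed_ms);
}

// Assumes MB = 2^20 bytes and that the on-disk file size is the byte count.
double MegabytesPerSecond(const ScanStats& s) {
  if (s.elapsed_ms <= 0) return 0.0;
  const double seconds = static_cast<double>(s.elapsed_ms) / 1000.0;
  return static_cast<double>(s.file_size_bytes) / (1024.0 * 1024.0) / seconds;
}
```

Reporting in integer milliseconds keeps the output readable, but for very small files a sub-millisecond clock would give more stable throughput numbers.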