alvarowolfx opened a new issue, #425:
URL: https://github.com/apache/arrow-go/issues/425

   ### Describe the enhancement requested
   
   Right now `ipc.NewWriter` writes a schema message to the output before any 
RecordBatch messages, which is not ideal in scenarios where the target expects 
only RecordBatch messages. The rough equivalent in Python land is that 
`pyarrow.Table` has a `to_batches` method, which yields only the RecordBatches 
themselves.
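   
   For context, a minimal reproduction of the current behavior (a sketch assuming the v18 module layout; field names and values are made up): after a single `Write`, the buffer already begins with a Schema message rather than a RecordBatch.

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/array"
	"github.com/apache/arrow-go/v18/arrow/ipc"
	"github.com/apache/arrow-go/v18/arrow/memory"
)

func main() {
	schema := arrow.NewSchema([]arrow.Field{{Name: "x", Type: arrow.PrimitiveTypes.Int64}}, nil)

	bldr := array.NewRecordBuilder(memory.DefaultAllocator, schema)
	defer bldr.Release()
	bldr.Field(0).(*array.Int64Builder).AppendValues([]int64{1, 2, 3}, nil)
	rec := bldr.NewRecord()
	defer rec.Release()

	var buf bytes.Buffer
	w := ipc.NewWriter(&buf, ipc.WithSchema(schema))
	if err := w.Write(rec); err != nil {
		panic(err)
	}
	// The buffer now starts with a Schema message, followed by the
	// RecordBatch message; consumers that accept only bare RecordBatch
	// messages reject a stream shaped like this.
	fmt.Printf("stream so far: %d bytes, schema message first\n", buf.Len())
}
```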
   
   We found this issue while trying to use `arrow-go` with the BigQuery Storage 
Write API, which now supports data in Arrow format. Using an `ipc.Writer` makes 
BigQuery reject the output, because the first message in the stream is the 
schema rather than a `RecordBatch`.
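   
   Until a dedicated API exists, one possible workaround is to strip the extra messages from the stream that `ipc.Writer` produces. The sketch below is not an official API: the package name `bqarrow` and the helper `recordBatchBytes` are hypothetical, and the byte math assumes the IPC stream layout, where every message starts with a 4-byte `0xFFFFFFFF` continuation marker and a little-endian int32 holding the padded metadata length, and a Schema message carries no body.

```go
package bqarrow // hypothetical package name

import (
	"bytes"
	"encoding/binary"
	"fmt"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/ipc"
)

// recordBatchBytes (hypothetical helper) serializes rec with ipc.NewWriter,
// then strips the leading Schema message and the trailing end-of-stream
// marker, leaving only the RecordBatch message.
func recordBatchBytes(rec arrow.Record) ([]byte, error) {
	var buf bytes.Buffer
	w := ipc.NewWriter(&buf, ipc.WithSchema(rec.Schema()))
	if err := w.Write(rec); err != nil {
		return nil, err
	}
	// Close appends the 8-byte end-of-stream marker.
	if err := w.Close(); err != nil {
		return nil, err
	}
	raw := buf.Bytes()

	if len(raw) < 8 || binary.LittleEndian.Uint32(raw[0:4]) != 0xFFFFFFFF {
		return nil, fmt.Errorf("unexpected IPC stream prefix")
	}
	// Schema message size = continuation marker + length prefix + padded
	// metadata; a Schema message has no body.
	schemaLen := 8 + int(binary.LittleEndian.Uint32(raw[4:8]))
	if schemaLen+8 > len(raw) {
		return nil, fmt.Errorf("IPC stream shorter than expected")
	}
	// Drop the Schema message at the front and the EOS marker at the end.
	return raw[schemaLen : len(raw)-8], nil
}
```

   Creating a fresh writer per record keeps the helper stateless but reserializes the schema each time only to discard it; a dedicated writer mode in `ipc` could emit RecordBatch messages directly.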
   
   Draft PR: #421
   
   References
   * Arrow support on BigQuery Storage Write API - 
https://cloud.google.com/bigquery/docs/supported-data-types#supported-apache-arrow-data-types
   * PyArrow and BigQuery Storage Write API example - 
https://cloud.google.com/bigquery/docs/write-api-streaming#arrow-format
   * Issue with BigQuery Storage Write API and `arrow-go` - 
https://github.com/googleapis/google-cloud-go/issues/12478
   
   ### Component(s)
   
   Other

