tschaub opened a new issue, #44:
URL: https://github.com/apache/arrow-go/issues/44
### Describe the usage question you have. Please include as many useful details as possible.

I'm hoping to get suggestions on the best way to use the library to write a Parquet file given a slice of structs (plain Go structs, not Arrow's [`array.Struct`](https://pkg.go.dev/github.com/apache/arrow/go/v14/arrow/array#Struct)).

The [`schema.NewSchemaFromStruct()`](https://pkg.go.dev/github.com/apache/arrow/go/v14/parquet/schema#NewSchemaFromStruct) function looks like a useful starting point for generating a Parquet schema from a struct. The [`pqarrow.NewFileWriter()`](https://pkg.go.dev/github.com/apache/arrow/go/v14/parquet/pqarrow#NewFileWriter) function is helpful for creating a writer, and I can see how to convert a Parquet schema to an Arrow schema with the [`pqarrow.FromParquet()`](https://pkg.go.dev/github.com/apache/arrow/go/v14/parquet/pqarrow#FromParquet) function. The [`writer.WriteBuffered()`](https://pkg.go.dev/github.com/apache/arrow/go/v14/parquet/pqarrow#FileWriter.WriteBuffered) method looks like a convenient way to write an Arrow record.

So the remaining gap is getting from a slice of structs to an Arrow record. I was looking for something like `array.RecordFromSlice()`. [`array.RecordFromStructArray()`](https://pkg.go.dev/github.com/apache/arrow/go/v14/arrow/array#RecordFromStructArray) looks useful, but I think I would have to do a fair bit of reflection to work with the [struct builder](https://pkg.go.dev/github.com/apache/arrow/go/v14/arrow/array#StructBuilder). It looks like [`array.RecordFromJSON()`](https://pkg.go.dev/github.com/apache/arrow/go/v14/arrow/array#RecordFromJSON) does the same sort of reflection that I would have to do to use the struct builder. I know it is not efficient, but I can encode my struct slice as JSON and then generate a record from that.
Here is a working test that uses `pqarrow.FileWriter` to write a slice of structs as Parquet:

```go
package pqarrow_test

import (
	"bytes"
	"encoding/json"
	"testing"

	"github.com/apache/arrow/go/v14/arrow/array"
	"github.com/apache/arrow/go/v14/arrow/memory"
	"github.com/apache/arrow/go/v14/parquet"
	"github.com/apache/arrow/go/v14/parquet/pqarrow"
	"github.com/apache/arrow/go/v14/parquet/schema"
	"github.com/stretchr/testify/require"
)

func TestFileWriterFromStructSlice(t *testing.T) {
	type Row struct {
		Name  string `parquet:"name=name, logical=String" json:"name"`
		Count int    `parquet:"name=count" json:"count"`
	}

	rows := []*Row{
		{Name: "row-1", Count: 42},
		{Name: "row-2", Count: 100},
	}

	// Encode the rows as a JSON array of objects keyed by the json tags.
	data, err := json.Marshal(rows)
	require.NoError(t, err)

	// Derive the Parquet schema from the struct tags, then map it to Arrow.
	parquetSchema, err := schema.NewSchemaFromStruct(rows[0])
	require.NoError(t, err)

	arrowSchema, err := pqarrow.FromParquet(parquetSchema, nil, nil)
	require.NoError(t, err)

	// Decode the JSON into a single Arrow record that matches arrowSchema.
	rec, _, err := array.RecordFromJSON(memory.DefaultAllocator, arrowSchema, bytes.NewReader(data))
	require.NoError(t, err)
	defer rec.Release()

	// Write the record to an in-memory Parquet file.
	output := &bytes.Buffer{}
	writer, err := pqarrow.NewFileWriter(arrowSchema, output, parquet.NewWriterProperties(), pqarrow.DefaultWriterProps())
	require.NoError(t, err)

	require.NoError(t, writer.WriteBuffered(rec))
	require.NoError(t, writer.Close())
}
```

Again, I know there are more efficient ways to go from a slice of structs to a Parquet file. I'm just looking for advice on the most "ergonomic" way to use this library to do that. Am I missing a way to construct an Arrow record from a slice of structs? Or should I not be using the `pqarrow` package at all for this?

### Component(s)

Go, Parquet
