tschaub opened a new issue, #44:
URL: https://github.com/apache/arrow-go/issues/44

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I'm hoping to get suggestions on the best way to use the library to write a 
Parquet file given a slice of structs (Golang structs instead of Arrow's 
[array.Struct](https://pkg.go.dev/github.com/apache/arrow/go/v13@v13.0.0/arrow/array#Struct)).
   
   The 
[`schema.NewSchemaFromStruct()`](https://pkg.go.dev/github.com/apache/arrow/go/v13@v13.0.0/parquet/schema#NewSchemaFromStruct)
 function looks like a useful starting point to generate a Parquet schema from 
a struct.
   
   The 
[`pqarrow.NewFileWriter()`](https://pkg.go.dev/github.com/apache/arrow/go/v13@v13.0.0/parquet/pqarrow#NewFileWriter)
 function is helpful for creating a writer.  And I can see how to convert a 
Parquet schema to an Arrow schema with the 
[`pqarrow.FromParquet()`](https://pkg.go.dev/github.com/apache/arrow/go/v13@v13.0.0/parquet/pqarrow#FromParquet)
 function.
   
   The 
[`writer.WriteBuffered()`](https://pkg.go.dev/github.com/apache/arrow/go/v13@v13.0.0/parquet/pqarrow#FileWriter.WriteBuffered)
 method looks like a convenient way to write an Arrow record.  So the gap is 
then to get from a slice of structs to the Arrow record.
   
   I was looking for something like `array.RecordFromSlice()`.  The 
[`array.RecordFromStructArray()`](https://pkg.go.dev/github.com/apache/arrow/go/v13@v13.0.0/arrow/array#RecordFromStructArray)
 looks useful, but I think I would have to do a fair bit of reflection to work 
with the [struct 
builder](https://pkg.go.dev/github.com/apache/arrow/go/v13@v13.0.0/arrow/array#StructBuilder).
  It looks like  
[`array.RecordFromJSON()`](https://pkg.go.dev/github.com/apache/arrow/go/v13@v13.0.0/arrow/array#RecordFromJSON)
 does the same sort of reflection that I would have to do to use the struct 
builder.
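
For illustration, here is a minimal sketch of the kind of reflection that paragraph refers to: pivoting a slice of structs into per-field columns, which is roughly the shape a struct builder or record builder would need. It uses only the standard library and does not touch Arrow at all. The helpers `columnName` and `columnsFromSlice` are hypothetical names invented for this sketch, not part of the library, and the tag parsing only handles the simple `name=...` form shown in this issue.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Row mirrors the struct used in the test in this issue.
type Row struct {
	Name  string `parquet:"name=name, logical=String"`
	Count int    `parquet:"name=count"`
}

// columnName is a hypothetical helper: it pulls the "name=..." entry out of
// a parquet struct tag and falls back to the Go field name if there is none.
func columnName(f reflect.StructField) string {
	for _, part := range strings.Split(f.Tag.Get("parquet"), ",") {
		part = strings.TrimSpace(part)
		if strings.HasPrefix(part, "name=") {
			return strings.TrimPrefix(part, "name=")
		}
	}
	return f.Name
}

// columnsFromSlice is a hypothetical helper: it turns a slice of structs (or
// of pointers to structs) into one []any per exported field, in field order.
func columnsFromSlice(rows any) ([]string, [][]any, error) {
	v := reflect.ValueOf(rows)
	if v.Kind() != reflect.Slice {
		return nil, nil, fmt.Errorf("expected a slice, got %s", v.Kind())
	}
	elem := v.Type().Elem()
	if elem.Kind() == reflect.Ptr {
		elem = elem.Elem()
	}
	if elem.Kind() != reflect.Struct {
		return nil, nil, fmt.Errorf("expected struct elements, got %s", elem.Kind())
	}

	// Collect the exported fields once, along with their column names.
	var fields []int
	var names []string
	for i := 0; i < elem.NumField(); i++ {
		if f := elem.Field(i); f.IsExported() {
			fields = append(fields, i)
			names = append(names, columnName(f))
		}
	}

	// Walk the rows and append each field value to its column.
	cols := make([][]any, len(fields))
	for r := 0; r < v.Len(); r++ {
		item := v.Index(r)
		if item.Kind() == reflect.Ptr {
			if item.IsNil() {
				return nil, nil, fmt.Errorf("row %d is nil", r)
			}
			item = item.Elem()
		}
		for c, idx := range fields {
			cols[c] = append(cols[c], item.Field(idx).Interface())
		}
	}
	return names, cols, nil
}

func main() {
	rows := []*Row{{Name: "row-1", Count: 42}, {Name: "row-2", Count: 100}}
	names, cols, err := columnsFromSlice(rows)
	if err != nil {
		panic(err)
	}
	for i, name := range names {
		fmt.Println(name, cols[i])
	}
}
```

Each column would still have to be appended to a typed Arrow builder, with nested structs, lists, and nullability handled separately, which is where most of the remaining work would be.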
   
   I know it is not efficient, but I see that I can encode my struct slice as 
JSON and then generate a record from that.  Here is a working test that uses 
the `pqarrow.FileWriter` to write a slice of structs as Parquet:
   ```go
   package pqarrow_test
   
   import (
        "bytes"
        "encoding/json"
        "strings"
        "testing"
   
        "github.com/apache/arrow/go/v14/arrow/array"
        "github.com/apache/arrow/go/v14/arrow/memory"
        "github.com/apache/arrow/go/v14/parquet"
        "github.com/apache/arrow/go/v14/parquet/pqarrow"
        "github.com/apache/arrow/go/v14/parquet/schema"
        "github.com/stretchr/testify/require"
   )
   
   func TestFileWriterFromStructSlice(t *testing.T) {
        type Row struct {
                Name  string `parquet:"name=name, logical=String" json:"name"`
                Count int    `parquet:"name=count" json:"count"`
        }
   
        rows := []*Row{
                {
                        Name:  "row-1",
                        Count: 42,
                },
                {
                        Name:  "row-2",
                        Count: 100,
                },
        }
   
        data, err := json.Marshal(rows)
        require.NoError(t, err)
   
        parquetSchema, err := schema.NewSchemaFromStruct(rows[0])
        require.NoError(t, err)
   
        arrowSchema, err := pqarrow.FromParquet(parquetSchema, nil, nil)
        require.NoError(t, err)
   
        rec, _, err := array.RecordFromJSON(memory.DefaultAllocator, arrowSchema, strings.NewReader(string(data)))
        require.NoError(t, err)
   
        output := &bytes.Buffer{}
   
        writer, err := pqarrow.NewFileWriter(arrowSchema, output, parquet.NewWriterProperties(), pqarrow.DefaultWriterProps())
        require.NoError(t, err)
   
        require.NoError(t, writer.WriteBuffered(rec))
        require.NoError(t, writer.Close())
   }
   ```
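
To make the intermediate step in the test concrete, the standalone sketch below prints the JSON that the `json.Marshal` call produces for the same two rows. It is standard library only. The helper name `encodeRows` is made up for this sketch. The working assumption, consistent with the test above, is that the `json` tag names need to line up with the `name=` values in the `parquet` tags so that the JSON keys match the Arrow schema's field names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Row carries the same json tags as the struct in the test above.
type Row struct {
	Name  string `json:"name"`
	Count int    `json:"count"`
}

// encodeRows is a hypothetical helper that marshals the rows into the
// JSON array of objects that is then handed to array.RecordFromJSON.
func encodeRows(rows []*Row) (string, error) {
	data, err := json.Marshal(rows)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	out, err := encodeRows([]*Row{
		{Name: "row-1", Count: 42},
		{Name: "row-2", Count: 100},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```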
   
   Again, I know there are more efficient ways to go from a slice of structs to 
a Parquet file.  I'm just looking for advice on the most "ergonomic" way to use 
this library to do that.  Am I missing a way to construct an Arrow record from 
a slice of structs?  Or should I not be using the `pqarrow` package at all to 
do this?
   
   
   ### Component(s)
   
   Go, Parquet

