aubm opened a new issue, #595:
URL: https://github.com/apache/iceberg-go/issues/595

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   ### Summary
   
   I might be misusing the libraries (I’m new to Arrow/Iceberg), but I’m seeing 
a reproducible failure when appending a record that contains a **map<string, 
string?>** column in `iceberg-go`. The append works when each map row has 
exactly **one** entry, but it fails as soon as a map row has **more than one** 
entry.
   
   Lists with varying lengths don’t trigger the error; only maps do.
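
   For contrast, the variable-length lists that work for me look like this (a minimal, Arrow-only sketch; `buildVarLenList` is just a name I made up for this illustration):

   ```go
   package main

   import (
   	"fmt"

   	"github.com/apache/arrow-go/v18/arrow"
   	"github.com/apache/arrow-go/v18/arrow/array"
   	"github.com/apache/arrow-go/v18/arrow/memory"
   )

   // buildVarLenList builds a list<string> column where row 0 has one
   // element and row 1 has two, mirroring the per-row sizes that make
   // the map column fail.
   func buildVarLenList(pool memory.Allocator) *array.List {
   	lb := array.NewListBuilder(pool, arrow.BinaryTypes.String)
   	defer lb.Release()
   	vb := lb.ValueBuilder().(*array.StringBuilder)

   	lb.Append(true) // row 0: ["a"]
   	vb.Append("a")

   	lb.Append(true) // row 1: ["x", "y"]
   	vb.Append("x")
   	vb.Append("y")

   	return lb.NewListArray()
   }

   func main() {
   	arr := buildVarLenList(memory.NewGoAllocator())
   	defer arr.Release()
   	// 2 parent rows backed by 3 child values.
   	fmt.Println("rows:", arr.Len(), "childLen:", arr.ListValues().Len())
   }
   ```

   Appending a record with a column of this shape goes through fine, which is what makes the map-specific failure stand out.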
   
   ### Expected behavior
   
   Appending an Arrow record (two rows) to an Iceberg table with a required 
`map<string, string?>` column should succeed regardless of how many entries 
each map row has.
   
   ### Actual behavior
   
   Append fails with:
   
   ```
   panic: arrow/array: index out of range
   ```
   
   Stack trace (example from my runs):
   
   ```
   panic: arrow/array: index out of range

   goroutine 50 [running]:
   github.com/apache/arrow-go/v18/arrow/array.NewSliceData({0x1072b5780, 0x1400029e070}, 0x0, 0x3)
           .../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/array/data.go:232 +0x384
   github.com/apache/arrow-go/v18/arrow/array.NewSlice({0x1072bc5e0?, 0x140010ee500?}, 0x0, 0x3)
           .../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/array/array.go:130 +0x40
   github.com/apache/arrow-go/v18/parquet/pqarrow.(*arrowColumnWriter).Write(0x140000836e8, {0x10728a328, 0x14000cdd350})
           .../go/pkg/mod/github.com/apache/arrow-go/[email protected]/parquet/pqarrow/encode_arrow.go:191 +0x310
   github.com/apache/arrow-go/v18/parquet/pqarrow.(*FileWriter).WriteColumnChunked(0x14000277c70, 0x140010b2f90?, 0x140000837b0?, 0x1?)
           .../go/pkg/mod/github.com/apache/arrow-go/[email protected]/parquet/pqarrow/file_writer.go:330 +0xa8
   github.com/apache/arrow-go/v18/parquet/pqarrow.(*FileWriter).WriteColumnData(0x14000277c70, {0x1072bcde0, 0x14000cdc6f0})
           .../go/pkg/mod/github.com/apache/arrow-go/[email protected]/parquet/pqarrow/file_writer.go:339 +0x9c
   github.com/apache/arrow-go/v18/parquet/pqarrow.(*FileWriter).WriteBuffered(0x14000277c70, {0x1072b57f0, 0x14000cdc750})
           .../go/pkg/mod/github.com/apache/arrow-go/[email protected]/parquet/pqarrow/file_writer.go:203 +0x270
   github.com/apache/iceberg-go/table/internal.parquetFormat.WriteDataFile({}, {0x10728a2f0, 0x108de0380}, {0x130c95468?, 0x14001098d60?}, 0x0, {0x140002760e0, {0x0, {0x108de0380, 0x0, ...}, ...}, ...}, ...)
           .../go/pkg/mod/github.com/apache/[email protected]/table/internal/parquet_files.go:258 +0x2c4
   github.com/apache/iceberg-go/table.(*writer).writeFile(0x140010b8f50, {0x10728a2f0, 0x108de0380}, 0x0, {{0x9d, 0xbd, 0x5c, 0x95, 0x25, 0x7d, ...}, ...})
           .../go/pkg/mod/github.com/apache/[email protected]/table/writer.go:86 +0x36c
   github.com/apache/iceberg-go/table.writeFiles.func3({{0x9d, 0xbd, 0x5c, 0x95, 0x25, 0x7d, 0x4e, 0xcd, 0x8b, 0xcc, ...}, ...})
           .../go/pkg/mod/github.com/apache/[email protected]/table/writer.go:131 +0x44
   github.com/apache/iceberg-go/table/internal.MapExec[...].func1()
           .../go/pkg/mod/github.com/apache/[email protected]/table/internal/utils.go:535 +0xc8
   golang.org/x/sync/errgroup.(*Group).Go.func1()
           .../go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:93 +0x4c
   created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
           .../go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78 +0x90
   ```
   
   (The trace above is from `[email protected]`; I upgraded to `[email protected]` to address a Substrait compile clash, but the append panic remains.)
   
   ### Reproduction (minimal program)
   
   Below is a small Go program that:
   
   1. Creates an Iceberg table with two columns:
   
      * `id: string` (required, field-id 1)
      * `attrs: map<string, string?>` (required, field-id 2; key-id 3; value-id 
4)
   2. Derives an Arrow schema via `table.SchemaToArrowSchema(... 
includeFieldIDs=true, useLargeTypes=false)`.
   3. Builds **one** Arrow record with **two** rows:
   
      * row 0: `id="row-0"`, `attrs={"a":"1"}`
      * row 1: `id="row-1"`, `attrs={"x":"9","y":"z"}`  ← adding the 2nd map 
entry triggers the failure
   4. Uses `array.NewRecordReader(rec.Schema(), []arrow.RecordBatch{rec})` to 
preserve the `PARQUET:field_id` metadata.
   5. Calls `tbl.Append(ctx, rr, iceberg.Properties{})`.
   
   If I comment out the second key/value (`"y":"z"`) in row 1, the append 
succeeds. Lists with varying lengths work fine; it’s specifically **maps with 
>1 entry in a row** that fail.
   
   ```go
   package main
   
   import (
        "context"
        "fmt"
        "log"
   
        "github.com/apache/arrow-go/v18/arrow"
        "github.com/apache/arrow-go/v18/arrow/array"
        "github.com/apache/arrow-go/v18/arrow/memory"
   
        "github.com/apache/iceberg-go"
        "github.com/apache/iceberg-go/catalog"
        "github.com/apache/iceberg-go/catalog/rest"
        "github.com/apache/iceberg-go/table"
   )
   
   func main() {
        ctx := context.Background()
   
     // 1) Iceberg schema:
     //   id: string (required, field-id 1)
     //   attrs: map<string, string?> (required, field-id 2; key-id 3, value-id 4)
        iceSch := iceberg.NewSchema(1, // schema-id
                iceberg.NestedField{
                        ID:       1,
                        Name:     "id",
                        Required: true,
                        Type:     iceberg.PrimitiveTypes.String,
                },
                iceberg.NestedField{
                        ID:       2,
                        Name:     "attrs",
                        Required: true,
                        Type: &iceberg.MapType{
                                KeyID:         3,
                                KeyType:       iceberg.PrimitiveTypes.String,
                                ValueID:       4,
                                ValueType:     iceberg.PrimitiveTypes.String,
                                ValueRequired: false, // <- values are nullable
                        },
                },
        )
   
     cat, err := rest.NewCatalog(ctx, "rest", "http://localhost:19120/iceberg/")
        if err != nil {
                log.Fatalf("rest.NewCatalog: %v", err)
        }
        ident := catalog.ToIdentifier("public", "repro_map")
        if err := cat.DropTable(ctx, ident); err != nil { // best-effort cleanup
                log.Println("failed to drop table:", err)
        }
   
        tbl, err := cat.CreateTable(ctx, ident, iceSch)
        if err != nil {
                log.Fatalf("CreateTable: %v", err)
        }
   
     // 2) Arrow schema derived from Iceberg (include field-ids, small types)
     arrowSch, err := table.SchemaToArrowSchema(tbl.Schema(), nil /*extra md*/, true /*includeFieldIDs*/, false /*useLargeTypes*/)
        if err != nil {
                log.Fatalf("SchemaToArrowSchema: %v", err)
        }
        fmt.Println("Arrow schema:", arrowSch)
   
        // 3) Build ONE Arrow record with two rows of different map sizes.
        pool := memory.NewGoAllocator()
        recordBuilder := array.NewRecordBuilder(pool, arrowSch)
        defer recordBuilder.Release()
   
        idStringBuilder := recordBuilder.Field(0).(*array.StringBuilder)
        mapBuilder := recordBuilder.Field(1).(*array.MapBuilder)
        keyBuilder := mapBuilder.KeyBuilder().(*array.StringBuilder)
        valueBuilder := mapBuilder.ItemBuilder().(*array.StringBuilder)
   
        // row 0: {"a":"1"}
        idStringBuilder.Append("row-0")
        mapBuilder.Append(true)
        keyBuilder.Append("a")
        valueBuilder.Append("1")
   
        // row 1: {"x":"9","y":"z"}
        idStringBuilder.Append("row-1")
        mapBuilder.Append(true)
        keyBuilder.Append("x")
        valueBuilder.Append("9")
     // NOTE: commenting out the next two lines makes the append succeed
        keyBuilder.Append("y")
        valueBuilder.Append("z")
   
        rec := recordBuilder.NewRecordBatch()
        defer rec.Release()
   
        // 4) Use the record's OWN schema with array.NewRecordReader.
        rr, err := array.NewRecordReader(rec.Schema(), []arrow.RecordBatch{rec})
        if err != nil {
                log.Fatalf("NewRecordReader: %v", err)
        }
        defer rr.Release()
   
     // 5) Append. With the affected versions this panics inside the Parquet
     // write ("arrow/array: index out of range"), so execution never reaches
     // this error branch; the check is here for non-panic failures.
     if _, err := tbl.Append(ctx, rr, iceberg.Properties{}); err != nil {
             fmt.Println("Append failed:", err)
             return
     }
        fmt.Println("Append succeeded (no bug hit with this env)")
   }
   ```
   
   ### Environment
   
   * Go: `go version` reports 1.25.1 on my machine
   * `github.com/apache/iceberg-go`: **v0.4.0-rc1** (also tried v0.3.0)
   * `github.com/apache/arrow-go/v18`: tried **v18.3.0** and **v18.4.1**
   * Catalog: **REST** against a local Nessie endpoint
     Example: `rest.NewCatalog(ctx, "rest", "http://localhost:19120/iceberg/")`
   * Table props: defaults; writing **PARQUET**
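
   For completeness, here is the module setup as a `go.mod` sketch (the module name `repro` is arbitrary; the versions are the ones listed above, and `go mod tidy` fills in the indirect dependencies):

   ```
   module repro

   go 1.25.1

   require (
   	github.com/apache/arrow-go/v18 v18.4.1
   	github.com/apache/iceberg-go v0.4.0-rc1
   )
   ```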
   
   ### Extra observations
   
   * Record sanity checks show consistent shapes (rows/offsets) before the 
append. Example from a debug print:
   
     ```
     col 0 id                     utf8                           rows=2
     col 1 attrs                  map<utf8, utf8, items_nullable> rows=2
     ---
     MAP attrs: rows=2 lastOff=3 keysLen=3 valsLen=3
     ```
   
   * Switching to Arrow **large types** (`useLargeTypes=true`) isn’t an option 
for me because `iceberg-go` currently returns `not implemented: support for 
LARGE_LIST` (a separate limitation, so I stayed with small types for this repro).
   
   * Using **lists** instead of maps with variable per-row lengths does **not** 
crash.
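
   The shape line in that debug output comes from a helper along these lines (Arrow-only sketch; `mapShape` and `buildAttrs` are my own names, not library API):

   ```go
   package main

   import (
   	"fmt"

   	"github.com/apache/arrow-go/v18/arrow"
   	"github.com/apache/arrow-go/v18/arrow/array"
   	"github.com/apache/arrow-go/v18/arrow/memory"
   )

   // mapShape summarizes a map column: parent row count, final offset,
   // and child (keys/items) lengths, to spot parent/child mismatches.
   func mapShape(m *array.Map) string {
   	offs := m.Offsets()
   	return fmt.Sprintf("rows=%d lastOff=%d keysLen=%d valsLen=%d",
   		m.Len(), offs[len(offs)-1], m.Keys().Len(), m.Items().Len())
   }

   // buildAttrs builds the same two-row map column as the repro:
   // row 0 = {"a":"1"}, row 1 = {"x":"9","y":"z"}.
   func buildAttrs(pool memory.Allocator) *array.Map {
   	mb := array.NewMapBuilder(pool, arrow.BinaryTypes.String, arrow.BinaryTypes.String, false)
   	defer mb.Release()
   	kb := mb.KeyBuilder().(*array.StringBuilder)
   	ib := mb.ItemBuilder().(*array.StringBuilder)

   	mb.Append(true)
   	kb.Append("a")
   	ib.Append("1")

   	mb.Append(true)
   	kb.Append("x")
   	ib.Append("9")
   	kb.Append("y")
   	ib.Append("z")

   	return mb.NewMapArray()
   }

   func main() {
   	m := buildAttrs(memory.NewGoAllocator())
   	defer m.Release()
   	fmt.Println("MAP attrs:", mapShape(m))
   }
   ```

   The offsets `[0, 1, 3]` make the last offset 3 while the parent has 2 rows, which matches what I printed before the append.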
   
   ### What I suspect (but may be wrong)
   
   This looks like a bug in the Arrow→Parquet write path when slicing a map 
column’s child arrays (keys/values); it only shows up when a map row has more 
than one entry. If this should work and I’m just building the map incorrectly, 
I’d really appreciate guidance on the right pattern.
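
   To spell out the mismatch I suspect (again, a guess rather than a confirmed diagnosis): in the trace, `NewSliceData(..., 0x0, 0x3)` is called with end 3, which matches the child (keys/values) length, while the parent map array has only 2 rows. A standalone sketch of those shapes (`buildAttrsMap` is just an illustrative helper):

   ```go
   package main

   import (
   	"fmt"

   	"github.com/apache/arrow-go/v18/arrow"
   	"github.com/apache/arrow-go/v18/arrow/array"
   	"github.com/apache/arrow-go/v18/arrow/memory"
   )

   // buildAttrsMap builds the two-row map from the repro:
   // row 0 = {"a":"1"}, row 1 = {"x":"9","y":"z"}.
   func buildAttrsMap(pool memory.Allocator) *array.Map {
   	mb := array.NewMapBuilder(pool, arrow.BinaryTypes.String, arrow.BinaryTypes.String, false)
   	defer mb.Release()
   	kb := mb.KeyBuilder().(*array.StringBuilder)
   	ib := mb.ItemBuilder().(*array.StringBuilder)
   	mb.Append(true)
   	kb.Append("a")
   	ib.Append("1")
   	mb.Append(true)
   	kb.Append("x")
   	ib.Append("9")
   	kb.Append("y")
   	ib.Append("z")
   	return mb.NewMapArray()
   }

   func main() {
   	m := buildAttrsMap(memory.NewGoAllocator())
   	defer m.Release()

   	// Parent has 2 rows; the children have 3 entries.
   	fmt.Println("parent rows:", m.Len(), "child len:", m.Keys().Len())

   	// Slicing by the parent's own row bounds works...
   	s := array.NewSlice(m, 0, int64(m.Len()))
   	defer s.Release()
   	fmt.Println("slice by row bounds ok, rows:", s.Len())

   	// ...but array.NewSlice(m, 0, 3), i.e. using the child length as the
   	// end (as the trace suggests), panics with
   	// "arrow/array: index out of range".
   }
   ```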
   
   ### What would help me
   
   * Confirmation whether this is a known issue or misuse.
   * If it’s a bug, a pointer to the right place to patch/test would be great; 
I’m happy to try a branch.
   * If it’s misuse, a small example showing the correct way to append 
`map<string, string?>` with >1 entries per row via `iceberg-go` would be super 
helpful.
   
   Thanks for taking a look, and sorry in advance if I’ve overlooked something 
obvious!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

