aubm opened a new issue, #595:
URL: https://github.com/apache/iceberg-go/issues/595
### Apache Iceberg version
main (development)
### Please describe the bug 🐞
### Summary
I might be misusing the libraries (I’m new to Arrow/Iceberg), but I’m seeing
a reproducible failure when appending a record that contains a **map<string,
string?>** column in `iceberg-go`. The append works when each map row has
exactly **one** entry, but it fails as soon as a map row has **more than one**
entry.
Lists with varying lengths don’t trigger the error; only maps do.
### Expected behavior
Appending an Arrow record (two rows) to an Iceberg table with a required
`map<string, string?>` column should succeed regardless of how many entries
each map row has.
### Actual behavior
Append fails with:
```
panic: arrow/array: index out of range
```
Stack trace (example from my runs):
```
panic: arrow/array: index out of range
goroutine 50 [running]:
github.com/apache/arrow-go/v18/arrow/array.NewSliceData({0x1072b5780,
0x1400029e070}, 0x0, 0x3)
.../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/array/data.go:232
+0x384
github.com/apache/arrow-go/v18/arrow/array.NewSlice({0x1072bc5e0?,
0x140010ee500?}, 0x0, 0x3)
.../go/pkg/mod/github.com/apache/arrow-go/[email protected]/arrow/array/array.go:130
+0x40
github.com/apache/arrow-go/v18/parquet/pqarrow.(*arrowColumnWriter).Write(0x140000836e8,
{0x10728a328, 0x14000cdd350})
.../go/pkg/mod/github.com/apache/arrow-go/[email protected]/parquet/pqarrow/encode_arrow.go:191
+0x310
github.com/apache/arrow-go/v18/parquet/pqarrow.(*FileWriter).WriteColumnChunked(0x14000277c70,
0x140010b2f90?, 0x140000837b0?, 0x1?)
.../go/pkg/mod/github.com/apache/arrow-go/[email protected]/parquet/pqarrow/file_writer.go:330
+0xa8
github.com/apache/arrow-go/v18/parquet/pqarrow.(*FileWriter).WriteColumnData(0x14000277c70,
{0x1072bcde0, 0x14000cdc6f0})
.../go/pkg/mod/github.com/apache/arrow-go/[email protected]/parquet/pqarrow/file_writer.go:339
+0x9c
github.com/apache/arrow-go/v18/parquet/pqarrow.(*FileWriter).WriteBuffered(0x14000277c70,
{0x1072b57f0, 0x14000cdc750})
.../go/pkg/mod/github.com/apache/arrow-go/[email protected]/parquet/pqarrow/file_writer.go:203
+0x270
github.com/apache/iceberg-go/table/internal.parquetFormat.WriteDataFile({},
{0x10728a2f0, 0x108de0380}, {0x130c95468?, 0x14001098d60?}, 0x0,
{0x140002760e0, {0x0, {0x108de0380, 0x0, ...}, ...}, ...}, ...)
.../go/pkg/mod/github.com/apache/[email protected]/table/internal/parquet_files.go:258
+0x2c4
github.com/apache/iceberg-go/table.(*writer).writeFile(0x140010b8f50,
{0x10728a2f0, 0x108de0380}, 0x0, {{0x9d, 0xbd, 0x5c, 0x95, 0x25, 0x7d, ...},
...})
.../go/pkg/mod/github.com/apache/[email protected]/table/writer.go:86 +0x36c
github.com/apache/iceberg-go/table.writeFiles.func3({{0x9d, 0xbd, 0x5c,
0x95, 0x25, 0x7d, 0x4e, 0xcd, 0x8b, 0xcc, ...}, ...})
.../go/pkg/mod/github.com/apache/[email protected]/table/writer.go:131 +0x44
github.com/apache/iceberg-go/table/internal.MapExec[...].func1()
.../go/pkg/mod/github.com/apache/[email protected]/table/internal/utils.go:535
+0xc8
golang.org/x/sync/errgroup.(*Group).Go.func1()
.../go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:93 +0x4c
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
.../go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78 +0x90
```
(I upgraded to `[email protected]` to resolve a Substrait compile clash,
but the append panic remains on that version as well.)
### Reproduction (minimal program)
I attached a small Go program (below) that:
1. Creates an Iceberg table with two columns:
* `id: string` (required, field-id 1)
* `attrs: map<string, string?>` (required, field-id 2; key-id 3; value-id
4)
2. Derives an Arrow schema via `table.SchemaToArrowSchema(...
includeFieldIDs=true, useLargeTypes=false)`.
3. Builds **one** Arrow record with **two** rows:
* row 0: `id="row-0"`, `attrs={"a":"1"}`
* row 1: `id="row-1"`, `attrs={"x":"9","y":"z"}` ← adding the 2nd map
entry triggers the failure
4. Uses `array.NewRecordReader(rec.Schema(), []arrow.RecordBatch{rec})` to
preserve `PARQUET:field_id` metadata.
5. Calls `tbl.Append(ctx, rr, iceberg.Properties{})`.
If I comment out the second key/value (`"y":"z"`) in row 1, the append
succeeds. Lists with varying lengths work fine; it’s specifically **maps with
>1 entry in a row** that fail.
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/array"
	"github.com/apache/arrow-go/v18/arrow/memory"
	"github.com/apache/iceberg-go"
	"github.com/apache/iceberg-go/catalog"
	"github.com/apache/iceberg-go/catalog/rest"
	"github.com/apache/iceberg-go/table"
)

func main() {
	ctx := context.Background()

	// 1) Iceberg schema:
	//    id:    string               (required, field-id 1)
	//    attrs: map<string, string?> (required, field-id 2; key-id 3, value-id 4)
	iceSch := iceberg.NewSchema(1, // schema-id
		iceberg.NestedField{
			ID:       1,
			Name:     "id",
			Required: true,
			Type:     iceberg.PrimitiveTypes.String,
		},
		iceberg.NestedField{
			ID:       2,
			Name:     "attrs",
			Required: true,
			Type: &iceberg.MapType{
				KeyID:         3,
				KeyType:       iceberg.PrimitiveTypes.String,
				ValueID:       4,
				ValueType:     iceberg.PrimitiveTypes.String,
				ValueRequired: false, // <- values are nullable
			},
		},
	)

	cat, err := rest.NewCatalog(ctx, "rest", "http://localhost:19120/iceberg/")
	if err != nil {
		log.Fatalf("rest.NewCatalog: %v", err)
	}

	ident := catalog.ToIdentifier("public", "repro_map")
	if err := cat.DropTable(ctx, ident); err != nil { // best-effort cleanup
		log.Println("failed to drop table:", err)
	}

	tbl, err := cat.CreateTable(ctx, ident, iceSch)
	if err != nil {
		log.Fatalf("CreateTable: %v", err)
	}

	// 2) Arrow schema derived from Iceberg (include field-ids, SMALL types).
	arrowSch, err := table.SchemaToArrowSchema(tbl.Schema(),
		nil /* extra md */, true /* includeFieldIDs */, false /* useLargeTypes */)
	if err != nil {
		log.Fatalf("SchemaToArrowSchema: %v", err)
	}
	fmt.Println("Arrow schema:", arrowSch)

	// 3) Build ONE Arrow record with two rows of different map sizes.
	pool := memory.NewGoAllocator()
	recordBuilder := array.NewRecordBuilder(pool, arrowSch)
	defer recordBuilder.Release()

	idStringBuilder := recordBuilder.Field(0).(*array.StringBuilder)
	mapBuilder := recordBuilder.Field(1).(*array.MapBuilder)
	keyBuilder := mapBuilder.KeyBuilder().(*array.StringBuilder)
	valueBuilder := mapBuilder.ItemBuilder().(*array.StringBuilder)

	// row 0: {"a":"1"}
	idStringBuilder.Append("row-0")
	mapBuilder.Append(true)
	keyBuilder.Append("a")
	valueBuilder.Append("1")

	// row 1: {"x":"9","y":"z"}
	idStringBuilder.Append("row-1")
	mapBuilder.Append(true)
	keyBuilder.Append("x")
	valueBuilder.Append("9")
	// TODO: comment out the next two lines to work around the bug
	keyBuilder.Append("y")
	valueBuilder.Append("z")

	rec := recordBuilder.NewRecordBatch()
	defer rec.Release()

	// 4) Use the record's OWN schema with array.NewRecordReader.
	rr, err := array.NewRecordReader(rec.Schema(), []arrow.RecordBatch{rec})
	if err != nil {
		log.Fatalf("NewRecordReader: %v", err)
	}
	defer rr.Release()

	// 5) Append; depending on versions, this panics inside the write path
	// with: arrow/array: index out of range
	if _, err := tbl.Append(ctx, rr, iceberg.Properties{}); err != nil {
		fmt.Println("Append failed:", err)
		return
	}
	fmt.Println("Append succeeded (no bug hit with this env)")
}
```
### Environment
* Go: `go version` reports 1.25.1 on my machine
* `github.com/apache/iceberg-go`: **v0.4.0-rc1** (also tried v0.3.0)
* `github.com/apache/arrow-go/v18`: tried **v18.3.0** and **v18.4.1**
* Catalog: **REST** against a local Nessie endpoint
Example: `rest.NewCatalog(ctx, "rest", "http://localhost:19120/iceberg/")`
* Table props: defaults; writing **PARQUET**
### Extra observations
* Record sanity checks show consistent shapes (rows/offsets) before append.
Example output from a debug print:
```
col 0 id utf8 rows=2
col 1 attrs map<utf8, utf8, items_nullable> rows=2
---
MAP attrs: rows=2 lastOff=3 keysLen=3 valsLen=3
```
* Switching to Arrow **large types** (`useLargeTypes=true`) isn’t an option
for me because Iceberg currently returns `not implemented: support for
LARGE_LIST` (different limitation, so I stayed with small types for this repro).
* Using **lists** instead of maps with variable per-row lengths does **not**
crash.
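For what it's worth, the invariant that debug output is checking can be
written down in plain Go (the helper below is illustrative and mine, not part
of arrow-go):

```go
package main

import "fmt"

// mapShapeOK reports whether a map column's offsets are consistent with its
// child (keys/values) array lengths: offsets start at 0, never decrease, and
// the final offset equals both child lengths.
func mapShapeOK(offsets []int32, keysLen, valsLen int) bool {
	if len(offsets) == 0 || offsets[0] != 0 {
		return false
	}
	for i := 1; i < len(offsets); i++ {
		if offsets[i] < offsets[i-1] {
			return false
		}
	}
	last := int(offsets[len(offsets)-1])
	return last == keysLen && last == valsLen
}

func main() {
	// Shapes from the debug output above: rows=2, lastOff=3, keysLen=3, valsLen=3.
	fmt.Println(mapShapeOK([]int32{0, 1, 3}, 3, 3)) // true
}
```

So by this check the record itself looks well-formed before it reaches the
writer.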
### What I suspect (but may be wrong)
This looks like a bug in the Arrow→Parquet write path when slicing map child
arrays (keys/values) during writing; it only shows up when a map row has more
than one entry. If this should work and I’m just building the map incorrectly,
I’d really appreciate guidance on the right pattern.
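To make the suspicion concrete: when slicing a map array into a row range, the
child (keys/values) slice bounds have to be taken from the offsets, not from
the row indices themselves. A plain-Go sketch of that arithmetic (the helper
name is mine, not arrow-go's):

```go
package main

import "fmt"

// childRange returns the half-open range of child (keys/values) indices
// covered by map rows [rowStart, rowEnd), derived from the map's offsets.
// With more than one entry per row, the child range grows faster than the
// row range, which is where a bounds mix-up would first show up.
func childRange(offsets []int32, rowStart, rowEnd int) (int32, int32) {
	return offsets[rowStart], offsets[rowEnd]
}

func main() {
	offsets := []int32{0, 1, 3} // row 0 -> 1 entry, row 1 -> 2 entries
	lo, hi := childRange(offsets, 0, 2)
	fmt.Println(lo, hi) // 0 3
}
```

The `NewSliceData(..., 0x0, 0x3)` call in the stack trace matches a slice of
length 3 being requested, consistent with the two rows carrying 3 map entries
in total.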
### What would help me
* Confirmation whether this is a known issue or misuse.
* If it’s a bug, a pointer to the right place to patch/test would be great;
I’m happy to try a branch.
* If it’s misuse, a small example showing the correct way to append
`map<string, string?>` with >1 entries per row via `iceberg-go` would be super
helpful.
Thanks for taking a look, and sorry in advance if I’ve overlooked something
obvious!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]