reiades opened a new issue, #29:
URL: https://github.com/apache/arrow-go/issues/29
Hello!
I am currently using `github.com/apache/arrow/go/v16/parquet` to read the
records of a Parquet file downloaded from S3 (75 KB, stored in a
`bytes.Buffer`). My implementation is the following:
```
mem := memory.NewCheckedAllocator(memory.DefaultAllocator)
pf, err := file.NewParquetReader(bytes.NewReader(buf.Bytes()),
	file.WithReadProps(parquet.NewReaderProperties(mem)))
if err != nil {
	return nil, err
}
defer pf.Close()

reader, err := pqarrow.NewFileReader(pf,
	pqarrow.ArrowReadProperties{Parallel: true, BatchSize: pf.NumRows()}, mem)
if err != nil {
	return nil, err
}

rr, err := reader.GetRecordReader(ctx, nil, nil)
if err != nil {
	return nil, err
}
defer rr.Release()

rec, err = rr.Read() // <---- problem line
if err != nil && err != io.EOF {
	return nil, err
}
if rec == nil {
	return nil, nil
}
defer rec.Release()
// ... parse the file
```
I am reading the same file each time, and the majority of the reads into `rec`
are successful. However, on occasion, I get a segmentation fault inside of
`rr.Read()`. I have confirmed that the file is successfully downloaded each
time and that `buf.Bytes()` is the same on successful and failed reads. I have
also confirmed that I can get the schema from the file on both successful and
failed reads, which leads me to believe that something is happening inside the
`RecordReader`.
```
schema := pf.MetaData().Schema
log.Info(fmt.Sprintf("Schema:%s", schema)) // <--- prints out the right schema each time
```
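For reference, a minimal sketch of how the buffer comparison above can be done, assuming a SHA-256 digest of the bytes is logged before each read. `bufDigest` is a hypothetical helper name, not something from the arrow library, and the placeholder string stands in for the real file contents.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bufDigest is a hypothetical helper (not part of the arrow library): it
// returns the hex-encoded SHA-256 digest of the downloaded bytes.
func bufDigest(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Stand-in for the bytes.Buffer that holds the downloaded Parquet file.
	var buf bytes.Buffer
	buf.WriteString("placeholder parquet bytes")

	// Log this before handing the bytes to the reader; identical digests on a
	// successful and a failed run mean the input was byte-for-byte the same.
	fmt.Printf("parquet buffer: %d bytes, sha256=%s\n", buf.Len(), bufDigest(buf.Bytes()))
}
```

Matching digests across a successful and a failed read rule out a corrupted or truncated download as the cause.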
Here are some logs from the stack trace that I thought could be helpful for
debugging.
```
SIGSEGV: segmentation violation
PC=0x4cb0c8 m=11 sigcode=1 addr=0x7ffbfdf94013e8
goroutine 150888 gp=0x4006db0a80 m=11 mp=0x4000780808 [runnable]:
github.com/apache/arrow/go/v16/parquet/internal/bmi.extractBitsGo(0xffffffffffffffff?, 0xffffffffffffffff?)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/internal/bmi/bmi.go:242 +0xcc fp=0x41bc72bae0 sp=0x41bc72bae0 pc=0x12818ac
github.com/apache/arrow/go/v16/parquet/internal/bmi.ExtractBits(...)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/internal/bmi/bmi.go:38
github.com/apache/arrow/go/v16/parquet/file.defLevelsBatchToBitmap({0x45f221c000?, 0x1?, 0x1?}, 0x400, {0xbc72bbb8?, 0x41?, 0x0?, 0x874c?}, {0x3b0f7d0, 0x41bcba5cc0}, ...)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/level_conversion.go:155 +0x180 fp=0x41bc72bb70 sp=0x41bc72bae0 pc=0x12f2ad0
github.com/apache/arrow/go/v16/parquet/file.defLevelsToBitmapInternal({0x45f221c000, 0x400, 0x2c000}, {0x1?, 0x0?, 0x0?, 0x1?}, 0x41bc72bcc0, 0x1)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/level_conversion.go:175 +0x198 fp=0x41bc72bc40 sp=0x41bc72bb70 pc=0x12f2d68
github.com/apache/arrow/go/v16/parquet/file.DefLevelsToBitmap(...)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/level_conversion.go:186
github.com/apache/arrow/go/v16/parquet/file.(*recordReader).ReadRecordData(0x41bb5c8000, 0x11)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/record_reader.go:545 +0x218 fp=0x41bc72bd40 sp=0x41bc72bc40 pc=0x12f8aa8
github.com/apache/arrow/go/v16/parquet/file.(*recordReader).ReadRecords(0x41bb5c8000, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/record_reader.go:632 +0x294 fp=0x41bc72bde0 sp=0x41bc72bd40 pc=0x12f8e84
github.com/apache/arrow/go/v16/parquet/pqarrow.(*leafReader).LoadBatch(0x41bb5c8060, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/column_readers.go:104 +0xd8 fp=0x41bc72be30 sp=0x41bc72bde0 pc=0x1767e48
github.com/apache/arrow/go/v16/parquet/pqarrow.(*listReader).LoadBatch(0x41bc72bee8?, 0x41bc72bf3c?)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/column_readers.go:360 +0x2c fp=0x41bc72be50 sp=0x41bc72be30 pc=0x17690fc
github.com/apache/arrow/go/v16/parquet/pqarrow.(*ColumnReader).NextBatch(0x41b9013190, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:131 +0x34 fp=0x41bc72be70 sp=0x41bc72be50 pc=0x176e9d4
github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next.func1(0x5, 0x41bc72bf38?)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:655 +0x50 fp=0x41bc72bef0 sp=0x41bc72be70 pc=0x17729a0
github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next.func2()
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:708 +0x100 fp=0x41bc72bfd0 sp=0x41bc72bef0 pc=0x1772850
runtime.goexit({})
	/root/.gimme/versions/go1.22.5.linux.arm64/src/runtime/asm_arm64.s:1222 +0x4 fp=0x41bc72bfd0 sp=0x41bc72bfd0 pc=0x4df0a4
created by github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next in goroutine 253
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:699 +0x2e8
...
```
It seems that the segmentation fault is happening inside of
`(*recordReader).next`, so I was curious whether anyone familiar with this
library has some insight into why this is happening. I can share a longer stack
trace if that would be helpful. I am using v16 but saw the same error in v13 as
well. Thanks in advance!
### Component(s)
Go