reiades opened a new issue, #43520:

URL: https://github.com/apache/arrow/issues/43520
### Describe the bug, including details regarding any error messages, version, and platform.

Hello! I am using `github.com/apache/arrow/go/v16/parquet` to read the records of a Parquet file downloaded from S3 (75 KB, stored in a `bytes.Buffer`). My implementation is the following:

```
mem := memory.NewCheckedAllocator(memory.DefaultAllocator)
pf, err := file.NewParquetReader(bytes.NewReader(buf.Bytes()), file.WithReadProps(parquet.NewReaderProperties(mem)))
if err != nil {
	return nil, err
}
defer pf.Close()

reader, err := pqarrow.NewFileReader(pf, pqarrow.ArrowReadProperties{Parallel: true, BatchSize: pf.NumRows()}, mem)
if err != nil {
	return nil, err
}

rr, err := reader.GetRecordReader(ctx, nil, nil)
if err != nil {
	return nil, err
}
defer rr.Release()

rec, err = rr.Read() // <---- problem line
if err != nil && err != io.EOF {
	return nil, err
}
if rec == nil {
	return nil, nil
}
defer rec.Release()

// ... parse the file
```

I read the same file each time, and the majority of reads into `rec` succeed. Occasionally, however, I get a segmentation fault inside `rr.Read()`. I have confirmed that the file downloads successfully each time and that `buf.Bytes()` is identical on successful and failed reads. I have also confirmed that I can read the schema from the file on both successful and failed reads, which makes me suspect the problem lies inside the `RecordReader`:

```
schema := pf.MetaData().Schema
log.Info(fmt.Sprintf("Schema:%s", schema)) // <--- prints out the right schema each time
```

Here is an excerpt of the stack trace that may help with debugging:

```
SIGSEGV: segmentation violation
PC=0x4cb0c8 m=11 sigcode=1 addr=0x7ffbfdf94013e8

goroutine 150888 gp=0x4006db0a80 m=11 mp=0x4000780808 [runnable]:
github.com/apache/arrow/go/v16/parquet/internal/bmi.extractBitsGo(0xffffffffffffffff?, 0xffffffffffffffff?)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/internal/bmi/bmi.go:242 +0xcc fp=0x41bc72bae0 sp=0x41bc72bae0 pc=0x12818ac
github.com/apache/arrow/go/v16/parquet/internal/bmi.ExtractBits(...)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/internal/bmi/bmi.go:38
github.com/apache/arrow/go/v16/parquet/file.defLevelsBatchToBitmap({0x45f221c000?, 0x1?, 0x1?}, 0x400, {0xbc72bbb8?, 0x41?, 0x0?, 0x874c?}, {0x3b0f7d0, 0x41bcba5cc0}, ...)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/file/level_conversion.go:155 +0x180 fp=0x41bc72bb70 sp=0x41bc72bae0 pc=0x12f2ad0
github.com/apache/arrow/go/v16/parquet/file.defLevelsToBitmapInternal({0x45f221c000, 0x400, 0x2c000}, {0x1?, 0x0?, 0x0?, 0x1?}, 0x41bc72bcc0, 0x1)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/file/level_conversion.go:175 +0x198 fp=0x41bc72bc40 sp=0x41bc72bb70 pc=0x12f2d68
github.com/apache/arrow/go/v16/parquet/file.DefLevelsToBitmap(...)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/file/level_conversion.go:186
github.com/apache/arrow/go/v16/parquet/file.(*recordReader).ReadRecordData(0x41bb5c8000, 0x11)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/file/record_reader.go:545 +0x218 fp=0x41bc72bd40 sp=0x41bc72bc40 pc=0x12f8aa8
github.com/apache/arrow/go/v16/parquet/file.(*recordReader).ReadRecords(0x41bb5c8000, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/file/record_reader.go:632 +0x294 fp=0x41bc72bde0 sp=0x41bc72bd40 pc=0x12f8e84
github.com/apache/arrow/go/v16/parquet/pqarrow.(*leafReader).LoadBatch(0x41bb5c8060, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/column_readers.go:104 +0xd8 fp=0x41bc72be30 sp=0x41bc72bde0 pc=0x1767e48
github.com/apache/arrow/go/v16/parquet/pqarrow.(*listReader).LoadBatch(0x41bc72bee8?, 0x41bc72bf3c?)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/column_readers.go:360 +0x2c fp=0x41bc72be50 sp=0x41bc72be30 pc=0x17690fc
github.com/apache/arrow/go/v16/parquet/pqarrow.(*ColumnReader).NextBatch(0x41b9013190, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/file_reader.go:131 +0x34 fp=0x41bc72be70 sp=0x41bc72be50 pc=0x176e9d4
github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next.func1(0x5, 0x41bc72bf38?)
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/file_reader.go:655 +0x50 fp=0x41bc72bef0 sp=0x41bc72be70 pc=0x17729a0
github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next.func2()
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/file_reader.go:708 +0x100 fp=0x41bc72bfd0 sp=0x41bc72bef0 pc=0x1772850
runtime.goexit({})
	/root/.gimme/versions/go1.22.5.linux.arm64/src/runtime/asm_arm64.s:1222 +0x4 fp=0x41bc72bfd0 sp=0x41bc72bfd0 pc=0x4df0a4
created by github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next in goroutine 253
	/go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/file_reader.go:699 +0x2e8
...
```

The segmentation fault occurs in a goroutine started by `(*recordReader).next`, with `bmi.extractBitsGo` as the deepest frame, so I was curious whether anyone familiar with this library has insight into why this happens. I can share a longer stack trace if that would help. I am using v16 (v16.1.0, built with Go 1.22.5 on linux/arm64 according to the trace), but I saw the same error in v13 as well. Thanks in advance!

### Component(s)

Go

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org