reiades opened a new issue, #43520:
URL: https://github.com/apache/arrow/issues/43520

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hello!
   
   I am currently using `github.com/apache/arrow/go/v16/parquet` to read the 
records of a Parquet file downloaded from S3 (75KB, stored in a `bytes.Buffer`). 
My implementation is the following:
   
   ```
   mem := memory.NewCheckedAllocator(memory.DefaultAllocator)
   pf, err := file.NewParquetReader(bytes.NewReader(buf.Bytes()), file.WithReadProps(parquet.NewReaderProperties(mem)))
   if err != nil {
        return nil, err
   }
   defer pf.Close()
   reader, err := pqarrow.NewFileReader(pf, pqarrow.ArrowReadProperties{Parallel: true, BatchSize: pf.NumRows()}, mem)
   if err != nil {
        return nil, err
   }
   rr, err := reader.GetRecordReader(ctx, nil, nil)
   if err != nil {
        return nil, err
   }
   defer rr.Release()
   rec, err = rr.Read() // <---- problem line
   if err != nil && err != io.EOF {
        return nil, err
   }
   if rec == nil {
        return nil, nil
   }
   defer rec.Release()
   
   // ... parse the file
   
   ```
   
   I read the same file each time, and the majority of reads into `rec` 
are successful. Occasionally, however, I get a segmentation fault inside 
`rr.Read()`. I have confirmed that the file is downloaded successfully each 
time and that `buf.Bytes()` is identical on successful and failed reads. I can 
also get the schema from the file on both successful and failed reads, which 
leads me to believe the problem is inside the `RecordReader`.
   ```
   schema := pf.MetaData().Schema
   log.Info(fmt.Sprintf("Schema:%s", schema)) // prints the right schema each time
   ```
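
For anyone trying to reproduce this, one way to make the "same bytes on every read" check explicit is to log a checksum of the buffer before each read. The following is a minimal sketch using only the Go standard library; the `checksum` helper and the stand-in payload are hypothetical and are not part of the original report or of the Arrow API.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// checksum is a hypothetical helper: it returns the hex-encoded
// SHA-256 digest of data so two reads can be compared in the logs.
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Stand-in payload (an assumption); in the report the buffer
	// holds the Parquet file downloaded from S3.
	var buf bytes.Buffer
	buf.WriteString("PAR1 example payload PAR1")

	// Hash the buffer twice, as would happen on two separate reads.
	first := checksum(buf.Bytes())
	second := checksum(buf.Bytes())

	fmt.Println("digest:", first)
	fmt.Println("identical:", first == second)
}
```

If the digest logged just before a failing `rr.Read()` matches the digest from a successful run, the input bytes can be ruled out as the cause.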
   
   Here are some logs from the stack trace that I thought could be helpful for 
debugging. 
   ```
   SIGSEGV: segmentation violation
   PC=0x4cb0c8 m=11 sigcode=1 addr=0x7ffbfdf94013e8
   
   goroutine 150888 gp=0x4006db0a80 m=11 mp=0x4000780808 [runnable]:
   github.com/apache/arrow/go/v16/parquet/internal/bmi.extractBitsGo(0xffffffffffffffff?, 0xffffffffffffffff?)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/internal/bmi/bmi.go:242 +0xcc fp=0x41bc72bae0 sp=0x41bc72bae0 pc=0x12818ac
   github.com/apache/arrow/go/v16/parquet/internal/bmi.ExtractBits(...)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/internal/bmi/bmi.go:38
   github.com/apache/arrow/go/v16/parquet/file.defLevelsBatchToBitmap({0x45f221c000?, 0x1?, 0x1?}, 0x400, {0xbc72bbb8?, 0x41?, 0x0?, 0x874c?}, {0x3b0f7d0, 0x41bcba5cc0}, ...)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/file/level_conversion.go:155 +0x180 fp=0x41bc72bb70 sp=0x41bc72bae0 pc=0x12f2ad0
   github.com/apache/arrow/go/v16/parquet/file.defLevelsToBitmapInternal({0x45f221c000, 0x400, 0x2c000}, {0x1?, 0x0?, 0x0?, 0x1?}, 0x41bc72bcc0, 0x1)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/file/level_conversion.go:175 +0x198 fp=0x41bc72bc40 sp=0x41bc72bb70 pc=0x12f2d68
   github.com/apache/arrow/go/v16/parquet/file.DefLevelsToBitmap(...)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/file/level_conversion.go:186
   github.com/apache/arrow/go/v16/parquet/file.(*recordReader).ReadRecordData(0x41bb5c8000, 0x11)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/file/record_reader.go:545 +0x218 fp=0x41bc72bd40 sp=0x41bc72bc40 pc=0x12f8aa8
   github.com/apache/arrow/go/v16/parquet/file.(*recordReader).ReadRecords(0x41bb5c8000, 0xce)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/file/record_reader.go:632 +0x294 fp=0x41bc72bde0 sp=0x41bc72bd40 pc=0x12f8e84
   github.com/apache/arrow/go/v16/parquet/pqarrow.(*leafReader).LoadBatch(0x41bb5c8060, 0xce)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/column_readers.go:104 +0xd8 fp=0x41bc72be30 sp=0x41bc72bde0 pc=0x1767e48
   github.com/apache/arrow/go/v16/parquet/pqarrow.(*listReader).LoadBatch(0x41bc72bee8?, 0x41bc72bf3c?)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/column_readers.go:360 +0x2c fp=0x41bc72be50 sp=0x41bc72be30 pc=0x17690fc
   github.com/apache/arrow/go/v16/parquet/pqarrow.(*ColumnReader).NextBatch(0x41b9013190, 0xce)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/file_reader.go:131 +0x34 fp=0x41bc72be70 sp=0x41bc72be50 pc=0x176e9d4
   github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next.func1(0x5, 0x41bc72bf38?)
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/file_reader.go:655 +0x50 fp=0x41bc72bef0 sp=0x41bc72be70 pc=0x17729a0
   github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next.func2()
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/file_reader.go:708 +0x100 fp=0x41bc72bfd0 sp=0x41bc72bef0 pc=0x1772850
   runtime.goexit({})
        /root/.gimme/versions/go1.22.5.linux.arm64/src/runtime/asm_arm64.s:1222 +0x4 fp=0x41bc72bfd0 sp=0x41bc72bfd0 pc=0x4df0a4
   created by github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next in goroutine 253
        /go/pkg/mod/github.com/apache/arrow/go/v16@v16.1.0/parquet/pqarrow/file_reader.go:699 +0x2e8
   ...
   ```
   The segmentation fault seems to happen inside 
`(*recordReader).next`, so I was curious whether anyone familiar with this 
library has some insight into why. I can share a longer stack trace if 
that would be helpful. I am using v16 but saw the same error in v13 as 
well. Thanks in advance!
   
   ### Component(s)
   
   Go


