tosinva-stripe opened a new issue, #195: URL: https://github.com/apache/arrow-go/issues/195
### Describe the bug, including details regarding any error messages, version, and platform.

LargeBinary and LargeString use int64 offsets, whereas Binary and String use int32 offsets. This makes Binary and String susceptible to "slice bounds out of range" panics when the column/array is larger than ~2 GB (2^31 bytes).

To reproduce, try deserializing a Parquet file that is larger than 2.2 GB.

A workaround is to force the Go library to deserialize the field/column as LargeBinary instead of Binary:
- Explicitly store the Arrow schema during write; see `store_schema`: https://arrow.apache.org/docs/cpp/parquet.html#roundtripping-arrow-types-and-schema
- Make sure the schema used to write the Parquet files explicitly uses the [large_binary](https://arrow.apache.org/docs/cpp/api/datatype.html#_CPPv412large_binaryv) or large_string type.

The error looks like:

```go
panic: runtime error: slice bounds out of range [:-2147483014]

goroutine 95 [running]:
github.com/apache/arrow/go/v17/arrow/array.(*Binary).Value(...)
	/go/pkg/mod/github.com/apache/arrow/go/v17@v17.0.0/arrow/array/binary.go:59
github.com/apache/arrow/go/v17/arrow/array.(*Binary).ValueStr(0xc000178d20?, 0xc091402a00?)
	/go/pkg/mod/github.com/apache/arrow/go/v17@v17.0.0/arrow/array/binary.go:67 +0xfa
extractorvalidator/data.BootstrapRecordsFromParquet({0x1de1a40, 0xcc6a9775f0}, 0x0)
	/.../data/records.go:78 +0x582
main.validationWorker({0x1dccd90, 0x2c31840}, 0x0?, {0x0?}, 0xc0000315e0, 0xc000001de0, 0xc0000fe9c0)
	/.../command.go:428 +0x125
created by main.RunValidateCmd in goroutine 1
	/.../command.go:174 +0xb90
```

Version and platform:

```
Arrow Version: github.com/apache/arrow/go/v17 v17.0.0
Platform: Linux 20.04.1-Ubuntu x86_64 x86_64 x86_64 GNU/Linux
```

### Component(s)

Parquet, Other
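For illustration, here is a minimal pure-Go sketch of why int32 offsets can turn negative once the total value bytes pass 2^31 - 1. It is consistent with the negative slice bound in the panic above, but it does not use or reproduce the Arrow code itself. The helpers `endOffset32`, `endOffset64`, and `fitsInt32Offsets` are hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"math"
)

// endOffset32 keeps a running total of value lengths in an int32, the way a
// 32-bit offsets buffer would. Hypothetical helper, not part of the Arrow API.
func endOffset32(lengths []int64) int32 {
	var off int32
	for _, n := range lengths {
		off += int32(n) // wraps silently once the total exceeds math.MaxInt32
	}
	return off
}

// endOffset64 keeps the same running total in an int64, as the Large*
// types do for their offsets.
func endOffset64(lengths []int64) int64 {
	var off int64
	for _, n := range lengths {
		off += n
	}
	return off
}

// fitsInt32Offsets reports whether the total number of value bytes can be
// addressed with int32 offsets.
func fitsInt32Offsets(lengths []int64) bool {
	return endOffset64(lengths) <= math.MaxInt32
}

func main() {
	// Three values of 1 GiB each: 3 GiB in total, past the int32 limit.
	lengths := []int64{1 << 30, 1 << 30, 1 << 30}

	fmt.Println(endOffset32(lengths))      // -1073741824 (wrapped)
	fmt.Println(endOffset64(lengths))      // 3221225472
	fmt.Println(fitsInt32Offsets(lengths)) // false
}
```

A check along the lines of `fitsInt32Offsets` is one way a writer could decide up front whether a column needs large_binary/large_string instead of the 32-bit variants.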