asfimport opened a new issue, #230: URL: https://github.com/apache/arrow-java/issues/230
I encountered this bug when I loaded a dataframe stored in the Arrow IPC format.

```java
// Java Code from "Apache Arrow Java Cookbook"
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowFileReader;
import org.apache.arrow.vector.ipc.message.ArrowBlock;

File file = new File("example.arrow");
try (
    BufferAllocator rootAllocator = new RootAllocator();
    FileInputStream fileInputStream = new FileInputStream(file);
    ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), rootAllocator)
) {
    System.out.println("Record batches in file: " + reader.getRecordBlocks().size());
    for (ArrowBlock arrowBlock : reader.getRecordBlocks()) {
        // Fails here with IndexOutOfBoundsException when the file is compressed
        reader.loadRecordBatch(arrowBlock);
        VectorSchemaRoot vectorSchemaRootRecover = reader.getVectorSchemaRoot();
        System.out.print(vectorSchemaRootRecover.contentToTSVString());
    }
} catch (IOException e) {
    e.printStackTrace();
}
```

Call stack:

```
Exception in thread "main" java.lang.IndexOutOfBoundsException: index: 0, length: 2048 (expected: range(0, 2024))
    at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:701)
    at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:955)
    at org.apache.arrow.vector.BaseFixedWidthVector.reAlloc(BaseFixedWidthVector.java:451)
    at org.apache.arrow.vector.BaseFixedWidthVector.setValueCount(BaseFixedWidthVector.java:732)
    at org.apache.arrow.vector.VectorSchemaRoot.setRowCount(VectorSchemaRoot.java:240)
    at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:86)
    at org.apache.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:220)
    at org.apache.arrow.vector.ipc.ArrowFileReader.loadNextBatch(ArrowFileReader.java:166)
    at org.apache.arrow.vector.ipc.ArrowFileReader.loadRecordBatch(ArrowFileReader.java:197)
```

The bug can be reproduced with a simple dataframe created by pandas:

```python
import pandas as pd

pd.DataFrame({'a': range(10000)}).to_feather('example.arrow')
```

Pandas compresses the dataframe by default. If compression is turned off, Java can load the dataframe. So I suspect the bounds-checking code is buggy when loading a compressed file.
The compressed file loads fine in polars, pandas, and pyarrow, so it is unlikely to be a pandas bug.

**Environment**: Linux and Windows. Apache Arrow Java version: 10.0.0, 9.0.0, 4.0.1. Pandas 1.4.2 using pyarrow 8.0.0 (anaconda3-2022.05)

**Reporter**: [Georeth Zhou](https://issues.apache.org/jira/browse/ARROW-18198)

<sub>**Note**: *This issue was originally created as [ARROW-18198](https://issues.apache.org/jira/browse/ARROW-18198). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>