ava6969 opened a new issue, #43184:
URL: https://github.com/apache/arrow/issues/43184
### Describe the bug, including details regarding any error messages, version, and platform.

### Title:
Inconsistent IPC Buffer Deserialization Between Python and JavaScript

### Description:
I'm experiencing an issue with IPC buffer serialization in my C++ application. A buffer serialized in C++ is deserialized correctly in Python, but deserializing the same buffer in JavaScript fails with a metadata byte-count mismatch error.

### Steps to Reproduce:
1. **C++ Code**:
   ```cpp
   #include <arrow/api.h>
   #include <arrow/io/api.h>
   #include <arrow/ipc/api.h>

   arrow::Result<std::shared_ptr<arrow::Buffer>> DataFrame::toBinary(
       std::vector<std::string> columns,
       std::optional<std::string> const& index,
       std::unordered_map<std::string, std::string> const& metadata) const {
     // Note: `columns` defaults to all column names but is not used to select columns below.
     columns = columns.empty() ? this->columnNames() : columns;

     std::shared_ptr<arrow::RecordBatch> array = m_array;
     if (index) {
       // Append the index as the last column; propagate errors instead of MoveValueUnsafe().
       ARROW_ASSIGN_OR_RAISE(
           array, array->AddColumn(array->num_columns(),
                                   arrow::field(*index, m_index->type()), m_index));
     }

     // Create an in-memory output stream
     ARROW_ASSIGN_OR_RAISE(auto output_stream, arrow::io::BufferOutputStream::Create());

     // Create IPC stream writer
     ARROW_ASSIGN_OR_RAISE(auto writer,
                           arrow::ipc::MakeStreamWriter(output_stream.get(), array->schema()));

     // Write the RecordBatch together with the custom message metadata
     ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(
         *array, std::make_shared<arrow::KeyValueMetadata>(metadata)));

     // Finalize the writer (this also writes the end-of-stream marker)
     ARROW_RETURN_NOT_OK(writer->Close());

     // Retrieve the buffer
     return output_stream->Finish();
   }
   ```
2. **Python Code**:
   ```python
   import pyarrow as pa
   import pyarrow.ipc as ipc

   # binary_data: the bytes returned by DataFrame::toBinary
   buffer = pa.BufferReader(binary_data)
   reader = ipc.open_stream(buffer)
   table = reader.read_all()
   data_frame = table.to_pandas()
   print(data_frame.columns)
   print(data_frame)
   ```
3. **JavaScript Code**:
   ```javascript
   const { tableFromIPC } = require('apache-arrow');

   // binary_data: the bytes returned by DataFrame::toBinary
   const table = tableFromIPC(binary_data);
   ```

### Expected Behavior:
The IPC buffer serialized in C++ should be deserialized correctly in both Python and JavaScript.

### Actual Behavior:
- **Python**: Deserialization works correctly, and the DataFrame is printed as expected:
  ```
  Index(['o', 'h', 'l', 'c', 'v', 'vw', 'n', 'sma<period=30>(c)|sma#00000', 'sma<period=100>(c)|sma#00001', 't'], dtype='object')
  ```
- **JavaScript**: Deserialization fails with the following error:
  ```
  Error fetching market_data data: Error: Expected to read 131072 metadata bytes, but only read 120532.
  ```

### Error Messages:
- **JavaScript**:
  ```
  Error fetching market_data data: Error: Expected to read 131072 metadata bytes, but only read 120532.
  readMetadata message.mjs:99
  next message.mjs:48
  readMessage message.mjs:57
  _readNextMessageAndValidate reader.mjs:321
  next reader.mjs:295
  readAll reader.mjs:156
  tableFromIPC serialization.mjs:29
  transformModelData MarketDataProvider.tsx:38
  fetchData GenericProvider.tsx:35
  ```

### Environment Details:
- **C++**: Arrow version 14.0.1
- **Python**: Arrow version 5
- **JavaScript**: Arrow version 14.0.1, Node.js 14

### Additional Information:
- The table schema includes the following columns: `['o', 'h', 'l', 'c', 'v', 'vw', 'n', 'sma<period=30>(c)|sma#00000', 'sma<period=100>(c)|sma#00001', 't']`.

### Possible Causes:
- There might be an issue with the IPC writer configuration in the C++ code.
- There could be a discrepancy in how the Arrow versions used in Python and JavaScript handle IPC streams.

### Steps Taken to Troubleshoot:
- Verified in Python that the serialized data deserializes correctly.
- Checked the metadata and buffer sizes in both C++ and Python.
- Ensured that the same Arrow version is used across C++, Python, and JavaScript environments.

Please help identify the root cause of this issue and suggest a solution.

### Component(s)
C++, JavaScript, Python
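One way to narrow down whether the C++ writer or the path to JavaScript is responsible is to compare the framing of the first IPC message in the bytes produced by `toBinary` with the bytes actually handed to `tableFromIPC`. According to the Arrow columnar format specification, each encapsulated message in a stream begins with a 4-byte continuation marker (`0xFFFFFFFF`) followed by a little-endian int32 metadata length; streams from writers older than format 0.15 omit the marker. The following is a minimal standard-library sketch under those assumptions. `inspect_first_message` is a hypothetical helper, not a pyarrow API, it decodes only the length prefix (not the FlatBuffers metadata), and the dump file names in the usage comment are placeholders.

```python
import struct

# Continuation marker that precedes each encapsulated IPC message (format >= 0.15).
CONTINUATION = 0xFFFFFFFF


def inspect_first_message(data: bytes) -> dict:
    """Hypothetical diagnostic helper (not a pyarrow API).

    Decodes only the length prefix of the first message in an Arrow IPC
    stream and checks whether the declared metadata length fits in `data`.
    """
    if len(data) < 4:
        raise ValueError("buffer too short for an IPC message prefix")
    (first_word,) = struct.unpack_from("<I", data, 0)
    if first_word == CONTINUATION:
        if len(data) < 8:
            raise ValueError("buffer too short for the metadata length")
        # Modern framing: marker, then little-endian int32 metadata length.
        (metadata_length,) = struct.unpack_from("<i", data, 4)
        prefix_size, legacy = 8, False
    else:
        # Legacy (pre-0.15) framing: the first 4 bytes are the metadata length.
        (metadata_length,) = struct.unpack_from("<i", data, 0)
        prefix_size, legacy = 4, True
    bytes_after_prefix = len(data) - prefix_size
    return {
        "legacy_format": legacy,
        "metadata_length": metadata_length,
        "bytes_after_prefix": bytes_after_prefix,
        "metadata_fits": 0 <= metadata_length <= bytes_after_prefix,
    }


if __name__ == "__main__":
    import sys

    # Usage (placeholder names): python inspect_ipc.py dump_from_cpp.arrows dump_from_js.arrows
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, inspect_first_message(f.read()))
```

If the two dumps report different `metadata_length` or `bytes_after_prefix` values, that would suggest the bytes are being altered or truncated between C++ and JavaScript rather than mis-written by the IPC writer. The check covers only the first message, so a mismatch in a later message would not show up here.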