clee704 commented on PR #49642: URL: https://github.com/apache/arrow/pull/49642#issuecomment-4181298179
Thanks for the review @pitrou!

> I would not consider this a critical fix, as this is just working around a bug/limitation in another Parquet implementation.

Fair point. I've removed the "Critical Fix" label from the description. This is an interoperability fix, not a correctness issue in Arrow's own read path.

> why not use the newer LZ4_RAW which completely solves the Hadoop compatibility problem?

Good question. In our case the codec mapping lives in a layer above Arrow (SNPW maps Spark's `lz4` config → `Compression::LZ4_HADOOP`), so switching to `LZ4_RAW` is a separate change there. We plan to make that change too.

That said, `Lz4HadoopCodec` exists specifically for Hadoop compatibility and is a public codec in Arrow. It should produce output that Hadoop can actually read, regardless of whether callers should prefer `LZ4_RAW` for new files. The current behavior is arguably a violation of the codec's own contract.
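For context on what "Hadoop compatibility" means at the byte level: as we understand it, `LZ4_HADOOP` output is a sequence of blocks, each prefixed by a 4-byte big-endian uncompressed size and a 4-byte big-endian compressed size, followed by the raw LZ4 block bytes. `LZ4_RAW` drops this prefix entirely. Below is a minimal Python sketch of that framing only. The helper names are hypothetical, the layout is simplified to one compressed chunk per block, and the payloads are placeholder bytes rather than real LZ4 data. This is not Arrow's or Hadoop's implementation.

```python
import struct
from typing import List, Tuple

# Assumed (simplified) Hadoop LZ4 block layout, repeated until end of buffer:
#   [uint32 BE uncompressed size][uint32 BE compressed size][compressed bytes]
_PREFIX = struct.Struct(">II")


def build_hadoop_lz4_frame(uncompressed_size: int, payload: bytes) -> bytes:
    """Hypothetical helper: prepend the 8-byte size prefix to one compressed block."""
    return _PREFIX.pack(uncompressed_size, len(payload)) + payload


def split_hadoop_lz4_frames(buf: bytes) -> List[Tuple[int, bytes]]:
    """Hypothetical helper: split a buffer into (uncompressed_size, payload) pairs."""
    frames = []
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < _PREFIX.size:
            raise ValueError("truncated frame header")
        uncompressed_size, compressed_size = _PREFIX.unpack_from(buf, offset)
        offset += _PREFIX.size
        end = offset + compressed_size
        if end > len(buf):
            raise ValueError("truncated frame payload")
        frames.append((uncompressed_size, buf[offset:end]))
        offset = end
    return frames


if __name__ == "__main__":
    # Placeholder payloads stand in for real LZ4-compressed blocks.
    data = build_hadoop_lz4_frame(5, b"\x01\x02\x03") + build_hadoop_lz4_frame(10, b"\xaa\xbb")
    print(split_hadoop_lz4_frames(data))  # [(5, b'\x01\x02\x03'), (10, b'\xaa\xbb')]
```

A reader that expects this prefix (Hadoop) cannot parse bare `LZ4_RAW` blocks, and vice versa, which is why the codec choice matters for interoperability.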
