ccciudatu opened a new issue, #43469:
URL: https://github.com/apache/arrow/issues/43469

   ### Describe the enhancement requested
   
   Application code is currently required to choose upfront between handling 
compressed vs. uncompressed data by specifying one of the two (mutually 
exclusive) `CompressionCodec.Factory` implementations: 
`NoCompressionCodec.Factory` and `CommonsCompressionCodecFactory`.
   
   While this is acceptable (or even required) on the write path (e.g. 
`ArrowWriter`), it makes supporting compression on the read path tedious: a 
reader such as an Arrow Flight client app cannot reasonably commit upfront to 
handling _uncompressed-data-only_ or _compressed-data-only_ streams, because it 
does not control what the server sends.
   As already reported in https://github.com/apache/arrow/issues/41457, the 
Java FlightClient currently fails with the following error when trying to 
decode a compressed stream:
   
   ```
   java.lang.IllegalArgumentException: Please add arrow-compression module to 
use CommonsCompressionCodecFactory for LZ4_FRAME
        at 
org.apache.arrow.vector.compression.NoCompressionCodec$Factory.createCodec(NoCompressionCodec.java:63)
        at 
org.apache.arrow.vector.compression.CompressionCodec$Factory$1.createCodec(CompressionCodec.java:91)
        at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:79)
        at org.apache.arrow.flight.FlightStream.next(FlightStream.java:275)
   ```
   The `FlightStream` class does not explicitly pass a compression codec 
factory when creating a `VectorLoader`, which then uses the default 
`NoCompressionCodec.Factory`. Changing the default to 
`CommonsCompressionCodecFactory` is not an option because:
   
   1. `CommonsCompressionCodecFactory` does not support uncompressed data
   2. `arrow-compression` is not a dependency for `arrow-vector`
   
   The proposed solution (upcoming PR) leaves these two design decisions in 
place. Instead, it makes the default `CompressionCodec.Factory` use a 
`ServiceLoader` to gather all the available implementations and combine them to 
support as many `CodecType`s as possible, falling back to the `NO_COMPRESSION` 
codec type (i.e. the same default as today).
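
The idea can be sketched roughly as follows. This is only an illustration 
under stated assumptions, not the code of the upcoming PR: `CodecType`, 
`Codec`, `CodecFactory` and `DefaultFactory` are simplified, hypothetical 
stand-ins for the real Arrow types, so that the example compiles without any 
Arrow dependency.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.ServiceLoader;

public class CompositeCodecSketch {
  /** Simplified stand-in for Arrow's codec type enum (hypothetical). */
  public enum CodecType { NO_COMPRESSION, LZ4_FRAME, ZSTD }

  /** Simplified stand-in for a codec; it only reports its type (hypothetical). */
  public interface Codec {
    CodecType type();
  }

  /** Hypothetical provider interface that optional modules would implement. */
  public interface CodecFactory {
    /** Returns a codec for the given type, or null if this provider lacks one. */
    Codec createCodec(CodecType type);
  }

  /** Combines all given providers; NO_COMPRESSION is always available. */
  public static final class DefaultFactory implements CodecFactory {
    private final Map<CodecType, Codec> codecs = new EnumMap<>(CodecType.class);

    public DefaultFactory(Iterable<CodecFactory> providers) {
      // Built-in fallback, i.e. the same behaviour as today's default.
      codecs.put(CodecType.NO_COMPRESSION, () -> CodecType.NO_COMPRESSION);
      for (CodecFactory provider : providers) {
        for (CodecType type : CodecType.values()) {
          Codec codec = provider.createCodec(type);
          if (codec != null) {
            // The first provider that supports a type wins.
            codecs.putIfAbsent(type, codec);
          }
        }
      }
    }

    @Override
    public Codec createCodec(CodecType type) {
      Codec codec = codecs.get(type);
      if (codec == null) {
        throw new IllegalArgumentException("No codec provider found for " + type);
      }
      return codec;
    }
  }

  /** Builds the default factory from whatever providers ServiceLoader finds. */
  public static CodecFactory defaultFactory() {
    return new DefaultFactory(ServiceLoader.load(CodecFactory.class));
  }

  public static void main(String[] args) {
    // No provider is registered in this sketch, so only the fallback exists.
    CodecFactory factory = defaultFactory();
    System.out.println(factory.createCodec(CodecType.NO_COMPRESSION).type());
  }
}
```

With no provider registered, only `NO_COMPRESSION` resolves and any other 
type still fails with an exception, which mirrors the current behaviour. A 
provider found by `ServiceLoader` fills in the types it supports without any 
change to the calling code.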
   
   The `arrow-compression` module would then act as a service provider: 
whenever it is present on the module path (or class path), it transparently 
fills in the gaps of the default factory.
   As a side note, this is in fact the literal meaning of the above error 
message (_"Please add arrow-compression module to use 
CommonsCompressionCodecFactory"_), so we can assume this was the original 
intention.
   
   
   ### Component(s)
   
   FlightRPC, Java

