martin-traverse opened a new issue, #731:
URL: https://github.com/apache/arrow-java/issues/731

   ### Describe the enhancement requested
   
   I'd like to add dictionary encoded values to the Avro adapter and get them 
working for a full round trip with both schema and data. I'm doing some 
(unrelated) work on dictionary encoded values at the moment, so it's easy for me to work 
on this at the same time.
   
   The way I am thinking, dictionary encoding will be supported for string 
values only, encoded as Avro enums. For write operations the entire dictionary 
will need to be specified up-front. That is fine for a single batch, but once we add 
multi-batch there will be limitations on streams with dictionary updates. When 
reading, the index type should be the smallest signed int type that will hold 
all the values. Avro has no concept of ordering, so reading will always create 
unordered dictionaries and ordering will be lost in round trip. I think this 
approach is correct - if anyone has a different opinion please do shout!
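
   To make the proposal above concrete, here is a minimal sketch in plain Java with no Arrow or Avro dependencies. It is only an illustration under stated assumptions, not the adapter's actual code: the class name `DictionaryEnumSketch` and both helper methods are hypothetical. It shows how a string dictionary could be rendered as an Avro enum schema, and how the smallest signed index width could be chosen from the dictionary size. Avro requires enum symbols to be unique and to match `[A-Za-z_][A-Za-z0-9_]*`, so the sketch rejects dictionary values that do not qualify.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Arrow Avro adapter.
final class DictionaryEnumSketch {

    // Avro enum symbols must match this pattern (Avro specification, "Names").
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Smallest signed integer bit width whose range covers indices 0..dictionarySize-1. */
    static int smallestSignedIndexBits(int dictionarySize) {
        if (dictionarySize < 0) {
            throw new IllegalArgumentException("dictionarySize must be non-negative");
        }
        int maxIndex = Math.max(dictionarySize - 1, 0);
        if (maxIndex <= Byte.MAX_VALUE) {
            return 8;
        }
        if (maxIndex <= Short.MAX_VALUE) {
            return 16;
        }
        return 32;
    }

    /**
     * Renders the dictionary values as an Avro enum schema in JSON form.
     * Symbol position in the enum equals the dictionary index.
     * Assumes enumName is already a valid Avro name.
     */
    static String toAvroEnumSchema(String enumName, List<String> dictionaryValues) {
        Set<String> seen = new HashSet<>();
        StringBuilder json = new StringBuilder();
        json.append("{\"type\":\"enum\",\"name\":\"").append(enumName).append("\",\"symbols\":[");
        for (int i = 0; i < dictionaryValues.size(); i++) {
            String symbol = dictionaryValues.get(i);
            if (!AVRO_NAME.matcher(symbol).matches()) {
                throw new IllegalArgumentException("Not a valid Avro enum symbol: " + symbol);
            }
            if (!seen.add(symbol)) {
                throw new IllegalArgumentException("Duplicate enum symbol: " + symbol);
            }
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(symbol).append('"');
        }
        return json.append("]}").toString();
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("red", "green", "blue");
        System.out.println(toAvroEnumSchema("Color", dictionary));
        System.out.println(smallestSignedIndexBits(dictionary.size()));
    }
}
```

   With this sketch, a dictionary of up to 128 values maps to an 8-bit index, up to 32768 values to a 16-bit index, and anything larger to a 32-bit index.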
   
   This will be part 3 in the Avro adapter series, following #615 and #698, so 
file-level capabilities will be part 4, hope that's ok. PR to follow in a few 
days for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]