alamb opened a new issue, #46908:
URL: https://github.com/apache/arrow/issues/46908

   ### Describe the enhancement requested
   
   Parquet has added a new type for semi-structured data called `Variant`, which is defined here:
   * Variant encoding spec: https://github.com/apache/parquet-format/blob/main/VariantEncoding.md
   * Variant shredding spec: https://github.com/apache/parquet-format/blob/main/VariantShredding.md
   
   Because engines commonly read data from Parquet into Arrow for in-memory processing, it is useful to have Variant support in Arrow. @CurtHagenlocher proposes adding native Variant support in the Arrow format itself here:
   - https://github.com/apache/arrow/issues/42069
   
   An alternate approach is to add a [Canonical Extension Type](https://arrow.apache.org/docs/format/CanonicalExtensions.html#canonical-extension-types).
   
   @zeroshade wrote up a proposal:
   - Mailing List Discussion: https://lists.apache.org/thread/w06cxdojjcmry4m9vb0bo7owd1jsbtz5
   - Google Document: https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing
   
   @zeroshade also implemented it in Go:
   - https://github.com/apache/arrow-go/commit/5240503993cc0aa47554b932c341e4940ce42348
   
   This ticket tracks the idea of adding Variant as an official extension type.
   
   See also @neilechao's PR to add Variant read support to Parquet:
   - https://github.com/apache/arrow/issues/45937
   
   ### Component(s)
   
   Format

