Re: [I] [DISCUSSION] Project Goal [iceberg-cpp]

via GitHub Fri, 22 Nov 2024 08:21:25 -0800


zeroshade commented on issue #2:
URL: https://github.com/apache/iceberg-cpp/issues/2#issuecomment-2494138383


   The biggest drawback to just using the Arrow C++ type system directly is 
that the mappings aren't perfect for Iceberg.
   
   Iceberg only has Int32 and Int64, while Arrow has Int8/16/32/64 *and* 
UInt8/16/32/64. The same goes for all of the other types that exist in Arrow 
but don't exist in Iceberg (such as the `Large*` variants, run-end encoded 
(REE) arrays, and so on). Another issue is how Time and Timestamp types are 
handled: Iceberg fixes the unit at microseconds, while Arrow parameterizes 
these types by unit. For the most part, you can see the logic needed for 
converting between the Iceberg and Arrow type systems 
[here](https://github.com/apache/iceberg-go/blob/main/table/arrow_utils.go#L291).
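   
   To make the mismatch concrete, here is a minimal sketch of the kind of 
mapping involved. Everything in it is hypothetical: `ArrowKind`, `TimeUnit`, 
`IcebergKind`, `ArrowType`, and `ToIceberg` are simplified stand-ins, not the 
real Arrow C++ or iceberg-cpp classes, and the widening policy is an 
assumption rather than what iceberg-go necessarily does. It also assumes 
microseconds as Iceberg's timestamp unit, per the Iceberg table spec.

```cpp
#include <optional>

// Hypothetical, simplified stand-ins for the two type systems. These are NOT
// the real Arrow C++ or iceberg-cpp classes.
enum class ArrowKind {
  Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Timestamp
};
enum class TimeUnit { Second, Milli, Micro, Nano };
enum class IcebergKind { Int, Long, Timestamp };

struct ArrowType {
  ArrowKind kind;
  TimeUnit unit = TimeUnit::Micro;  // only meaningful when kind == Timestamp
};

// Assumed policy: widen to the smallest Iceberg type that can hold every
// value, and return std::nullopt when no lossless target exists.
std::optional<IcebergKind> ToIceberg(const ArrowType& t) {
  switch (t.kind) {
    case ArrowKind::Int8:
    case ArrowKind::Int16:
    case ArrowKind::Int32:
    case ArrowKind::UInt8:
    case ArrowKind::UInt16:
      return IcebergKind::Int;   // always fits in a signed 32-bit int
    case ArrowKind::Int64:
    case ArrowKind::UInt32:
      return IcebergKind::Long;  // always fits in a signed 64-bit long
    case ArrowKind::UInt64:
      return std::nullopt;       // may exceed the signed 64-bit range
    case ArrowKind::Timestamp:
      // Coarser units can be rescaled to microseconds; nanoseconds would
      // lose precision, so this sketch rejects them.
      if (t.unit == TimeUnit::Nano) return std::nullopt;
      return IcebergKind::Timestamp;
  }
  return std::nullopt;  // not reached for valid enum values
}
```

   With these stand-ins, `ToIceberg({ArrowKind::UInt16})` yields 
`IcebergKind::Int`, while `ToIceberg({ArrowKind::UInt64})` yields no value. 
The mapping is many-to-one, so the reverse direction cannot recover the 
original Arrow type without extra information.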
   
   The differences in the types mean that even if you reuse the types from 
Arrow, you will eventually have to implement this conversion logic when 
reading or writing data and converting it to Arrow. This is why I provided 
functions to convert an Arrow Schema to Iceberg and vice versa in the 
iceberg-go library. Reading data still returns a stream of Arrow record 
batches, and when I implement writing, it'll accept a stream of Arrow record 
batches to write.
   
   It's not that there are specific issues the Arrow type system can't deal 
with; it's more that the Arrow type system has significantly more types and 
flexibility than the Iceberg type system.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

