zeroshade commented on issue #2: URL: https://github.com/apache/iceberg-cpp/issues/2#issuecomment-2494138383
The biggest drawback to using the Arrow C++ type system directly is that the mapping to Iceberg isn't one-to-one. Iceberg only has Int32 and Int64, while Arrow has Int8/16/32/64 *and* UInt8/16/32/64. The same goes for the other types that exist in Arrow but not in Iceberg (such as the `Large*` variants, run-end encoded (REE) arrays, and so on). Another issue is how Time and Timestamp types are handled: Iceberg fixes the unit (microseconds for `time` and `timestamp`), while Arrow parameterizes these types by unit. For the most part, you can see the logic needed to convert between the Iceberg and Arrow type systems [here](https://github.com/apache/iceberg-go/blob/main/table/arrow_utils.go#L291).

These differences mean that even if you reuse the types from Arrow, you will eventually have to implement this conversion logic when reading or writing data and converting it to Arrow. This is why I provided functions to convert an Arrow schema to an Iceberg schema and vice versa in the iceberg-go library. Reading data still returns a stream of Arrow record batches, and when I implement writing, it will accept a stream of Arrow record batches to write.

It's not that there are specific issues the Arrow type system can't deal with; it's more that the Arrow type system has significantly more types and flexibility than the Iceberg type system.
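
To make the many-to-one nature of the mapping concrete, here is a minimal sketch in Python. It is not the iceberg-go or iceberg-cpp code: the type names are plain strings, the tables `ARROW_TO_ICEBERG` and `ICEBERG_TO_ARROW` and the helper functions are hypothetical, and the widening policy (for example, rejecting `uint64` and non-microsecond timestamps) is an assumption chosen for illustration.

```python
# Hypothetical, simplified type names as plain strings.
# This is NOT a real Arrow or Iceberg API; it only illustrates the shape
# of the conversion logic.

# Many Arrow types collapse onto one Iceberg type (assumed widening policy).
ARROW_TO_ICEBERG = {
    "int8": "int", "int16": "int", "int32": "int",
    "uint8": "int", "uint16": "int",
    "int64": "long", "uint32": "long",
    "string": "string", "large_string": "string",
    "binary": "binary", "large_binary": "binary",
    # Only the microsecond unit is accepted in this sketch.
    "time64[us]": "time",
    "timestamp[us]": "timestamp",
}

# Each Iceberg type maps back to a single canonical Arrow type.
ICEBERG_TO_ARROW = {
    "int": "int32",
    "long": "int64",
    "string": "string",
    "binary": "binary",
    "time": "time64[us]",
    "timestamp": "timestamp[us]",
}


def arrow_to_iceberg(arrow_type: str) -> str:
    """Return the Iceberg type for an Arrow type, or raise if unmapped."""
    try:
        return ARROW_TO_ICEBERG[arrow_type]
    except KeyError:
        raise ValueError(f"no Iceberg equivalent for Arrow type {arrow_type!r}")


def iceberg_to_arrow(iceberg_type: str) -> str:
    """Return the canonical Arrow type for an Iceberg type."""
    try:
        return ICEBERG_TO_ARROW[iceberg_type]
    except KeyError:
        raise ValueError(f"unknown Iceberg type {iceberg_type!r}")


def round_trip(arrow_type: str) -> str:
    """Convert Arrow -> Iceberg -> Arrow; the result may differ from the input."""
    return iceberg_to_arrow(arrow_to_iceberg(arrow_type))


if __name__ == "__main__":
    for name in ("int8", "uint32", "large_string", "timestamp[us]"):
        print(name, "->", arrow_to_iceberg(name), "->", round_trip(name))
```

Because the forward mapping is not injective, a round trip through Iceberg does not preserve the original Arrow type (here `int8` comes back as `int32`), which is why the conversion has to be written explicitly even when Arrow types are reused.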