scovich commented on code in PR #8350:
URL: https://github.com/apache/arrow-rs/pull/8350#discussion_r2357600500
##########
arrow-schema/src/field.rs:
##########
@@ -391,6 +398,49 @@ impl Field {
/// ```
pub fn with_data_type(mut self, data_type: DataType) -> Self {
self.set_data_type(data_type);
+ self.without_extension_type()
Review Comment:
This change seems really dangerous... it will silently strip away the
extension type information, which can propagate into e.g. the parquet file the
column eventually gets written to, which could be very hard to fix later on
when downstream readers start noticing the extension type info is unexpectedly
missing.
Seems better to make this fallible, and blow up if there is an extension
type for which the new data type is incompatible?
We could potentially still keep the original (infallible) version and panic
if the new type is invalid. Doing that means people upgrading to arrow-57 won't
necessarily realize there's a potential problem until their code starts
crashing, which I don't love, but I suspect some people would rather deal with
a potential panic than have to rewrite all their `with_data_type` calls?
Alternatively -- maybe we leave the panicky infallible version in place, but
deprecated with an appropriate warning? But people with automated builds won't
necessarily _see_ the warning, especially if they already have lots of warnings
in their code.
If somebody _actually_ wanted to clear the extension type, it seems much
better to require them to clearly state their intent by e.g.
`field.without_extension_type().with_data_type(...)`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]