ZENOTME commented on issue #34:
URL: https://github.com/apache/iceberg-rust/issues/34#issuecomment-1686397825
I find that the parquet writer casts types automatically, which means
that the following code works:
```
// A writer with schema {a: timestamp with time zone}
let schema = Schema::new(Fields::from(vec![Field::new("a",
DataType::Timestamp(arrow::datatypes::TimeUnit::Microsecond,Some("+08:00".into())),
false)])).into();
let w = op.writer("test").await?;
let mut pw = ParquetWriterBuilder::new(w, schema).build()?;
// We can insert it using i64 array.
let col = Arc::new(Int64Array::from_iter_values(vec![1])) as ArrayRef;
let to_write = RecordBatch::try_from_iter([("a", col)]).unwrap();
pw.write(&to_write).await?;
// We can insert it using f32 array.
let col = Arc::new(Float32Array::from_iter_values(vec![1.0])) as ArrayRef;
let to_write = RecordBatch::try_from_iter([("a", col)]).unwrap();
pw.write(&to_write).await?;
```
The writer's schema is timestamp with time zone, but we can insert Int64 and
Float32 arrays into it. [The cast
logic](https://github.com/apache/arrow-rs/blob/23db567d05bc21df56f5f7d08288f209de9fd785/parquet/src/arrow/arrow_writer/mod.rs#L520)
casts the physical representation directly rather than performing a [logical
cast](https://docs.rs/arrow/latest/arrow/compute/fn.cast.html). In the above
example, the physical representation of a timestamp is i64, so the writer
simply casts the i64 and f32 values to i64.
I'm not sure whether this behaviour could cause bugs in the future.
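To make the difference concrete, here is a minimal stdlib-only sketch. It does not use the arrow-rs API: `Unit`, `physical_cast`, `logical_cast` and `physical_cast_f32` are hypothetical names that only model the idea. The timestamp-unit case is my own illustration (it is not part of the example above), chosen because it shows how relabelling the raw i64 can silently change the meaning of a value.

```rust
// Hypothetical model, NOT arrow-rs types: a timestamp is a raw i64 plus a unit.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Unit {
    Second,
    Microsecond,
}

impl Unit {
    // Number of ticks of this unit in one second.
    fn per_second(self) -> i64 {
        match self {
            Unit::Second => 1,
            Unit::Microsecond => 1_000_000,
        }
    }
}

// Physical cast: the raw i64 is kept as-is and only the label changes.
fn physical_cast(raw: i64, _from: Unit, _to: Unit) -> i64 {
    raw
}

// Logical cast: the raw value is rescaled so it still denotes the same instant.
// Returns None on overflow. Rounding toward negative infinity when going to a
// coarser unit is a choice made for this sketch.
fn logical_cast(raw: i64, from: Unit, to: Unit) -> Option<i64> {
    let (f, t) = (from.per_second(), to.per_second());
    if t >= f {
        raw.checked_mul(t / f)
    } else {
        Some(raw.div_euclid(f / t))
    }
}

// Physical cast of a float into the i64 slot: the fraction is dropped and the
// result is then read as a timestamp tick.
fn physical_cast_f32(v: f32) -> i64 {
    v as i64
}
```

With this model, one second relabelled as microseconds stays `1` (one microsecond), while the logical cast yields `Some(1_000_000)`.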
So I want to discuss:
1. Should we support a schema safety check? (Personally, I think we should.)
2. If we support it, how strict should it be? Should we sometimes cast
automatically? BTW, if we do, we shouldn't let the parquet writer do it.
We should do it manually in our writer using a [logical
cast](https://docs.rs/arrow/latest/arrow/compute/fn.cast.html).
e.g. the table schema is the following, and the input record is timestamp
without time zone.
```
table {
t: timestamp with time zone
}
```