visr opened a new issue, #559:
URL: https://github.com/apache/arrow-julia/issues/559

   #172 added a warning. I think it is too expensive, and it may be better to 
just document the behavior instead.
   
   I am reading a 7760500-row table written by pandas, which defaults to 
nanosecond resolution, and want to convert it to `DateTime`; I don't need 
sub-millisecond precision. The conversion worked out of the box, but took 
75 seconds. Profiling showed that almost all of the time was spent in 
`warntimestamp` generating the log message. Without the log message it takes 
0.05 seconds.
   
   This shows it in a benchmark:
   
   ```jl
   using Arrow, Dates
   using Chairmarks
   
   # alternative to convert that doesn't have warntimestamp
   function to_datetime(x::Arrow.Timestamp{U, nothing})::DateTime where {U}
       ns_since_epoch = Arrow.periodtype(U)(x.x)
       ms_since_epoch = Dates.toms(ns_since_epoch)
       ut_instant = Dates.UTM(ms_since_epoch + Arrow.UNIX_EPOCH_DATETIME)
       return DateTime(ut_instant)
   end
   
   const ts = Arrow.Timestamp{Arrow.Flatbuf.TimeUnit.NANOSECOND, nothing}(1764288000000000000)
   @b convert(DateTime, ts)  # 6.525 μs (119 allocs: 6.719 KiB)
   @b to_datetime(ts)  # 1.332 ns
   ```
   
   I now avoid this by reading with `convert = false` and applying the 
`to_datetime` function above, but I think more people will run into this 
performance pitfall.
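   
   A minimal sketch of that workaround, reusing the `to_datetime` helper from 
the benchmark. The in-memory buffer and the column name `t` are made up for 
illustration and stand in for the real pandas-written file; a table written 
from Julia like this stores millisecond rather than nanosecond timestamps, 
which the helper handles through its `U` type parameter.
   
   ```jl
   using Arrow, Dates
   
   # Same helper as in the benchmark: converts without going through
   # Base.convert, so the warning is never generated.
   function to_datetime(x::Arrow.Timestamp{U, nothing})::DateTime where {U}
       ms_since_epoch = Dates.toms(Arrow.periodtype(U)(x.x))
       return DateTime(Dates.UTM(ms_since_epoch + Arrow.UNIX_EPOCH_DATETIME))
   end
   
   # Hypothetical stand-in for the real file: a one-column table in memory.
   io = IOBuffer()
   Arrow.write(io, (t = [DateTime(2025, 11, 28), DateTime(2025, 11, 29)],))
   seekstart(io)
   
   # convert = false leaves the column as raw Arrow.Timestamp values.
   tbl = Arrow.Table(io; convert = false)
   
   # Broadcast the helper over the column to get a Vector{DateTime}.
   dts = to_datetime.(tbl.t)
   ```
   
   For a file on disk, the same pattern would presumably be 
`Arrow.Table(path; convert = false)` followed by the broadcast.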
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.