rustyconover opened a new issue, #422:
URL: https://github.com/apache/arrow-js/issues/422
## Summary
`getDateDay` converts int32 day values to epoch milliseconds via `86400000 *
data[index]`. This overflows `Number.MAX_SAFE_INTEGER` for dates far from
epoch, silently returning incorrect values.
## Root cause
In `visitor/get.mjs`:
```js
const epochDaysToMs = (data, index) => 86400000 * data[index];
const getDateDay = ({ values }, index) => epochDaysToMs(values, index);
```
Date32 stores days as int32, supporting a range of ±2^31 days (~5.8 million
years in each direction). But `86400000 * 2147483647 = 1.855e17`, which exceeds
`Number.MAX_SAFE_INTEGER` (9.007e15).
The precision boundary is `Math.floor(Number.MAX_SAFE_INTEGER / 86400000) =
104,249,991` days ≈ ±285,420 years from epoch. Dates outside roughly 283,400 BC
– 287,400 AD silently return incorrect millisecond values due to floating-point
precision loss.
## Impact
Unlike the timestamp overflow (which throws — see #421), this **silently
returns wrong data**. DuckDB's `DATE` type supports dates from 5,877,642 BC to
5,881,580 AD. DuckDB's `test_all_types()` produces min/max dates well outside
the safe conversion range. Consumers displaying these dates get subtly wrong
output with no error.
Additionally, DuckDB uses date infinity sentinels (`INT32_MAX` and
`-INT32_MAX`) which also overflow when multiplied by 86400000.
## Proposal
Return the raw int32 day count instead of converting to milliseconds:
```js
const getDateDay = ({ values }, index) => values[index];
```
This is lossless, never overflows, and consistent with how `visitInt32`
already returns the raw value. Callers that want a JS Date can convert
explicitly:
```js
new Date(days * 86400000) // fine for dates within ±285k years
```
As with #421, this is a breaking change for code that expects `get()` to
return epoch milliseconds. The same resolution options apply (major version
bump, opt-in flag, etc.).
## Context
We hit this building a DuckDB WASM frontend that renders `test_all_types()`.
Our workaround reads raw int32 values from the underlying `Int32Array`:
```js
if (typeStr.includes("Date32")) {
const chunk = column.data?.[0] ?? column.data;
if (chunk?.values instanceof Int32Array) {
return chunk.values[row - (chunk.offset ?? 0)]; // raw days
}
}
```
This bypasses Arrow's getter entirely. We then format using Howard Hinnant's
civil calendar algorithm, which handles the full int32 day range correctly with
pure arithmetic.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]