farkmarnum opened a new issue, #83:
URL: https://github.com/apache/arrow-js/issues/83

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   While constructing a `Vector` of `Timestamp` values, any pre-epoch values 
end up offset by approximately `2^32 * 10^(3n)` milliseconds, where `n` depends 
on the precision (from `-2` for `NANOSECOND` up to `1` for `SECOND`).
   
   <details>
   <summary>
   I've written a minimal repro script using the latest version of the 
apache-arrow NPM package (12.0.0).
   </summary>
   
   ```javascript
   // test.mjs
   import {
     Date_,
     makeBuilder,
     Timestamp,
     TimeUnit,
     DateUnit,
   } from "apache-arrow";
   
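   /**
    * Round-trips a Date through a Timestamp builder and returns the drift.
    * @param {Date} value
    * @param {TimeUnit} precision
    * @returns {number} input minus round-tripped value, in milliseconds
    */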
   const testTimestamp = (value, precision) => {
     const columnBuilder = makeBuilder({ type: new Timestamp(precision) });
     columnBuilder.append(value);
     const vec = columnBuilder.finish().toVector();
     const valueInVec = Array.from(vec)[0];
   
     return +value - +valueInVec;
   };
   
   console.log("Testing TIMESTAMP\n");
   
   [
     ["2000-01-01", TimeUnit.NANOSECOND],
     ["2000-01-01", TimeUnit.MICROSECOND],
     ["2000-01-01", TimeUnit.MILLISECOND],
     ["2000-01-01", TimeUnit.SECOND],
     [],
     ["1900-01-01", TimeUnit.NANOSECOND],
     ["1900-01-01", TimeUnit.MICROSECOND],
     ["1900-01-01", TimeUnit.MILLISECOND],
     ["1900-01-01", TimeUnit.SECOND],
     [],
     ["1969-12-31 23:59:59Z", TimeUnit.NANOSECOND],
     ["1969-01-01", TimeUnit.NANOSECOND],
     ["1900-01-01", TimeUnit.NANOSECOND],
     ["1800-01-01", TimeUnit.NANOSECOND],
     ["1700-01-01", TimeUnit.NANOSECOND],
   ].forEach(([dateStr, precision]) => {
     if (!dateStr) {
       console.log();
       return;
     }
     const outcome = testTimestamp(new Date(dateStr), precision);
     const label = `${dateStr} w/ ${TimeUnit[precision]}`;
     console.log(`${label.padEnd(40)} => ${outcome}`);
   });
   
   console.log("\n\nTesting DATE\n");
   
   /**
    * @param {Date} value
    * @returns {Date}
    */
   const testDate = (value) => {
     const columnBuilder = makeBuilder({ type: new Date_(DateUnit.DAY) });
     columnBuilder.append(value);
     const vec = columnBuilder.finish().toVector();
     const valueInVec = Array.from(vec)[0];
   
     return valueInVec;
   };
   
   [
     ["1969-12-15 01:00:00Z", "rounded up"],
     ["1969-12-31 01:00:00Z", "rounded up"],
     ["1970-01-01 00:00:00Z", "rounded down"],
     ["1970-01-01 23:00:00Z", "rounded down"],
     ["1970-01-15 01:00:00Z", "rounded down"],
   ].forEach(([dateStr, label]) => {
     const date = new Date(dateStr);
     const outcome = testDate(date);
     console.log(`${date.toISOString()} => ${outcome.toISOString()} (${label})`);
   });
   ```
   </details>
   
   Here's the output:
   ```
   Testing TIMESTAMP
   
   2000-01-01 w/ NANOSECOND                 => 0
   2000-01-01 w/ MICROSECOND                => 0
   2000-01-01 w/ MILLISECOND                => 0
   2000-01-01 w/ SECOND                     => 0
   
   1900-01-01 w/ NANOSECOND                 => -4294.96728515625
   1900-01-01 w/ MICROSECOND                => -4294967.2958984375
   1900-01-01 w/ MILLISECOND                => -4294967296
   1900-01-01 w/ SECOND                     => -4294967296000
   
   1969-12-31 23:59:59Z w/ NANOSECOND       => -4294.967296
   1969-01-01 w/ NANOSECOND                 => -4294.967296600342
   1900-01-01 w/ NANOSECOND                 => -4294.96728515625
   1800-01-01 w/ NANOSECOND                 => -4294.9677734375
   1700-01-01 w/ NANOSECOND                 => -4294.966796875
   
   
   Testing DATE
   
   1969-12-15T01:00:00.000Z => 1969-12-16T00:00:00.000Z (rounded up)
   1969-12-31T01:00:00.000Z => 1970-01-01T00:00:00.000Z (rounded up)
   1970-01-01T00:00:00.000Z => 1970-01-01T00:00:00.000Z (rounded down)
   1970-01-01T23:00:00.000Z => 1970-01-01T00:00:00.000Z (rounded down)
   1970-01-15T01:00:00.000Z => 1970-01-15T00:00:00.000Z (rounded down)
   ```
   
   As you can see, for the `Timestamp` type, post-epoch values pass through 
unscathed, but pre-epoch values end up off by a fixed increment (in 
milliseconds) of:
   - approximately `2^32 / 1,000,000` for `NANOSECOND` precision
   - approximately `2^32 / 1,000` for `MICROSECOND` precision
   - `2^32` for `MILLISECOND` precision
   - `2^32 * 1,000` for `SECOND` precision
   
   Note: I say "fixed", but the amount actually varies slightly when it's a 
float (for `NANOSECOND` and `MICROSECOND` precisions), as you can see in the 
last block of tests for `Timestamp`. The variation seems to grow as the value 
gets further from the epoch.
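
   The sizes of those variations look consistent with float64 rounding of the millisecond value, since the gap between adjacent doubles grows with magnitude. Below is a rough sketch of that check. `spacing` is a hypothetical helper written for this illustration (not an apache-arrow API), and the observed numbers are copied from the output above. This only shows the numbers are compatible with double rounding; it does not pinpoint where the rounding happens in the library.

   ```javascript
   // Hypothetical helper: gap between adjacent float64 values near x
   // (valid for normal, finite, non-zero x).
   const spacing = (x) => 2 ** (Math.floor(Math.log2(Math.abs(x))) - 52);

   // 2^32 nanoseconds expressed in milliseconds.
   const expected = 2 ** 32 / 1e6;

   // [year, magnitude of the offset reported above for NANOSECOND precision]
   const observations = [
     [1900, 4294.96728515625],
     [1800, 4294.9677734375],
     [1700, 4294.966796875],
   ];

   const checks = observations.map(([year, observed]) => {
     const step = spacing(Date.UTC(year, 0, 1));
     return {
       year,
       step,
       // The observed offset sits exactly on the float64 grid at that magnitude...
       onGrid: Number.isInteger(observed / step),
       // ...and is within one grid step of 2^32 ns.
       withinOneStep: Math.abs(observed - expected) <= step,
     };
   });

   console.log(checks);
   ```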
   
   It seems like the Arrow `Timestamp` type is a 64-bit integer count of 
{precision} units since the epoch, but is represented as two 32-bit ints in JS. 
Something is getting messed up for pre-epoch timestamps, which have a negative 
number for their internal representation. I'm guessing it has to do with 32-bit 
arithmetic and/or float precision issues.
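
   To illustrate how an offset of exactly `2^32` units could arise, here is a minimal sketch, assuming the high 32-bit word is derived by rounding toward zero instead of toward negative infinity. That assumption is a guess that happens to reproduce the numbers above; it is not confirmed against the library source. `split` and `join` are hypothetical helpers, not apache-arrow APIs.

   ```javascript
   // Hypothetical illustration only: these helpers are NOT apache-arrow APIs.
   const TWO_32 = 2 ** 32;

   // Split a signed integer into [low, high] 32-bit words.
   // `roundHigh` decides how the high word is derived from value / 2^32.
   const split = (value, roundHigh) => [
     value >>> 0,                   // low word: value mod 2^32, unsigned
     roundHigh(value / TWO_32) | 0, // high word as a signed 32-bit int
   ];

   // Recombine the two words into a single number.
   const join = ([lo, hi]) => hi * TWO_32 + lo;

   const ms = Date.UTC(1900, 0, 1); // pre-epoch, so negative

   const viaFloor = join(split(ms, Math.floor)); // rounds toward -Infinity
   const viaTrunc = join(split(ms, Math.trunc)); // rounds toward zero

   console.log(ms - viaFloor); // 0
   console.log(ms - viaTrunc); // -4294967296
   ```

   For non-negative inputs `Math.trunc` and `Math.floor` agree, which would explain why post-epoch values round-trip cleanly. Since the offset is `2^32` in the column's own unit, converting it to milliseconds gives the per-precision increments listed above.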
   
   Additionally, for the `Date_` type, values post-epoch seem to "round down" 
to the nearest day properly, but values pre-epoch seem to "round up" to the 
next day. I'm guessing that this is a related issue.
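
   The `Date_` behaviour matches what you'd get from rounding the day count toward zero rather than toward negative infinity. A minimal sketch of that difference follows. It assumes truncation is the cause, which I haven't verified, and `toDay` is a hypothetical helper rather than anything in apache-arrow.

   ```javascript
   // Hypothetical illustration only: toDay is NOT an apache-arrow API.
   const MS_PER_DAY = 86400000;

   // Snap a Date to a whole-day boundary using the given rounding function.
   const toDay = (date, round) =>
     new Date(round(date.getTime() / MS_PER_DAY) * MS_PER_DAY);

   const preEpoch = new Date("1969-12-31T01:00:00Z");
   const postEpoch = new Date("1970-01-01T23:00:00Z");

   // Rounding toward zero pushes pre-epoch values forward to the next day.
   console.log(toDay(preEpoch, Math.trunc).toISOString()); // 1970-01-01T00:00:00.000Z
   // Rounding toward -Infinity keeps them on their own day.
   console.log(toDay(preEpoch, Math.floor).toISOString()); // 1969-12-31T00:00:00.000Z
   // Post-epoch values are unaffected by the choice.
   console.log(toDay(postEpoch, Math.trunc).toISOString()); // 1970-01-01T00:00:00.000Z
   ```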
   
   ### Component(s)
   
   JavaScript

