Re: [CODEC] Sign Extension Error in Murmur3

2019-11-11 Thread Claude Warren
https://github.com/apache/commons-codec/pull/27 On Mon, Nov 11, 2019 at 4:25 PM Claude Warren wrote: > I took the approach that I would leave the original code there and add new > methods hash128_x64 and hash32_x86. I also marked the older methods as > deprecated with a note that the implementa

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-11 Thread Claude Warren
I took the approach that I would leave the original code there and add new methods hash128_x64 and hash32_x86. I also marked the older methods as deprecated with a note that the implementation has a sign extension error and to use the new methods for new development and cases where the change does

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-04 Thread Claude Warren
There are a number of issues with the format and potential bugs in the Codec Murmur3 code. ( See spotbugs, PMD, and codestyle reports) The one that tripped me up was the mix of tab/space lines. Anyway, these issues can be fixed in later pull requests. I think is one overarching question and one

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-04 Thread Alex Herbert
> On 4 Nov 2019, at 02:13, sebb wrote: > > On Mon, 4 Nov 2019 at 00:53, Claude Warren > wrote: >> >> I think the way to prove they behave correctly is to test the results >> against the original "C" implementation. With this in mind I proposed the >> changes. >> >>

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-03 Thread sebb
On Mon, 4 Nov 2019 at 00:53, Claude Warren wrote: > > I think the way to prove they behave correctly is to test the results > against the original "C" implementation. With this in mind I proposed the > changes. > > I agree that there should be a way to revert to the old code as there is > code in

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-03 Thread Claude Warren
As a third option as @melloware said in the pull request comments the original implementation came from Apache Hive. The current hashes could be named hash31Hive and hash128Hive On Mon, Nov 4, 2019 at 12:53 AM Claude Warren wrote: > I think the way to prove they behave correctly is to test the

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-03 Thread Claude Warren
I think the way to prove they behave correctly is to test the results against the original "C" implementation. With this in mind I proposed the changes. I agree that there should be a way to revert to the old code as there is code in the wild that depends on the earlier implementation. I know th

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-03 Thread sebb
As I see it, there are two use cases here. 1) Commons Codec code is used exclusively, in which case it is essential that the output does not change for a given seed. 2) Commons Codec is used in conjunction with other implementations, in which case it is essential that Codec is aligned with other

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-03 Thread Alex Herbert
> On 3 Nov 2019, at 21:45, Gary Gregory wrote: > > I feel like I am missing something basic in the assumption of this issue: > there is no such thing as an unsigned int in Java and the ticket talks > about (C?) unsigned ints. Please help me understand how or why we should > care about C vs. Ja

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-03 Thread Gary Gregory
I feel like I am missing something basic in the assumption of this issue: there is no such thing as an unsigned int in Java and the ticket talks about (C?) unsigned ints. Please help me understand how or why we should care about C vs. Java ints. Are we comparing apples to oranges here? Thank you,

[CODEC] Sign Extension Error in Murmur3

2019-11-03 Thread Claude Warren
There is an error in the current Murmur3 code introduced by sign extension errors. This is documented in CODEC-264.[1] I have created a pull request to fix it.[2] While the code changes did not change any of the existing Murmur3 tests, I did add new tests that failed until the changes were appli