[ https://issues.apache.org/jira/browse/KAFKA-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viktor Somogyi-Vass updated KAFKA-10650:
----------------------------------------
Description:
The use of MD5 was uncovered while testing Kafka for FIPS (Federal
Information Processing Standards) verification.
While MD5 isn't a FIPS incompatibility here, as it isn't used for cryptographic
purposes, I spent some time on this because it isn't ideal either. MD5 is a
relatively fast cryptographic hashing algorithm, but there are much better
performing algorithms for hash tables such as the one used in SkimpyOffsetMap.
By applying Murmur3 (which is already implemented in Streams) I could achieve
a 3x faster {{put}} operation, and overall segment cleaning sped up by 30%,
while preserving the same collision rate (both performed within 0.0015 - 0.007,
with a median of about 0.004).
I chose Murmur3 because research paper [1] shows that Murmur2 is a relatively
good choice for hash tables, and since Murmur3 is available in the project,
I used that.
[1]
https://www.researchgate.net/publication/235663569_Performance_of_the_most_common_non-cryptographic_hash_functions
Benchmark evidence:
!benchmark-evidence.png!
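To make the comparison concrete, here is a minimal, self-contained sketch of hashing keys into hash-table slots with either MD5 or a 32-bit Murmur3. It is not the SkimpyOffsetMap code and not the Murmur3 class from Streams: the class name {{HashSlotSketch}}, the helpers {{murmur3_32}}, {{md5AsInt}}, {{slotFor}} and {{collisionRate}}, the "key-N" key format and the slot counts are all assumptions made for illustration. The collision measure used here (fraction of keys that land in an already occupied slot) is only a simple proxy and is not claimed to match the benchmark in the attachment.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashSlotSketch {

    // 32-bit MurmurHash3 (x86_32 variant), written out so the sketch has no dependencies.
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int blocks = data.length / 4;
        // Body: mix in each little-endian 4-byte block.
        for (int i = 0; i < blocks; i++) {
            int p = 4 * i;
            int k = (data[p] & 0xff) | ((data[p + 1] & 0xff) << 8)
                  | ((data[p + 2] & 0xff) << 16) | ((data[p + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        // Tail: mix in the remaining 1 to 3 bytes (intentional fall-through).
        int tail = blocks * 4;
        int t = 0;
        switch (data.length & 3) {
            case 3: t ^= (data[tail + 2] & 0xff) << 16;
            case 2: t ^= (data[tail + 1] & 0xff) << 8;
            case 1: t ^= (data[tail] & 0xff);
                    t *= c1;
                    t = Integer.rotateLeft(t, 15);
                    t *= c2;
                    h ^= t;
        }
        // Finalization: force avalanche of the remaining bits.
        h ^= data.length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Assumption for this sketch: use the first four bytes of the MD5 digest as an int.
    static int md5AsInt(byte[] data) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(data);
            return (d[0] & 0xff) | ((d[1] & 0xff) << 8)
                 | ((d[2] & 0xff) << 16) | ((d[3] & 0xff) << 24);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Map a (possibly negative) hash onto a slot index in [0, slots).
    static int slotFor(int hash, int slots) {
        return Math.floorMod(hash, slots);
    }

    // Fraction of keys whose slot was already occupied by an earlier key.
    static double collisionRate(boolean useMurmur, int keys, int slots) {
        boolean[] used = new boolean[slots];
        int collisions = 0;
        for (int i = 0; i < keys; i++) {
            byte[] key = ("key-" + i).getBytes(StandardCharsets.UTF_8);
            int hash = useMurmur ? murmur3_32(key, 0) : md5AsInt(key);
            int slot = slotFor(hash, slots);
            if (used[slot]) {
                collisions++;
            } else {
                used[slot] = true;
            }
        }
        return (double) collisions / keys;
    }

    public static void main(String[] args) {
        int keys = 10_000;
        int slots = 1_000_000;
        System.out.printf("MD5     collision rate: %.4f%n", collisionRate(false, keys, slots));
        System.out.printf("Murmur3 collision rate: %.4f%n", collisionRate(true, keys, slots));
    }
}
```

Running {{main}} prints one collision rate per hash function for the same synthetic key set. Timing the two code paths (for example with JMH) would be the natural next step, but that is left out here to keep the sketch short.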
> Use Murmur3 hashing instead of MD5 in SkimpyOffsetMap
> -----------------------------------------------------
>
> Key: KAFKA-10650
> URL: https://issues.apache.org/jira/browse/KAFKA-10650
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Reporter: Viktor Somogyi-Vass
> Assignee: Viktor Somogyi-Vass
> Priority: Major
> Attachments: benchmark-evidence.png
--
This message was sent by Atlassian Jira
(v8.3.4#803005)