SMHasher/Murmurhash author here - I don't see anything fundamentally wrong with this hash function, it uses basically the same operations as the Murmur family (and a lot of other hashes at this point).
The handling of the "tail" of the key (the last < 32 bytes) is slightly odd (the "if (l & 1) { mix 1 byte of key }" happens before 8-byte chunks and 2-byte chunks), but if it passes SMHasher it should be fine for general use.
Not OP, but I suppose there might be a difference between blindly trying to pass a test by tweaking numbers, or a more principled approach to figuring out how to introduce better randomness.
OTOH, I would expect that tests for hashing and random number generation in particular would be naturally resistant to overfitting, due to the nature of the problem. But I'm not an expert at this topic, so I'd love to hear someone else's take on that
I suppose that makes sense: the more you can say about a source of randomness, the less random it probably is, which makes the problem almost self-defeating
Hi, don't know if you'll see this but the tests in SMHasher are designed to break hash functions by generating low-entropy key sets that try to trigger weak bit-mixing paths in the hash function.
When a hash function has weak mixing they tend to fail catastrophically on one of those key sets, so you can think of SMHasher as more like "attempts to test the hash function to the point of failure" instead of a "target" as described in Goodhart's Law.
While the benefit of processing chunks of 8 bytes is obvious, what's the purpose of grouping those into macrogroups of 4? Does it trigger any implicit parallelism I failed to spot? Or is it just to end this phase with the 4 h[] having had the same amount of entropy, and thus starting the next one with h[0]?
> The way it loads the 8 bytes is also important. The correct way is to load via shift+or
> This is free of any UB, works on any alignment and on any machine regardless of it's endianness. It's also fast, gcc and clang recognize this pattern and optimize it into a single mov instruction on x86 targets.
Is a single MOV instruction still fast when the 8 bytes begin on an odd address?
> Is a single MOV instruction still fast when the 8 bytes begin on an odd address?
On x86, yes. There is no performance penalty for misaligned loads, except when the misaligned load also happens to straddle a cache line boundary, in which case it is slower, but only marginally so.
like mentioned x86/64 is quite generous with non-aligned access, yet on architectures that require aligned loads, they will be aligned (all lowest bits being zero at the start), so it will continue being aligned with each 8 byte load.
Why? If you are hashing names, for example, or keywords, they can begin at whatever position inside the buffer.
The case where you allocate a buffer, populate it with random contents and hash the whole of it is an artificial setup for benchmarking, but far from real use cases.
i was about to answer pretty much as 'rerdavies' did. If you have such requirements the specific architectures would need wrappers to copy the data, or do pref/post fix up (ala crc32 & avx implementations on non perfect boundary)
There are few reasons to use a CRC64. Your purposes would probably be just as well-served with a good CRC32.
CRCs allow you to precisely tune the type of error detection guarantees you have to the application. You just pick the combination of hamming distance and polynomial size that you like best from the CRC Zoo:
A hash function is essentially just a statistical version with the avalanche property that works acceptably well the majority of the time instead of a hard guarantee.
With CRC, you have algebraic guarantees on which kind of errors it detects (and which kind doesn't), while with hash functions collisions are essetially random (unless you have some clever insight on how it works internally). Which one is best, depends on the context.
> It’s the fastest hash function we know of, and we have benchmarked all the ones we could find. On modern Intel x64 CPUs, it hashes 16 bytes per cycle single-threaded. This means in cache it can hash at a rate of 64 gigabytes per second on a 4.2gHz machine. Out of cache, it hashes at whatever speed your main memory bus can provide to a single core, since that is usually the limiting factor on modern x64 CPUs.
> It has also now been tuned to be the fastest hash on small inputs, too. Despite the fact that it is a full 128-bit hash,
it still outperforms “fast” 64-bit hashes across all input sizes.
In particular, rapidhash is three times as fast for small inputs and is portable (unlike meow_hash). Plus meow fails one of the SMHasher quality tests.
I wouldn't consider a hash function that relies on 64-bit multiplication truly portable. Its performance will suffer considerably on a 32-bit system. Note that this applies to both Rapidhash and ChibiHash.
Well, even without taking into account that 32-bit systems are increasingly becoming extinct, it's less catastrophic than one might think. You get four muls instead of one and then a little carry propagation; I guess less if you have 32x32 -> 64 muls available? (I haven't thought deeply about it.) Certainly anything that relies on AES-NI would be much worse. But I guess your best shot in Murmur3, then.
> To our surprise, we found a lack of published, well-optimized, large-data hash functions.
Murmur was released in 2008. Murmur3 in 2011. Release dates on higher quality functions are not easy to find, but I am sure that there were more around.
This type of thing is why I take Casey's claims with a huge dose of salt.
Meow hash may very well be fast on large data, but it completely fails the "small code" criteria, which is one of the clearly stated design goal of Chibi.
Generally speaking, the "code as small as possible" is IMO a design constraint that is very often under-estimated or even ignored for these type of algorithms, including the crypto hard ones.
I personally find the "code fits on half a page" metric very important: beyond satisfying some very personal sense of aesthetics, it buys you quite a lot of nice properties:
- fits in head
- inlines like a dream
- easy to integrate
- easy to do header only
- easy to security audit
- pretty hard to introduce bugs
- nothing up my sleeve numbers pretty much obvious
For these reasons, I used to love the TEA (tiny encryption algorithm) [1] before cryptanalysis unfortunately showed it to be weak. Its successors (XXTEA, etc...) are already almost too big for my taste.
The speck cipher [2] is quite nice in that regard, in spite of its rather untrustworthy provenance.
SMHasher/Murmurhash author here - I don't see anything fundamentally wrong with this hash function, it uses basically the same operations as the Murmur family (and a lot of other hashes at this point).
The handling of the "tail" of the key (the last < 32 bytes) is slightly odd (the "if (l & 1) { mix 1 byte of key }" happens before 8-byte chunks and 2-byte chunks), but if it passes SMHasher it should be fine for general use.
> if it passes SMHasher it should be fine for general use.
Author of the post writes:
> I kept making changes until the tests passed.
If SMHasher has become a target, shouldn't it be considered a bad measure now? [1]
[1]: https://en.wikipedia.org/wiki/Goodhart%27s_law
Hash tests have always been targets. What else are you supposed to do for non-cryptographic hashes?
Not OP, but I suppose there might be a difference between blindly trying to pass a test by tweaking numbers, or a more principled approach to figuring out how to introduce better randomness.
OTOH, I would expect that tests for hashing and random number generation in particular would be naturally resistant to overfitting, due to the nature of the problem. But I'm not an expert at this topic, so I'd love to hear someone else's take on that
Maybe, but from what I've seen you run out of what theory can say about your non-crypto hash function pretty quickly.
I suppose that makes sense: the more you can say about a source of randomness, the less random it probably is, which makes the problem almost self-defeating
If this is the case, improvements come from tuning constants and calculation and tests are more valuable.
Hi, don't know if you'll see this but the tests in SMHasher are designed to break hash functions by generating low-entropy key sets that try to trigger weak bit-mixing paths in the hash function.
When a hash function has weak mixing they tend to fail catastrophically on one of those key sets, so you can think of SMHasher as more like "attempts to test the hash function to the point of failure" instead of a "target" as described in Goodhart's Law.
Might also try running it through Smhasher3: https://gitlab.com/fwojcik/smhasher3/-/blob/main/results/REA...
Also here is a hash function I wrote a while ago now: https://github.com/Keith-Cancel/k-hashv/tree/main
While the benefit of processing chunks of 8 bytes is obvious, what's the purpose of grouping those into macrogroups of 4? Does it trigger any implicit parallelism I failed to spot? Or is it just to end this phase with the 4 h[] having had the same amount of entropy, and thus starting the next one with h[0]?
> The way it loads the 8 bytes is also important. The correct way is to load via shift+or > This is free of any UB, works on any alignment and on any machine regardless of it's endianness. It's also fast, gcc and clang recognize this pattern and optimize it into a single mov instruction on x86 targets.
Is a single MOV instruction still fast when the 8 bytes begin on an odd address?
> Is a single MOV instruction still fast when the 8 bytes begin on an odd address?
On x86, yes. There is no performance penalty for misaligned loads, except when the misaligned load also happens to straddle a cache line boundary, in which case it is slower, but only marginally so.
like mentioned x86/64 is quite generous with non-aligned access, yet on architectures that require aligned loads, they will be aligned (all lowest bits being zero at the start), so it will continue being aligned with each 8 byte load.
> they will be aligned (all lowest bits being zero at the start
I don't think so. At the start k will be equal to keyIn, which can point to any arbitrary memory location.
But in practice, will always point to the start of a buffer allocated from the heap which will be aligned.
Why? If you are hashing names, for example, or keywords, they can begin at whatever position inside the buffer.
The case where you allocate a buffer, populate it with random contents and hash the whole of it is an artificial setup for benchmarking, but far from real use cases.
i was about to answer pretty much as 'rerdavies' did. If you have such requirements the specific architectures would need wrappers to copy the data, or do pref/post fix up (ala crc32 & avx implementations on non perfect boundary)
XXH3 omitted from benchmark?
Someone just ported this implementation to Rust. :)
[1] https://github.com/thevilledev/ChibiHash-rs
Happy to help! There’s also a Go implementation available by someone: https://github.com/rainrambler/ChibiHashGo
How does it compare to CRC64, for the purpose of detecting errors?
There are few reasons to use a CRC64. Your purposes would probably be just as well-served with a good CRC32.
CRCs allow you to precisely tune the type of error detection guarantees you have to the application. You just pick the combination of hamming distance and polynomial size that you like best from the CRC Zoo:
https://users.ece.cmu.edu/~koopman/crc/index.html
A hash function is essentially just a statistical version with the avalanche property that works acceptably well the majority of the time instead of a hard guarantee.
With CRC, you have algebraic guarantees on which kind of errors it detects (and which kind doesn't), while with hash functions collisions are essetially random (unless you have some clever insight on how it works internally). Which one is best, depends on the context.
See also: “Meow hash” by the legendary Casey Muratori, a fast non-cryptographic hash function: https://github.com/cmuratori/meow_hash
> It’s the fastest hash function we know of, and we have benchmarked all the ones we could find. On modern Intel x64 CPUs, it hashes 16 bytes per cycle single-threaded. This means in cache it can hash at a rate of 64 gigabytes per second on a 4.2gHz machine. Out of cache, it hashes at whatever speed your main memory bus can provide to a single core, since that is usually the limiting factor on modern x64 CPUs.
> It has also now been tuned to be the fastest hash on small inputs, too. Despite the fact that it is a full 128-bit hash, it still outperforms “fast” 64-bit hashes across all input sizes.
— https://archive.is/CQOVm (originally https://mollyrocket.com/meowhash)
Discussion on HN: https://news.ycombinator.com/item?id=29038813 & https://news.ycombinator.com/item?id=18262627
Not sure why people keep claiming it's the “fastest” when it's far down the list on the SMHasher benchmark list:
https://rurban.github.io/smhasher/doc/table.html
In particular, rapidhash is three times as fast for small inputs and is portable (unlike meow_hash). Plus meow fails one of the SMHasher quality tests.
Right, Meow is not the best when it comes to small inputs.
It shines for large inputs (it’s far up the SMHasher list you linked in this case).
It offers a good compromise; it’s not bad (good, even) for both large and small inputs.
Granted, rapidhash is quite good on both fronts, too.
Well, it's not even the best AES-NI hash on the list (for large nor small inputs), and it's not passing the quality tests, so why bother?
I wouldn't consider a hash function that relies on 64-bit multiplication truly portable. Its performance will suffer considerably on a 32-bit system. Note that this applies to both Rapidhash and ChibiHash.
Well, even without taking into account that 32-bit systems are increasingly becoming extinct, it's less catastrophic than one might think. You get four muls instead of one and then a little carry propagation; I guess less if you have 32x32 -> 64 muls available? (I haven't thought deeply about it.) Certainly anything that relies on AES-NI would be much worse. But I guess your best shot in Murmur3, then.
From the archive link (blog post published 2018):
> To our surprise, we found a lack of published, well-optimized, large-data hash functions.
Murmur was released in 2008. Murmur3 in 2011. Release dates on higher quality functions are not easy to find, but I am sure that there were more around.
This type of thing is why I take Casey's claims with a huge dose of salt.
> This type of thing is why I take Casey's claims with a huge dose of salt.
My impression in general is that Casey is a fairly decent engineer who has somehow been elevated into godhood by a subset of users.
> See also: “Meow hash”
Meow hash may very well be fast on large data, but it completely fails the "small code" criteria, which is one of the clearly stated design goal of Chibi.
Generally speaking, the "code as small as possible" is IMO a design constraint that is very often under-estimated or even ignored for these type of algorithms, including the crypto hard ones.
I personally find the "code fits on half a page" metric very important: beyond satisfying some very personal sense of aesthetics, it buys you quite a lot of nice properties:
For these reasons, I used to love the TEA (tiny encryption algorithm) [1] before cryptanalysis unfortunately showed it to be weak. Its successors (XXTEA, etc...) are already almost too big for my taste.The speck cipher [2] is quite nice in that regard, in spite of its rather untrustworthy provenance.
[1] https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm
[2] https://en.wikipedia.org/wiki/Speck_(cipher)
[dead]