We're updating the issue view to help you get more done. 

hash is inconsistent with = for some BigInteger and floating point values

Description

hash is documented to be consistent with = but Util/hasheq returns different hash values for some pairs of numbers that are =

1 2 3 4 5 6 7 8 9 user> (apply = [-1 -1N (biginteger -1)]) true user> (map hash [-1 -1N (biginteger -1)]) (0 0 -1) user> (apply = [(Float. 1e9) (Double. 1e9)]) true user> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501)

Consequences include incorrect behavior for hash-maps containing keys that are =, but have different hash values:

1 2 3 4 5 6 7 ;; Incorrect return value with multiple keys = to each other user> (assoc (hash-map -1N :should-be-replaced) (biginteger -1) :new-val) {-1N :should-be-replaced, -1 :new-val} ;; array-map gives correct value, since it uses =, not hash user> (assoc (array-map -1N :should-be-replaced) (biginteger -1) :new-val) {-1N :new-val}

Patch: clj-1036-hasheq-for-biginteger-patch-v4.txt

Approach:

The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode().

All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does.

For floating point values, the patch makes their hashes consistent by converting floats to doubles and then hashing.

One alternate approach would be to convert all double values to floats and hash float values only. However, this throws away half of the bits of the double value before hashing, leading to many undesirable hash collisions between different double values.

Environment

None

Status

Assignee

Unassigned

Reporter

Paul Stadig

Labels

None

Approval

Triaged

Patch

Code and Test

Affects versions

Release 1.5
Release 1.4

Priority

Major