Uploaded image for project: 'Clojure'
  1. CLJ-1036

hash is inconsistent with = for some BigInteger and floating point values

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Declined
    • Affects versions: Release 1.5, Release 1.4
    • Fix versions: None
    • Labels:
      None
    • Approval:
      Triaged
    • Patch:
      Code and Test

      Description

      hash is documented to be consistent with = but Util/hasheq returns different hash values for some pairs of numbers that are =

      user> (apply = [-1 -1N (biginteger -1)])
      true
      user> (map hash [-1 -1N (biginteger -1)])
      (0 0 -1)
      
      user> (apply = [(Float. 1e9) (Double. 1e9)])
      true
      user> (map hash [(Float. 1e9) (Double. 1e9)])
      (1315859240 1104006501)
      

      Consequences include incorrect behavior for hash-maps containing keys that are =, but have different hash values:

      ;; Incorrect return value with multiple keys = to each other
      user> (assoc (hash-map -1N :should-be-replaced) (biginteger -1) :new-val)
      {-1N :should-be-replaced, -1 :new-val}
      
      ;; array-map gives correct value, since it uses =, not hash
      user> (assoc (array-map -1N :should-be-replaced) (biginteger -1) :new-val)
      {-1N :new-val}
      

      Patch: clj-1036-hasheq-for-biginteger-patch-v4.txt

      Approach:

      The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode().

      All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does.

      For floating point values, the patch makes their hashes consistent by converting floats to doubles and then hashing.

      One alternate approach would be to convert all double values to floats and hash float values only. However, this throws away half of the bits of the double value before hashing, leading to many undesirable hash collisions between different double values.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              pjstadig Paul Stadig
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: