Uploaded image for project: 'Clojure'
  1. CLJ-1766

Serializing+deserializing lists breaks their hash

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Completed
    • Affects versions: Release 1.7, Release 1.6
    • Fix versions: Release 1.8
    • Labels:
    • Environment:

      Tested on OS X 10.10.3, Clojure 1.6.0, Java 1.8.0_45-b14, but don't think this is a factor.

    • Approval:
      Ok
    • Patch:
      Code and Test

      Description

      clojure.lang.PersistentList implements Serializable but a serialization roundtrip results in a hash of 0. This appears to be caused by ASeq marking its _hash property as transient; other collection base classes don't do this. I don't know if there is a rationale for the distinction, but the result is that serializing, then deserializing, a list results in its _hash property being 0 instead of either its previous, calculated value, or -1, which would trigger recalculation.

      This means that any code that relies on a list's hash value can break in subtle ways.

      Examples:

      (import '[java.io ByteArrayInputStream ByteArrayOutputStream ObjectInputStream ObjectOutputStream])
      
      (defn obj->bytes [obj]
        (let [bytestream (ByteArrayOutputStream.)]
          (doto (ObjectOutputStream. bytestream) (.writeObject obj))
          (.toByteArray bytestream)))
      
      (defn bytes->obj [bs]
        (.readObject (ObjectInputStream. (ByteArrayInputStream. bs))))
      
      (defn round-trip [x] (bytes->obj (obj->bytes x)))
      
      user=> (hash '(1))
      -1381383523
      user=> (hash (round-trip '(1)))
      0
      user=> #{'(1) (round-trip '(1))}
      #{(1) (1)}
      user=> (def m {'(1) 1})
      #'user/m
      user=> (= m (round-trip m))
      true
      user=> (= (round-trip m) m)
      false
      

      Approach: Use 0 as the "uncomputed" hash value, not -1. The auto initialized value on serialization construction will then automatically trigger a re-computation.

      Alternate approaches:

      • Implement a readObject method that sets the _hash property to -1. This is somewhat complicated in the face of subclasses, etc.

      Note: Also need to consider other classes that use -1 instead of 0 as the uncomputed hash value: APersistentMap, APersistentSet, APersistentVector, PersistentQueue - however I believe in those cases they are not transient and thus may avoid the issue. Perhaps they should be transient though.

      Patch: clj-1766-2.patch
      Screened by: Alex Miller

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              Christopher Vermilion
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: