Serializing+deserializing lists breaks their hash
clojure.lang.PersistentList implements Serializable but a serialization roundtrip results in a hash of 0. This appears to be caused by ASeq marking its _hash property as transient; other collection base classes don't do this. I don't know if there is a rationale for the distinction, but the result is that serializing, then deserializing, a list results in its _hash property being 0 instead of either its previous, calculated value, or -1, which would trigger recalculation.
This means that any code that relies on a list's hash value can break in subtle ways.
Approach: Use 0 as the "uncomputed" hash value, not -1. The auto initialized value on serialization construction will then automatically trigger a re-computation.
Implement a readObject method that sets the _hash property to -1. This is somewhat complicated in the face of subclasses, etc.
Note: Also need to consider other classes that use -1 instead of 0 as the uncomputed hash value: APersistentMap, APersistentSet, APersistentVector, PersistentQueue - however I believe in those cases they are not transient and thus may avoid the issue. Perhaps they should be transient though.
Screened by: Alex Miller
Tested on OS X 10.10.3, Clojure 1.6.0, Java 1.8.0_45-b14, but don't think this is a factor.
Latest patch is identical, just does not remove the commented out code.
I agree with Ghadi - there is no change in implementation here, just some commented (non-used) code was removed. Moving back to Screened.
The only change I see is the removal of a commented-out impl.
This patch both fixes the serialization problem and changes the implementation for no reason related to the problem. The implementation works in place in order to avoid head-holding, which the implementation change reintroduces. Screeners - please make sure patches contain the minimal code to address the problem and don't incorporate other extraneous 'improvements'.
As recomended, I followed the change-default-to-zero path.
Not only it sidesteps the serialization issue, it looks more aligned with
the semantics of transient. Of course, there is no guarantees
about how different JVM implementations will deal with it - sometimes
they will be serialized, sometimes they will not.
So, together with the patch I've made some manual tests between some versions. The script used is attached for further inspection.
Running on a 1.6 REPL has shown on the original described issue:
Running on 1.8 master. Despite of the = behavior looking ok, the
hashes are different between roundtrips, like we saw on 1.6:
Running on 1.8 after patch the hashes are correctly computed on both cases:
Please note I've dumped each serialized data into different files, so I could test the interoperability between those versions. What I found:
1.6 serialization is already incompatible with 1.8, on both directions;
1.8 data could be exchange between master and patched versions, not affecting version behavior (hashes are computed only on patched REPL).
Did I miss something or everything looks correct for you too? I'm open to do some more exhaustive testing if anyone could give me some directions about where to explore.