Improve clojure.core/distinct perf by using transient set
Current implementation of clojure.core/distinct uses persistent set. This patch improves performance of lazy arity by ~25%-30% and transducer by ~40%-50% by using transient set instead.
Benchmarking code: https://gist.github.com/tonsky/97dfe1f9c48eccafc983a49c7042fb21
Nikita, regarding your comment where you say ‘That guarantee implicitly presumes that there’s happens-before relation between transient usage from multiple threads. There’s no other way to define “only one thread is in this section at a time”.’
I believe there are technically ways to guarantee that at most one thread is accessing a transient at a time, without any Java Memory Model synchronization between them. I am not saying they are recommended ways to do it, but there are ways to do it.
For example, you could have a non-volatile, unsynchronized int field in a Java object, initially 0, and thread 2 is waiting until it changes to a non-0 value before it proceeds to access a transient. Thread 1 modifies the transient, then writes a value 1 to that int field. I know that the Java Memory Model makes no guarantees that Thread 2 will ever see that changed value from that store, and so I do not think anyone would recommend doing this, but if Thread 1 stops modifying the transient before writing 1 to that field, and Thread 2 never modifies the transient before it becomes non-0, I am pretty sure that we can say that they cannot simultaneously perform operations on the transient, and yet there is no happens-before relationship between them. (corrections welcome if I misunderstand JMM that badly, which is possible).
Similarly I believe there are other no-JMM-synchronization ways to guarantee this, e.g. thread 1 creates a file that did not exist before, and thread 2 is waiting until that file exists before proceeding with modifying the transient. Or thread 1 sends a request over the network to server A, and server A responds, and thread 2 receives that response and then proceeds with modifying the transient.
Again, I am not recommending that anyone write programs like that, and perhaps if we restrict “thread isolation” to mean “use proper JMM synchronization like volatile write by thread 1, read of same volatile by thread 2, or synchronize on a common lock in both threads”, then it seems to make sense that neither transducers nor transient objects (or any other mutable objects) need internal synchronization mechanisms for their state. If that were done, it seems like a good idea to document that they require outside synchronization to be passed between threads safely.
Also, long-living transients (e.g. in a transducers associated with a queue, for example) will hold a reference to a thread that created them. Is that a bad thing? Should we switch to boolean flag instead?
I have to admit my test was demonstrating something else: there were no proper thread isolation. So it was a concurrency issue, not “safe publication” issue. My current understanding is this:
Transients require thread isolation. Use of a particular transient instance should be controlled either by using it in an single-threaded scope, or in a framework that enforces this.
That guarantee implicitly presumes that there’s happens-before relation between transient usage from multiple threads. There’s no other way to define “only one thread is in this section at a time”.
That, in turn, means that all writes that happened in thread 1 are visible in thread 2, regardless to volatility of the variables involved. In fact, we can remove all volatiles from transients implementation and probably make them faster, because, by asking “no more than one thread at a time” we enforce users to establish happens-before between sections, and that would give us all the safe publication guarantees we need.
Is my understanding correct? Am I missing something?
I would say that test is demonstrating a bug in transient sets/maps and you should file a ticket for that as it's a lot more important than this enhancement.
distinct should be able to use transients in both the transducer and lazy seq impls. The issue with contains? not working on transients is actually a separate ticket - http://dev.clojure.org/jira/browse/CLJ-700 that will likely require some class hierarchy rearrangement. I don't think we would take this change until that is fixed (so that you can avoid relying on the class and Java method variants).
Maybe that doc http://clojure.org/reference/transients should be updated re: transients are not safe to use from multiple threads because changes made by one thread are not necessarily visible to another. Even if they don’t compete