partition-by and partition-all transducers should ensure visibility of state changes

Description

The partition-by and partition-all transducers use state stored in an ArrayList. This state should be protected (for example, by volatile) to ensure visibility if used in a transducing process that moves computations across threads.
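The pattern being proposed can be sketched in Java (names here are hypothetical; the actual patch stores the ArrayList in a Clojure volatile): mutations go to a plain ArrayList, and visibility is piggybacked on a volatile field that is written after each mutation and read before each access.

```java
import java.util.ArrayList;

// Illustrative sketch only, not the actual patch: an unsynchronized
// ArrayList whose visibility across threads is ensured by routing
// every access through a volatile field.
class PartitionState {
    private ArrayList<Object> part = new ArrayList<>();
    private volatile ArrayList<Object> vstate = part; // volatile handoff

    // Called by at most one thread at a time (transducer rule a).
    void add(Object x) {
        ArrayList<Object> a = vstate; // volatile read: see prior mutations
        a.add(x);
        vstate = a;                   // volatile write: publish this mutation
    }

    int size() {
        return vstate.size();         // volatile read before reading the list
    }
}
```

The list itself stays unsynchronized; the volatile read/write pair is what creates the happens-before edge between steps that run on different threads.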

Approach: store the unsynchronized ArrayList in a volatile to ensure visibility

Patch: clj-2146-2.patch

Environment

None

Activity

Andy Fingerhut
September 6, 2019, 9:38 PM

This comment merely points out a relationship, in terms of JMM synchronization concerns, between the internal mutable state of transducers and that of transients in Clojure; there is some discussion of this in the comments on both.

Alex Miller
January 27, 2020, 3:59 PM

The clj-2146-2 patch is similar to the clj-2146-v1 patch, but minimizes name changes and calls to vreset!; it also pulls over another sequence partition-by test to the transducer version for a little more coverage.

leonoel
January 28, 2020, 10:56 AM

Is it too much to ask what problem is being addressed here? The best rationale I’ve seen for this issue so far is basically “it will make improperly synchronized transducing contexts a bit less wrong”.

I can understand the kindergarten-friendliness goal, but sprinkling volatiles all over the place won’t magically make concurrent transducing context design a trivial task. Can’t we just assume we’re in the expert-only area here?

Alex Miller
January 28, 2020, 2:22 PM

The standard we set when we developed transducers was that transducing functions would a) be guaranteed to run in a single thread at a time (but potentially multiple threads over time) and b) should ensure visibility to minimize constraints on multi-threaded transducing contexts (core.async being the main one that exists now, but with an eye towards others). These functions are the only ones that were not satisfying b), and that is the problem being addressed.

I understand that you disagree about the necessity of b). I'm not really interested in having that discussion yet again.

leonoel
January 31, 2020, 5:19 PM

Thank you for stating the problem so clearly. This will be my final post, and I will do my best to stick to the facts. I wish the transducer rules had less room for interpretation, but it's still better than nothing. The fuzziest point is the notion of time in rule a), which can be seen either from a JMM point of view or from a more general point of view.

 

Assuming the time expressed in rule a) refers to the JMM definition of time.

The JMM defines time via the happens-before (HB) partial order. “Run in a single thread at a time” then means that any two given calls to a reducing function are ordered by HB.

Claim : transducers are correctly synchronized under this interpretation of rule a).
Proof :
Let R be the set of all actions performed across all calls to a reducing function. For any two actions a and b in R:

  • if a and b are performed by the same call, they're ordered by HB according to JLS1.

  • otherwise, they're ordered by HB according to transducer rule a.

Since any two actions in R are ordered by HB, and HB is a partial order, R is totally ordered by HB.

Let v be a variable involved in the transducer state, and let a.v and b.v be two conflicting accesses to v.
Assumption: a.v and b.v are in R. Informally, mutable state captured by the reducing function doesn't leak outside of it. This is not guaranteed by the transducer rules (it should be, IMO), but it holds in practice, so it seems reasonable.
R is totally ordered by HB so a.v and b.v are ordered by HB.
It follows that R contains no data race, according to JLS2.

Consequently, transducers are correctly synchronized, according to JLS3.

 

JLS 17.4.5
JLS1: If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
JLS2: When a program contains two conflicting accesses that are not ordered by a happens-before relationship, it is said to contain a data race.
JLS3: A program is correctly synchronized if and only if all sequentially consistent executions are free of data races.
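These definitions can be made concrete with a toy Java sketch (hypothetical names, not from this issue): a plain field is published safely by chaining JLS1 (program order) with the HB edge a volatile write/read provides.

```java
// Toy safe-publication sketch. hb(data = 42, flag = true) holds by
// program order (JLS1); the volatile write/read of `flag` adds
// hb(flag = true, read of flag); by transitivity the reader's access
// to `data` is HB-ordered after the write, so there is no data race
// on `data` (JLS2) and the read must observe 42.
class Publish {
    static int data;               // plain field
    static volatile boolean flag;  // volatile publication flag

    static int demo() {
        data = 0;
        flag = false;
        Thread writer = new Thread(() -> { data = 42; flag = true; });
        writer.start();
        while (!flag) { }          // spin on the volatile read
        int seen = data;           // guaranteed to see 42
        try {
            writer.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen;
    }
}
```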

 

Transducer rule b) is unnecessary because it is not a requirement of the proof, and it makes no sense to agree or disagree with that, because math is not about opinions. If you think the proof is flawed, then point out the flaw and the proof will cease to exist.

 

Assuming time expressed in rule a) refers to any possible causal relationship, including those external to the JMM.

“Run in a single thread at a time” means that some constraint prevents two calls to a reducing function from overlapping. This constraint may or may not be materialized by HB, so two calls to a reducing function can be seen as concurrent by the JMM, and we need to consider this special case.
C: the non-concurrency constraint is external and is not materialized by any HB relationship.

If C doesn't hold, the proof in the previous section applies, so transducers are correctly synchronized.
Otherwise, according to rule b), the last volatile write of one step and the first volatile read of the next enforce HB ordering between two successive steps, so transducers are still correctly synchronized (by a similar proof).

In other words, the non-concurrency requirement of transducer rule a) is weaker here than in the previous section, because it covers all of the previous use cases plus the special case C, and transducer rule b) is a safety net for C.
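A toy Java model (hypothetical names) of rule b) acting as that safety net: two "steps" run on different threads with no lock or join between the first step and the reader, so the volatile stamp is the only HB edge publishing the list mutation.

```java
import java.util.ArrayList;

// Toy model of rule b) under condition C. Thread.start() gives an HB
// edge from main into t1, but nothing except the volatile `stamp`
// orders t1's actions before main's subsequent reads.
class TwoStep {
    static final ArrayList<Integer> part = new ArrayList<>();
    static volatile int stamp = 0;

    static int run() {
        // Step 1, on another thread: mutate the plain list, then
        // volatile-write the stamp ("last volatile write" of the step).
        Thread t1 = new Thread(() -> { part.add(1); stamp = 1; });
        t1.start();
        // Step 2, on this thread: volatile-read the stamp first
        // ("first volatile read"), then read the list. hb(add, stamp=1)
        // and hb(stamp=1, read of stamp) give hb(add, part.get(0)).
        while (stamp != 1) { }
        int seen = part.get(0);
        try {
            t1.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen;
    }
}
```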

 

Performance

I have provided numbers showing the negative impact of volatiles on transducer performance. These results were largely predictable, because volatile accesses put more constraints on optimizers (they restrict the set of possible executions). I'm aware that benchmarking is not an exact science, but the methodology has not been disputed so far.
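As a purely illustrative sketch (hypothetical code, not the benchmark referred to above), the kind of optimizer constraint in question can be seen by comparing a loop over a plain field with the same loop over a volatile field: the plain stores may be hoisted or coalesced, while each volatile store must actually happen in order. Real measurements would need a harness such as JMH.

```java
// Hypothetical sketch only: two behaviorally identical loops, one
// through a plain field and one through a volatile field. The JIT may
// freely optimize the former; the latter's volatile accesses constrain
// the set of possible executions. Not a valid benchmark by itself.
class VolatileCost {
    int plain;
    volatile int vol;

    long sumPlain(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) { plain = i; s += plain; }
        return s;
    }

    long sumVolatile(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) { vol = i; s += vol; }
        return s;
    }
}
```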

 

Conclusion

The tradeoff of rule b) is as follows:

  • pros: keeps transducers correctly synchronized in contexts requiring condition C

  • cons: performance degradation in all contexts

So far, nobody has ever been able to exhibit a non-contrived example of a transducing context requiring condition C. It's not really surprising: the only reason to deliberately NOT enforce HB ordering is to relax consistency constraints in the hope of better performance, but if the code to be run is already cluttered with volatile accesses due to rule b), there is hardly any constraint left to relax.

Nevertheless, it has been argued that the performance impact was not significant (without providing any significance threshold, of course). We can reasonably conclude that performance in Clojure is a second-class goal at best.

 

 

P.S: I had decided to get involved in the Clojure contributing process because I naively thought my humble knowledge of the JMM and my experience with multithreaded transducing contexts could help improve the language. So far I haven't observed a single sign of interest in it from the Clojure team, and at this point I doubt anything positive is going to emerge from this collaboration. Therefore, I will stop asking for rationale about design decisions and close my Jira account. Feel free to recycle it; I vaguely remember hearing that it was a scarce resource.

Assignee

Unassigned

Reporter

Alex Miller

Labels

Approval

Vetted

Patch

Code and Test

Affects versions

Priority

Major