Completed
Details
Assignee
UnassignedUnassignedReporter
Ghadi ShaybanGhadi ShaybanLabels
Approval
OkPatch
Code and TestPriority
CriticalAffects versions
Fix versions
Details
Details
Assignee
Unassigned
UnassignedReporter
Ghadi Shayban
Ghadi ShaybanLabels
Approval
Ok
Patch
Code and Test
Priority
Affects versions
Fix versions
Created April 15, 2016 at 5:09 AM
Updated August 19, 2016 at 6:35 PM
Resolved August 19, 2016 at 6:35 PM
When a range instance is enumerated via next concurrently, some threads may not see the entire range sequence. When this happens next will return nil prematurely.
(defn enumerate [r] (loop [rr r c []] (let [f (first rr)] (if f (recur (next rr) (conj c f)) c)))) (defn demo [size threads] (let [r (range size) futures (doall (repeatedly threads #(future (enumerate r)))) res (doall (map deref futures))] (if (apply = res) (println "Passed") (println "Failed, counts=" (map count res))))) (demo 300 4) Failed, counts= (300 64 300 64)
The demo above will reliably produce a failure every few executions like the one above.
The vast majority of the time, range is used either single-threaded or in a non-competing way where this scenario will never happen. This failure only occurs when two or more threads are enumerating rapidly through the same range instance.
Cause:
Each LongRange instance acts in several capacities - as a seq, a chunked seq, and a reducible, all of which represent independent enumeration strategies (multiple of which may be used by the user). LongRange holds 2 pieces of (volatile, non-synchronized) state related to chunking: _chunk and _chunkNext. That state is only updated in forceChunk(). forceChunk() uses the "racy single-check idiom" to tolerate multiple threads racing to create the chunk. That is, multiple threads may detect that the chunk has not been set (based on null _chunk) and BOTH threads will create the next chunk and write it. But both threads have good local values, compute the same next value, and set the same next values in the fields, so the order they finish is unimportant.
The problem here is that there are actually two fields, and they are set in the order _chunk then _chunkNext. Because the guard is based on _chunk, it's possible for a thread to think the chunk values have been set but _chunkNext hasn't yet been set.
Approach:
Moving the set for _chunkNext before the set for _chunk removes that narrow window of race opportunity.
Patch: clj-1914-3.patch
Thanks to Kyle Kingsbury for the initial reproducing case https://gist.github.com/aphyr/8746181beeac6a728a3aa018804d56f6