Keyword cache cleanup incurs linear scan of cache
Description
Environment
Attachments
Activity

Jonathan LeonardOctober 9, 2015 at 1:18 AM
I too would be curious to see answers to Linus' questions above.

Linus EricssonJuly 19, 2015 at 11:42 PM
Some questions:
1) Kyle, what ratio is -XX:StringTableSize to the number of unique keys in your incoming JSON-data and other major sources of new Strings in the application?
2) Kyle (and Alex), do you have any theory on whether JDK 8u20 feature "JEP 192: String Deduplication in G1" would cure this particular issue? http://openjdk.java.net/jeps/248
3) Kyle (and Alex), is it likely so that the measured problem exaggerated by the fact that very many cores competes about two resources, the JVM-internal StringTable and the cache reference queue? Can Alex reproduce this in a similar multi-core environment?
Alex, you mention that this patch would probably address a very specific use-case but I realize that generative testing (generative.testing, clojure.data.generators and test.check) would likely suffer from this behaviour if the test interned symbols. Especially if one run the test on many cores in parallel (like generative.testing is designed to do).
I leave a long idea about a clojure.lang.String
and a trie of char[] used to read/lookup symbols right from streams for later

Alex MillerAugust 2, 2014 at 5:35 AM
I am not able to reproduce a performance issue due to a linear scan of the reference queue cache. Obviously the scan occurs but in most cases the scan is comparatively fast and does not seem to be a bottleneck in the tests I've run.
If there is a test that can reproduce this issue (on current master) and demonstrates an improvement with this patch, please reopen the ticket for investigation.

Andy FingerhutAugust 2, 2014 at 3:31 AM
Kyle, CLJ-1439 was completed today via a commit to Clojure master. That also had the side effect of making all of the currently attached patches no longer apply cleanly. I haven't checked how easy or difficult it might be to update them to apply cleanly.

Kyle KingsburyJuly 23, 2014 at 8:35 PM
The patch is current, and it is focused on the intern/cache.
Details
Assignee
Alex MillerAlex MillerReporter
Kyle KingsburyKyle KingsburyLabels
Approval
VettedPatch
CodePriority
MinorAffects versions
Fix versions
Details
Details
Assignee

Reporter

If the GC reclaims a keyword, any subsequent attempt to create a keyword requires an O(n) scan over the entire keyword table via Util.clearCache. This is a significant performance cost in keyword-heavy operations; e.g. JSON parsing.
Patch: keyword-cache.diff - patch to defer cleaning till portion of the table is dead and avoid multiple threads cleaning simultaneously.
Patch: kw-clean-future.patch - patch to spin cache cleaning into a future. Found that 1) this introduces some ordering constraints and circularity between Agent and Keyword (fixable) and 2) using the future pool for this means shutdown-agents would always need to be called (in the patch I avoided this by changing agent's soloExecutor to use daemon threads.
Patch: unified-kw-patch.diff - Alternative to keyword-cache and clean-future.patch. Combines all keyword-perf changes, including the EDN kw parser improvement, improved table lookup performance, and has threads cooperate to empty the table refqueue with a minimum of contention.