Keyword cache cleanup incurs linear scan of cache
If the GC reclaims a keyword, any subsequent attempt to create a keyword requires an O(n) scan over the entire keyword table via Util.clearCache. This is a significant performance cost in keyword-heavy operations; e.g. JSON parsing.
Patch: keyword-cache.diff - patch to defer cleaning till portion of the table is dead and avoid multiple threads cleaning simultaneously.
Patch: kw-clean-future.patch - patch to spin cache cleaning into a future. Found that 1) this introduces some ordering constraints and circularity between Agent and Keyword (fixable) and 2) using the future pool for this means shutdown-agents would always need to be called (in the patch I avoided this by changing agent's soloExecutor to use daemon threads.
Patch: unified-kw-patch.diff - Alternative to keyword-cache and clean-future.patch. Combines all keyword-perf changes, including the EDN kw parser improvement, improved table lookup performance, and has threads cooperate to empty the table refqueue with a minimum of contention.
I too would be curious to see answers to Linus' questions above.
1) Kyle, what ratio is -XX:StringTableSize to the number of unique keys in your incoming JSON-data and other major sources of new Strings in the application?
2) Kyle (and Alex), do you have any theory on whether JDK 8u20 feature "JEP 192: String Deduplication in G1" would cure this particular issue? http://openjdk.java.net/jeps/248
3) Kyle (and Alex), is it likely so that the measured problem exaggerated by the fact that very many cores competes about two resources, the JVM-internal StringTable and the cache reference queue? Can Alex reproduce this in a similar multi-core environment?
Alex, you mention that this patch would probably address a very specific use-case but I realize that generative testing (generative.testing, clojure.data.generators and test.check) would likely suffer from this behaviour if the test interned symbols. Especially if one run the test on many cores in parallel (like generative.testing is designed to do).
I leave a long idea about a clojure.lang.String and a trie of char used to read/lookup symbols right from streams for later
I am not able to reproduce a performance issue due to a linear scan of the reference queue cache. Obviously the scan occurs but in most cases the scan is comparatively fast and does not seem to be a bottleneck in the tests I've run.
If there is a test that can reproduce this issue (on current master) and demonstrates an improvement with this patch, please reopen the ticket for investigation.
Kyle, CLJ-1439 was completed today via a commit to Clojure master. That also had the side effect of making all of the currently attached patches no longer apply cleanly. I haven't checked how easy or difficult it might be to update them to apply cleanly.
The patch is current, and it is focused on the intern/cache.