Writing of strings is inefficient
Today's implementation looks at each char in a string and appends it to a StringBuilder according to certain rules. This is needed to handle Unicode and escaped characters.
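In Java terms, the per-char approach looks roughly like this (a hypothetical sketch, not the actual clojure.data.json code; the class name and exact escape rules are made up for illustration):

```java
// Hypothetical sketch of the current approach: every char is inspected
// individually and appended to a StringBuilder, handling escapes and
// non-ASCII chars as \uXXXX sequences.
public class EscapeEverything {
    static String writeString(String s) {
        StringBuilder sb = new StringBuilder();
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20 || c > 0x7e) {
                        // control or non-ASCII char: emit a \uXXXX escape
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
        return sb.toString();
    }
}
```

Note that even for a string that needs no escaping at all, this still runs the switch once per char and grows the StringBuilder char by char.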
This leaves room for an optimization if we observe that most strings we write are unescaped and within the ASCII range.
See 163-cpu-flamegraph-2021-03-12-12-47-10.svg for evidence of this
Split write-string into a fast path and a slow path. As long as we only see unescaped ASCII chars, we stay on the fast path and write the string directly to the PrintWriter. As soon as we see anything else, we fall back to the old implementation of write-string to deal with it.
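A minimal sketch of the split, in Java for illustration (class and helper names are hypothetical, and the real change is to clojure.data.json's write-string, not this code):

```java
import java.io.PrintWriter;

// Hypothetical fast-path/slow-path split: scan ahead while chars are plain
// ASCII needing no escape, bulk-write that prefix straight to the
// PrintWriter, and hand anything after it to the old char-by-char logic.
public class FastSlowWrite {
    static boolean isSimpleAscii(char c) {
        return c >= 0x20 && c <= 0x7e && c != '"' && c != '\\';
    }

    static void writeString(String s, PrintWriter out) {
        out.write('"');
        int i = 0;
        while (i < s.length() && isSimpleAscii(s.charAt(i))) {
            i++;
        }
        out.write(s, 0, i);        // fast path: one bulk write of the simple prefix
        if (i < s.length()) {
            writeSlow(s, i, out);  // slow path: stand-in for the old implementation
        }
        out.write('"');
    }

    static void writeSlow(String s, int from, PrintWriter out) {
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.write("\\\""); break;
                case '\\': out.write("\\\\"); break;
                case '\n': out.write("\\n");  break;
                default:
                    if (c < 0x20 || c > 0x7e) out.write(String.format("\\u%04x", (int) c));
                    else out.write(c);
            }
        }
    }
}
```

For the common all-ASCII, no-escape case this does a single bulk write instead of one append per char.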
This is based on applying the patch from
See 183-cpu-flamegraph-2021-03-12-13-21-48.svg for the flame graph after the patch is applied
Approach 2: Consolidate string writing
Don’t split the string writing. This also solves
Applied 0002-DJSON-34-Speed-up-writing-strings.patch (2nd approach)
When looking at rewrites of this, might also want to consider too.
dis.txt is the disassembly of clojure.data.json.write-string.fn__71, which is the body of the inner loop. The :simple-ascii case is a compact definition of a big range of ints, but interestingly the tableswitch creates a separate case for every int in the range (with the same code duplicated in each of them). This makes the bytecode of this method, which is in our hot loop, much larger, potentially affecting its optimization. I'm not actually sure why this happens.
I think it's worth writing specialized logic for this specific loop and doing a perf comparison.
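One possible shape for such specialized logic, sketched in Java (hypothetical and unbenchmarked): replace the wide switch with a small lookup table, so the hot loop is an array load and a compare rather than a large tableswitch duplicated per int.

```java
// Hypothetical alternative to the wide tableswitch: a boolean lookup table
// marking ASCII chars that need no escaping. The scan loop then has no
// switch at all, keeping the hot method's bytecode small.
public class TableScan {
    private static final boolean[] SIMPLE = new boolean[128];
    static {
        for (int c = 0x20; c <= 0x7e; c++) SIMPLE[c] = true;
        SIMPLE['"'] = false;
        SIMPLE['\\'] = false;
    }

    // Returns the index of the first char that needs escaping,
    // or s.length() if the whole string is simple ASCII.
    static int firstSpecial(String s) {
        int i = 0;
        int n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (c >= 128 || !SIMPLE[c]) break;
            i++;
        }
        return i;
    }
}
```

Whether this actually beats the tableswitch version is exactly what the perf comparison would need to establish.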