Improve speed of reading strings


The current implementation of data.json constructs strings by reading one int at a time from the underlying stream and appending each one to a `StringBuilder`. This can be improved. Improving the speed at which strings are read will have a large impact on the overall speed of reading JSON, since object keys are strings.

Patch: 0001-DJSON-31-Read-strings-through-byte-arrays.patch
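To illustrate the idea, here is a rough Java sketch (data.json itself is written in Clojure, and this is not the code from the patch). It contrasts the one-int-at-a-time approach with reading a 64-char chunk into an array, scanning it for the closing quote, and pushing the unused tail back onto the stream. The class and method names are hypothetical, escape sequences are deliberately ignored, and the reader is assumed to be a `PushbackReader` created with a pushback capacity of at least 64.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.PushbackReader;

// Hypothetical sketch; both methods expect the stream to be positioned
// just AFTER the opening quotation mark and do not handle escapes.
final class ChunkedStringReader {
    static final int BUFFER_SIZE = 64;

    // Baseline: one read() call and one append per character.
    static String readCharByChar(PushbackReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int c = in.read();
            if (c < 0) throw new EOFException("end of input inside string");
            if (c == '"') return sb.toString();
            sb.append((char) c);
        }
    }

    // Chunked: read up to 64 chars at once and scan the array.
    // Assumes `in` was constructed with pushback capacity >= BUFFER_SIZE.
    static String readChunked(PushbackReader in) throws IOException {
        char[] buffer = new char[BUFFER_SIZE];
        StringBuilder sb = null; // only allocated for strings longer than one chunk
        while (true) {
            int n = in.read(buffer, 0, BUFFER_SIZE);
            if (n < 0) throw new EOFException("end of input inside string");
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '"') {
                    int rest = n - i - 1;
                    // Give back whatever was read past the closing quote.
                    if (rest > 0) in.unread(buffer, i + 1, rest);
                    if (sb == null) return new String(buffer, 0, i);
                    return sb.append(buffer, 0, i).toString();
                }
            }
            // No closing quote in this chunk: keep it and read the next one.
            if (sb == null) sb = new StringBuilder();
            sb.append(buffer, 0, n);
        }
    }
}
```

For short strings such as typical object keys, the chunked path needs a single `read` call and builds the `String` directly from the array, without going through a `StringBuilder`.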




Alex Miller
March 12, 2021, 5:18 PM

There's a lot of int/long coercion in the bytecode but I'm not that worried about that vs boxing, which I did not see. Rebuilt patch, applied.

Alex Miller
March 12, 2021, 4:38 PM

On the patch, the type hint on `buffer-size` is wrong - var type hints are evaluated so you're getting the `int` function object there. But really, I don't think it's worth pulling out this constant at all - just use 64 and 63 in the code.

I'm still looking at the disassembly of this a little more closely, want to check for boxing etc.

Erik Assum
March 12, 2021, 3:49 PM

I’ll create a separate issue for that.

Alex Miller
March 12, 2021, 3:38 PM

For the string cache, I'd break that idea out into a separate ticket. I think it would be particularly interesting if you could do this lookup only on keys (and not on values). Transit does basically this same thing. I'm not sure the construction cost is really the big deal, but you might find big memory improvements from interning string keys, so that a read of an array of 1000 maps with the same keys uses 1000x less heap for the key strings.

Erik Assum
March 8, 2021, 8:57 AM

It’s worth noting that Jackson also keeps a cache of field names.

Basically, while reading through the string it calculates a hash value for it, and then uses that value to look up the string in a cache (which is of size 64).

So this would elide having to construct new strings for strings we’ve already seen.
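A minimal Java sketch of that kind of cache follows. It is not Jackson's actual implementation; the `KeyCache` class, its `lookup` method, and the hash function are all assumptions made for illustration. The hash is computed over the chars that were just read, masked down to one of 64 slots, and the cached `String` is reused only if its contents match.

```java
// Hypothetical fixed-size key cache; names and hash function are illustrative.
final class KeyCache {
    private static final int SIZE = 64; // power of two, so (hash & 63) selects a slot
    private final String[] slots = new String[SIZE];

    // Returns a String equal to buf[off .. off+len), reusing a cached
    // instance when the slot already holds a string with the same chars.
    String lookup(char[] buf, int off, int len) {
        int hash = 0;
        for (int i = off; i < off + len; i++) {
            hash = 31 * hash + buf[i];
        }
        int slot = hash & (SIZE - 1);
        String cached = slots[slot];
        if (cached != null && matches(cached, buf, off, len)) {
            return cached; // no new String constructed
        }
        String fresh = new String(buf, off, len);
        slots[slot] = fresh; // a colliding entry is simply overwritten
        return fresh;
    }

    private static boolean matches(String s, char[] buf, int off, int len) {
        if (s.length() != len) return false;
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) != buf[off + i]) return false;
        }
        return true;
    }
}
```

With a cache like this, repeated keys resolve to the same `String` instance, which is where the heap savings for many maps with identical keys would come from.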





Erik Assum