Improve speed of reading strings

Description

The current implementation of data.json constructs strings by reading one int at a time from the underlying stream and appending it to a `StringBuilder`. This can be improved. Speeding up string reads will have a large impact on the overall speed of reading JSON, since object keys are strings.
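To illustrate the idea, here is a rough Java sketch of reading a string in chunks through a char array instead of one `read()` call per character. It is not the code from the attached patch: the class and method names are made up, the 64-char chunk size is an assumption, and escape sequences are ignored to keep it short.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration; not the code from the attached patch.
final class BufferedStringRead {
    static final int BUFFER_SIZE = 64; // assumed chunk size

    /** Reads up to the closing quote in chunks; escapes are not handled. */
    static String readString(PushbackReader in) {
        char[] buf = new char[BUFFER_SIZE];
        StringBuilder sb = null; // only allocated when the string spans several reads
        try {
            while (true) {
                int n = in.read(buf, 0, BUFFER_SIZE);
                if (n == -1) throw new EOFException("unterminated string");
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '"') {
                        // Return the chars read past the quote to the stream.
                        in.unread(buf, i + 1, n - i - 1);
                        return sb == null
                            ? new String(buf, 0, i)
                            : sb.append(buf, 0, i).toString();
                    }
                }
                if (sb == null) sb = new StringBuilder();
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper; the pushback capacity must cover one full chunk. */
    static PushbackReader readerFor(String json) {
        return new PushbackReader(new StringReader(json), BUFFER_SIZE);
    }
}
```

With the reader positioned just after an opening quote, `BufferedStringRead.readString(BufferedStringRead.readerFor("key\": 1"))` returns `key` and leaves `: 1` in the reader. Short strings, which most object keys are, never touch a `StringBuilder` at all.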

Patch: 0001-DJSON-31-Read-strings-through-byte-arrays.patch

Environment

None

Activity

Alex Miller
March 12, 2021, 5:18 PM

There's a lot of int/long coercion in the bytecode but I'm not that worried about that vs boxing, which I did not see. Rebuilt patch, applied.

Alex Miller
March 12, 2021, 4:38 PM

On the patch, the type hint on `buffer-size` is wrong - var type hints are evaluated so you're getting the `int` function object there. But really, I don't think it's worth pulling out this constant at all - just use 64 and 63 in the code.

I'm still looking at the disassembly of this a little more closely, want to check for boxing etc.

Erik Assum
March 12, 2021, 3:49 PM

I’ll create a separate issue for that.

Alex Miller
March 12, 2021, 3:38 PM
Edited

For the string cache, I'd break that idea out into a separate ticket. I think it would be particularly interesting if you could do this lookup only on keys (and not on values). Transit does basically the same thing. I'm not sure the construction cost is really the big deal, but you might find big memory improvements from interning string keys, so that a read of an array of 1000 maps with the same keys uses 1000x less heap for the key strings.

Erik Assum
March 8, 2021, 8:57 AM
Edited

It’s worth noting that Jackson also keeps a cache of field names, as per:

https://github.com/FasterXML/jackson-core/blob/ce8bd73daf3fc2f8931ce20eb8dfa5e93a982a35/src/main/java/com/fasterxml/jackson/core/json/ReaderBasedJsonParser.java#L1749

Basically, while reading through the string it calculates a hash value for it, and then uses that value to look up the string in a cache (which is of size 64).

So this would elide having to construct new strings for strings we’ve already seen.

https://github.com/FasterXML/jackson-core/blob/ce8bd73daf3fc2f8931ce20eb8dfa5e93a982a35/src/main/java/com/fasterxml/jackson/core/sym/CharsToNameCanonicalizer.java
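A minimal Java sketch of that kind of lookup, to make the idea concrete. This is a simplification and not Jackson's actual `CharsToNameCanonicalizer`: the `KeyCache` class is hypothetical, it uses a single 64-slot table with no collision chains, and it hashes in a separate pass rather than while scanning for the closing quote.

```java
// Hypothetical, simplified field-name cache; not Jackson's implementation.
final class KeyCache {
    private static final int SIZE = 64; // power of two, so (hash & 63) picks a slot
    private final String[] slots = new String[SIZE];

    /** Returns a cached String equal to buf[off, off+len), creating one on a miss. */
    String lookup(char[] buf, int off, int len) {
        int hash = 0;
        for (int i = 0; i < len; i++) {
            hash = 31 * hash + buf[off + i];
        }
        int idx = hash & (SIZE - 1);
        String cached = slots[idx];
        if (cached != null && matches(cached, buf, off, len)) {
            return cached; // hit: no new String is constructed
        }
        String fresh = new String(buf, off, len);
        slots[idx] = fresh; // miss or collision: overwrite the slot
        return fresh;
    }

    private static boolean matches(String s, char[] buf, int off, int len) {
        if (s.length() != len) return false;
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) != buf[off + i]) return false;
        }
        return true;
    }
}
```

Looking up the same key twice from different buffers returns the same `String` instance, so repeated keys cost a hash and a compare instead of an allocation. A colliding key simply evicts the previous entry, which keeps the cache bounded at the price of occasional misses.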

Fixed

Assignee

Unassigned

Reporter

Erik Assum