Reading and writing JSON is too slow

Description

Problem

There is ample evidence that data.json is "slow" in comparison to other Clojure JSON libs, which are largely written as Clojure wrappers around existing fast Java JSON parsers, namely Jackson. There is no inherent reason this should be so - it's all bytecode in the end.

Background

data.json was created in the early days of Clojure contrib with the goal of having a JSON parser available without regard for existing libraries in the Java ecosystem. Since then it has remained relevant (despite its performance issues) because it has no third-party dependencies, making it easier to integrate into projects with minimal constraints. In particular, the Jackson library underneath other Clojure JSON libs is notorious for breaking API compatibility, often introducing transitive dependency conflicts in large dep trees.

Constraints

data.json's lack of transitive deps is an important constraint and the main reason for its continued relevance. We do not wish to use external libraries. However, being pure Clojure is not a goal - implementing it in a mixture of Java and Clojure is fine if necessary.

Consumers

A few important consumers to keep in mind when considering data.json usage are:

  • ClojureScript - used for reading/writing source maps and AOT cache data

  • Cognitect AWS API - used for reading/writing JSON on API interactions

  • Liberator - used for reading/writing JSON in REST interactions

  • Datomic - primarily used for reading config, but also in some logging uses

A review of these projects (and other users on GitHub) shows that consumers almost exclusively use read-str and write-str (and occasionally pprint). Thus, we get the biggest bang for the buck by focusing on those one-shot, in-memory calls.

Benchmarks

To track progress, both in absolute terms and relative to other libs, it is useful to have a benchmark. The jsonista benchmark seems pretty good: it hits various size buckets, covers read-str/write-str, etc. I propose using that as a first pass.

Reading

library/size of json  10b            100b           1k            10k            100k
data.json             1.415626 µs    4.647914 µs    26.278528 µs  292.573345 µs  2.831524 ms
cheshire              995.589616 ns  1.989409 µs    10.169683 µs  93.114467 µs   918.176008 µs
jsonista              358.347612 ns  1.349564 µs    6.684797 µs   71.806734 µs   699.329997 µs
jsoniter              197.685744 ns  651.712757 ns  3.418747 µs   43.674869 µs   543.987750 µs

Writing

library/size of json  10b            100b           1k            10k            100k
data.json             2.303828 µs    7.378648 µs    41.395170 µs  508.695050 µs  4.402777 ms
cheshire              1.025875 µs    2.544099 µs    9.372726 µs   105.356541 µs  1.170707 ms
jsonista              195.551519 ns  631.073911 ns  2.782485 µs   31.830190 µs   417.049649 µs


Questions

Next steps for investigation...

  • Why is data.json/write-str slow? We need data and hypotheses.

  • Why is data.json/read-str slow? We need data and hypotheses.

  • How strict/tolerant do we want to be wrt JSON parsing? There is a range of answers, with a corresponding range of compatibility. Reducing tolerance narrows the possibilities and probably improves performance.

Note

The benchmarks below all build on the previous benchmark, i.e. the benchmarks for hypothesis 4 include all the implementations prior to that hypothesis.

Hypothesis 1

The setup in read-str and read (use of apply and setting up dynamic bindings) is costly compared to the cost of parsing small JSON payloads.

Experiment: Remove the use of dynamic vars

Expected result: data.json performs better on smaller JSON payloads, and the same on larger payloads

Result (only showing for data.json):

10b            100b         1k            10k            100k
380.855588 ns  2.774149 µs  19.703588 µs  227.674437 µs  2.161491 ms

Implementation: 0001-remove-use-of-dynamic-vars-on-reading.patch
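The difference this hypothesis targets can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: `*key-fn*`, `read-keys-dynamic`, and `read-keys-explicit` are hypothetical stand-ins for the option handling in data.json.

```clojure
;; Hypothetical stand-ins, not data.json's actual vars or functions.
(def ^:dynamic *key-fn* identity)

;; Variant A: the option is looked up through a dynamic var for every key.
(defn convert-keys-dynamic [ks]
  (mapv (fn [k] (*key-fn* k)) ks))

(defn read-keys-dynamic [ks & {:keys [key-fn] :or {key-fn identity}}]
  ;; `binding` pushes and pops a thread-binding frame on every call,
  ;; a fixed cost that weighs most when the payload is tiny.
  (binding [*key-fn* key-fn]
    (convert-keys-dynamic ks)))

;; Variant B: the option is passed down as a plain argument.
(defn convert-keys-explicit [ks key-fn]
  (mapv key-fn ks))

(defn read-keys-explicit [ks & {:keys [key-fn] :or {key-fn identity}}]
  (convert-keys-explicit ks key-fn))

(read-keys-dynamic ["a" "b"] :key-fn keyword)   ; => [:a :b]
(read-keys-explicit ["a" "b"] :key-fn keyword)  ; => [:a :b]
```

Both variants return the same result; only the way the option reaches the inner function differs.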

Tracked in

 

Hypothesis 2

Reading a byte at a time from a reader is slow

Experiment: Read a chunk of data from the reader and loop over that using aget

Expected result: data.json performance improves across the board

Result (only showing for data.json):

10b            100b         1k            10k            100k
347.469268 ns  2.169539 µs  12.990627 µs  156.929191 µs  1.55405 ms

Implementation: 0001-DJSON-31-Read-strings-through-byte-arrays.patch
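As a rough illustration of the experiment (not the patch itself), the sketch below contrasts one `.read` call per character with reading a chunk into a char array and scanning it with `aget`. The quote-counting task and the function names are hypothetical; the 64-char chunk size is only an example.

```clojure
(import '(java.io Reader StringReader))

;; Variant A: one .read call per character.
(defn count-quotes-per-char [^Reader rdr]
  (loop [n 0]
    (let [c (.read rdr)]
      (cond
        (neg? c)       n                 ; -1 signals end of stream
        (= c (int \")) (recur (inc n))
        :else          (recur n)))))

;; Scans the first `len` chars of `buf` with aget.
(defn count-quotes-in [^chars buf len]
  (loop [i 0, n 0]
    (if (< i len)
      (recur (inc i) (if (= \" (aget buf i)) (inc n) n))
      n)))

;; Variant B: fill a char array in chunks, then loop over the array.
(defn count-quotes-chunked [^Reader rdr]
  (let [^chars buf (char-array 64)]
    (loop [total 0]
      (let [len (.read rdr buf 0 (alength buf))]
        (if (neg? len)
          total
          (recur (+ total (count-quotes-in buf len))))))))

(count-quotes-per-char (StringReader. "{\"a\":\"b\"}"))  ; => 4
(count-quotes-chunked  (StringReader. "{\"a\":\"b\"}"))  ; => 4
```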

Comment: This implementation is hard to get really fast in Clojure, as a number of operations that could be primitive int operations need casts from long; I haven't figured out how to type-hint so that ints are used instead of longs.

Comment: This implementation allocates a 64-char array on each string read. This could be avoided by passing the char-array as a parameter and letting the call site determine whether we need a fresh char-array or can use a previously allocated one. One could imagine a pool of (pre)allocated char-arrays, but the book-keeping there might become cumbersome.
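The buffer-passing idea from the comment above might look something like this. `read-simple-string` is a hypothetical helper, not a function in data.json, and it assumes the opening quote has already been consumed, the string contains no escape sequences, and the string fits in the buffer.

```clojure
(import '(java.io Reader StringReader))

;; Hypothetical helper: copies chars up to the closing quote into `buf`
;; and returns them as a String. The caller owns `buf` and may reuse it.
(defn read-simple-string [^Reader rdr ^chars buf]
  (loop [i 0]
    (let [c (.read rdr)]
      (if (or (neg? c) (= c (int \")))
        (String. buf 0 i)                ; copies the chars, so buf is free again
        (do (aset buf i (char c))
            (recur (inc i)))))))

(let [buf (char-array 64)                ; allocated once by the call site
      rdr (StringReader. "foo\"bar\"")]
  [(read-simple-string rdr buf)
   (read-simple-string rdr buf)])
;; => ["foo" "bar"]
```

Because `String.` copies the characters out of the array, the same buffer can safely be handed to the next read.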

Tracked in

 

Hypothesis 3

Jsonista is fast because it uses Java interop to add elements to Clojure collections

Experiment: Drop down to Java when manipulating the collections

Expected result: Somewhat faster, especially as the payload size increases

Result (only showing for data.json):

10b            100b         1k            10k            100k
347.921496 ns  2.005154 µs  10.401778 µs  126.592569 µs  1.227691 ms

Implementation: 0001-DJSON-31-Use-java-interop-to-work-with-transients.patch
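For readers unfamiliar with the technique, here is a minimal sketch of what "dropping down to Java" for transients can mean, assuming the patch does something along these lines. The function names are hypothetical; the interop calls are the `assoc`, `conj`, and `persistent` methods of `clojure.lang.ITransientMap` and `clojure.lang.ITransientCollection`.

```clojure
(import '(clojure.lang ITransientMap ITransientCollection))

;; Variant A: clojure.core functions on a transient map.
(defn build-map-core [pairs]
  (persistent!
    (reduce (fn [m [k v]] (assoc! m k v))
            (transient {})
            pairs)))

;; Variant B: call the transient's Java methods directly.
(defn build-map-interop [pairs]
  (loop [^ITransientMap m (transient {})
         pairs (seq pairs)]
    (if pairs
      (let [[k v] (first pairs)]
        ;; .assoc may return a different transient, so keep its result.
        (recur (.assoc m k v) (next pairs)))
      (.persistent m))))

;; The same idea for vectors, using ITransientCollection.conj.
(defn build-vector-interop [xs]
  (loop [^ITransientCollection v (transient [])
         xs (seq xs)]
    (if xs
      (recur (.conj v (first xs)) (next xs))
      (.persistent v))))

(build-map-interop [["a" 1] ["b" 2]])  ; => {"a" 1, "b" 2}
(build-vector-interop [1 2 3])         ; => [1 2 3]
```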

Hypothesis 4

The code in read-object is sub-optimal

Experiment 1: Playing around with different options while still maintaining the overall structure of the fn

Expected result: Faster, but not a lot.

Result (only showing for data.json):

10b            100b         1k           10k            100k
333.135938 ns  1.961385 µs  9.848616 µs  119.508559 µs  1.185668 ms

Implementation: 0001-DSJON-31-Change-how-objects-are-created.patch

Tracked in

Environment

data.json 1.1.0

Activity

Erik Assum
March 14, 2021, 6:02 PM

After playing around a bit with H4, I worked a bit on the flame graphs; see the attached data-json.svg and cheshire.svg. They show that both libs spend a significant share (~60%) of their time on assoc'ing, but Cheshire spends somewhat less time reading text.

data.json uses around 35% of the runtime on parsing text, so I guess that’s a clear indication of where to work.

Erik Assum
March 8, 2021, 7:49 AM

I agree that H4 is not a proper hypothesis, but the way read-object is implemented today has room for improvement:

  1. Experiments show that applying key-fn and value-fn is costly, as is the check for value-out being equal to value-fn (to elide the key/value from the object)

  2. The way read-object is written, there may be too many checks going on. We know that a JSON object has a certain structure, and we should be able to exploit that knowledge to get rid of some tests. E.g. in https://github.com/clojure/data.json/blob/master/src/main/clojure/clojure/data/json.clj#L84, where we read the comma, we don't really have to check for pending: we shouldn't be allowed to be in a pending state when we reach a comma, since pending indicates whether we've finished reading a key/value pair or not.
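To make the first point concrete, here is a hedged sketch of the per-entry work involved and of one possible way to avoid it when no functions are supplied. `add-entry` and `build-object` are hypothetical names, not data.json's code, and the fast path is an assumption about what an optimization could look like. The convention that a value-fn returning itself drops the entry follows data.json's documented behaviour.

```clojure
;; Hypothetical helper: applies key-fn and value-fn, and drops the entry
;; when value-fn returns itself (the elision check mentioned above).
(defn add-entry [m k v key-fn value-fn]
  (let [out-key   (key-fn k)
        out-value (value-fn out-key v)]
    (if (identical? out-value value-fn)
      m
      (assoc! m out-key out-value))))

(defn build-object [pairs & {:keys [key-fn value-fn]}]
  (persistent!
    (if (or key-fn value-fn)
      ;; Slow path: two function calls and one identity check per entry.
      (let [key-fn   (or key-fn identity)
            value-fn (or value-fn (fn [_ v] v))]
        (reduce (fn [m [k v]] (add-entry m k v key-fn value-fn))
                (transient {})
                pairs))
      ;; Fast path (assumption): defaults only, so skip all three.
      (reduce (fn [m [k v]] (assoc! m k v))
              (transient {})
              pairs))))

(build-object [["a" 1] ["b" 2]])                  ; => {"a" 1, "b" 2}
(build-object [["a" 1] ["b" 2]] :key-fn keyword)  ; => {:a 1, :b 2}

;; A value-fn that returns itself for :b drops that entry.
(letfn [(drop-b [k v] (if (= k :b) drop-b v))]
  (build-object [["a" 1] ["b" 2]] :key-fn keyword :value-fn drop-b))
;; => {:a 1}
```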

Erik Assum
March 7, 2021, 4:31 PM

H4: patch uploaded now, sorry about that.

As for writing, I’d really like to look into that as well, but for some reason my focus has been on reading, and just looking into that has been a bit challenging.

Alex Miller
March 7, 2021, 4:03 PM

It might make sense to spin these out into individual tickets as we decide they are worth doing.

H1 - seems well demonstrated by the flame graphs and the results and worth fixing. Passing N options explicitly as args is a little gross. Assuming the cost is in the dynamicness, could we compare constructing and passing an array map for key-fn, val-fn, and bigdec? instead? Since this opt map is constructed once and never escapes the thread or the call stack, can also consider mutable java array, java colls like ArrayList or HashMap, etc. Or even a Java class with public getter/setter could be very fast. What about the dyn vars on writing? Presume the same problem there. I think we should make a new ticket for this and try to cover both read/write and move it to a point of commit.

H2 - I think this seems well-identified and the patch is in a good direction. I think this deserves a little closer attention to the generated bytecode and we can probably clean it up a bit more. Could be a separate ticket.

H3 - I think you can make this a better problem: "what is the fastest way to construct Clojure collections in bulk". That patch is gross, bleh. We should really look closely at this as there might be something we can provide in core for this general problem of "I have a known good coll, make a vector/map of it".

H4 - not a hypothesis, no patch I see, don't know.

What about writing? These are all about reading. The jsonista graphs show a much bigger gap on writing than reading.

Erik Assum
March 6, 2021, 12:14 PM

Initial performance testing of data.json/read-str versus

  • cheshire

  • jsonista

  • jsoniter

This is done with the code at

 

13:03 $ java -version
openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment (build 11.0.9+11)
OpenJDK 64-Bit Server VM (build 11.0.9+11, mixed mode)

clojure.data.json-perf-test> (all-sizes)
Results for 10b json:
data.json:
Evaluation count : 437970 in 6 samples of 72995 calls.
Execution time mean : 1.415626 µs
Execution time std-deviation : 19.804986 ns
Execution time lower quantile : 1.397219 µs ( 2.5%)
Execution time upper quantile : 1.438129 µs (97.5%)
Overhead used : 6.076479 ns

cheshire:
Evaluation count : 890196 in 6 samples of 148366 calls.
Execution time mean : 995.589616 ns
Execution time std-deviation : 157.205563 ns
Execution time lower quantile : 717.620317 ns ( 2.5%)
Execution time upper quantile : 1.129787 µs (97.5%)
Overhead used : 6.076479 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 47.3888 % Variance is moderately inflated by outliers

jsonista:
Evaluation count : 1678146 in 6 samples of 279691 calls.
Execution time mean : 358.347612 ns
Execution time std-deviation : 40.060439 ns
Execution time lower quantile : 327.619645 ns ( 2.5%)
Execution time upper quantile : 425.237756 ns (97.5%)
Overhead used : 6.076479 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 31.0181 % Variance is moderately inflated by outliers

jsoniter:
Evaluation count : 4140030 in 6 samples of 690005 calls.
Execution time mean : 197.685744 ns
Execution time std-deviation : 67.818404 ns
Execution time lower quantile : 141.067908 ns ( 2.5%)
Execution time upper quantile : 302.834050 ns (97.5%)
Overhead used : 6.076479 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 81.7967 % Variance is severely inflated by outliers

Results for 100b json:
data.json:
Evaluation count : 132144 in 6 samples of 22024 calls.
Execution time mean : 4.647914 µs
Execution time std-deviation : 167.286213 ns
Execution time lower quantile : 4.485735 µs ( 2.5%)
Execution time upper quantile : 4.881229 µs (97.5%)
Overhead used : 6.076479 ns

cheshire:
Evaluation count : 365766 in 6 samples of 60961 calls.
Execution time mean : 1.989409 µs
Execution time std-deviation : 319.325722 ns
Execution time lower quantile : 1.676632 µs ( 2.5%)
Execution time upper quantile : 2.350099 µs (97.5%)
Overhead used : 6.076479 ns

jsonista:
Evaluation count : 463224 in 6 samples of 77204 calls.
Execution time mean : 1.349564 µs
Execution time std-deviation : 24.178647 ns
Execution time lower quantile : 1.321169 µs ( 2.5%)
Execution time upper quantile : 1.380849 µs (97.5%)
Overhead used : 6.076479 ns

jsoniter:
Evaluation count : 968610 in 6 samples of 161435 calls.
Execution time mean : 651.712757 ns
Execution time std-deviation : 38.679748 ns
Execution time lower quantile : 614.558819 ns ( 2.5%)
Execution time upper quantile : 706.033098 ns (97.5%)
Overhead used : 6.076479 ns

Results for 1k json:
data.json:
Evaluation count : 23712 in 6 samples of 3952 calls.
Execution time mean : 26.278528 µs
Execution time std-deviation : 1.092446 µs
Execution time lower quantile : 25.193380 µs ( 2.5%)
Execution time upper quantile : 27.603313 µs (97.5%)
Overhead used : 6.076479 ns

cheshire:
Evaluation count : 71592 in 6 samples of 11932 calls.
Execution time mean : 10.169683 µs
Execution time std-deviation : 2.524611 µs
Execution time lower quantile : 8.193420 µs ( 2.5%)
Execution time upper quantile : 14.137084 µs (97.5%)
Overhead used : 6.076479 ns

jsonista:
Evaluation count : 93600 in 6 samples of 15600 calls.
Execution time mean : 6.684797 µs
Execution time std-deviation : 248.024976 ns
Execution time lower quantile : 6.362812 µs ( 2.5%)
Execution time upper quantile : 6.958020 µs (97.5%)
Overhead used : 6.076479 ns

jsoniter:
Evaluation count : 187350 in 6 samples of 31225 calls.
Execution time mean : 3.418747 µs
Execution time std-deviation : 253.414681 ns
Execution time lower quantile : 3.277405 µs ( 2.5%)
Execution time upper quantile : 3.854009 µs (97.5%)
Overhead used : 6.076479 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 15.3501 % Variance is moderately inflated by outliers

Results for 10k json:
data.json:
Evaluation count : 2106 in 6 samples of 351 calls.
Execution time mean : 292.573345 µs
Execution time std-deviation : 8.158315 µs
Execution time lower quantile : 281.535068 µs ( 2.5%)
Execution time upper quantile : 301.001290 µs (97.5%)
Overhead used : 6.076479 ns

cheshire:
Evaluation count : 6456 in 6 samples of 1076 calls.
Execution time mean : 93.114467 µs
Execution time std-deviation : 1.155807 µs
Execution time lower quantile : 91.917226 µs ( 2.5%)
Execution time upper quantile : 94.788605 µs (97.5%)
Overhead used : 6.076479 ns

jsonista:
Evaluation count : 8358 in 6 samples of 1393 calls.
Execution time mean : 71.806734 µs
Execution time std-deviation : 2.111767 µs
Execution time lower quantile : 69.853903 µs ( 2.5%)
Execution time upper quantile : 75.085305 µs (97.5%)
Overhead used : 6.076479 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers

jsoniter:
Evaluation count : 12972 in 6 samples of 2162 calls.
Execution time mean : 43.674869 µs
Execution time std-deviation : 902.517107 ns
Execution time lower quantile : 42.663901 µs ( 2.5%)
Execution time upper quantile : 44.738642 µs (97.5%)
Overhead used : 6.076479 ns

Results for 100k json:
data.json:
Evaluation count : 228 in 6 samples of 38 calls.
Execution time mean : 2.831524 ms
Execution time std-deviation : 101.971353 µs
Execution time lower quantile : 2.759920 ms ( 2.5%)
Execution time upper quantile : 2.998651 ms (97.5%)
Overhead used : 6.076479 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers

cheshire:
Evaluation count : 666 in 6 samples of 111 calls.
Execution time mean : 918.176008 µs
Execution time std-deviation : 25.261369 µs
Execution time lower quantile : 896.487685 µs ( 2.5%)
Execution time upper quantile : 951.829230 µs (97.5%)
Overhead used : 6.076479 ns

jsonista:
Evaluation count : 900 in 6 samples of 150 calls.
Execution time mean : 699.329997 µs
Execution time std-deviation : 28.148035 µs
Execution time lower quantile : 673.851807 µs ( 2.5%)
Execution time upper quantile : 738.360876 µs (97.5%)
Overhead used : 6.076479 ns

jsoniter:
Evaluation count : 1134 in 6 samples of 189 calls.
Execution time mean : 543.987750 µs
Execution time std-deviation : 19.522582 µs
Execution time lower quantile : 526.123640 µs ( 2.5%)
Execution time upper quantile : 565.757349 µs (97.5%)
Overhead used : 6.076479 ns

;; => nil

Assignee

Erik Assum

Reporter

Alex Miller