Pushback buffer overflow reading large json file with clojure.data.json 2.1.0

Description

Per https://ask.clojure.org/index.php/10443/pushback-buffer-overflow-reading-large-json-file-with-clojure

The following works with clojure.data.json version 1.0.0, but throws an exception in 2.0.2 and 2.1.0 (other versions not tested):

The exception:
Execution error (IOException) at java.io.PushbackReader/unread (PushbackReader.java:179). Pushback buffer overflow`

It also throws the exception when reading from the file locally.

 

Problem

The problem stems from read-quoted-string which tries to read (for efficiency) a chunk of 64 bytes.

It then runs through these 64 bytes, and if it finds a end of string (\”) before it reaches 64 bytes, it will push back the remaining bytes onto the pushbackreader.

Now, you cannot push back more bytes to a pushbackreader than what is declared as its buffer size.

The reason the tests in json_test which exercises the Reader path doesn’t expose this bug, is that it doesn’t end up in read-quoted-string . Also, you’d need to have a situation where there are chars to be pushed back, which is now done in the test in the patch.

This bug was introduced as an oversight by me, since you’ll see in the code path where we read the string directly, I allocate a buffer of 64 for the pushbackreader (https://github.com/clojure/data.json/blob/master/src/main/clojure/clojure/data/json.clj#L302 ) and you will also notice that the repro gets fixed if you parse the json as a string as not as a stream (as mentioned as a workaround in the post on ask.clojure).

Environment

None

Activity

Show:
Alex Miller
April 15, 2021, 8:39 PM

Applied for release.

Alex Miller
April 15, 2021, 8:25 PM

Notifications should be going to original reporter, current assigned, and any watchers based on the notification scheme, and it seems like you are on the watch list. So I would expect you to be getting emails, unless you just added yourself to that.

Erik Assum
April 15, 2021, 7:09 PM

Sorry I haven’t responded to this, but I haven’t got any emails about the comments

Alex Miller
April 9, 2021, 9:37 PM
Edited

I have the same question as Andy... What's the problem, why does this fix address it? What I'm worried about is that this is related to the depth of open data arrays/maps and we are potentially recursively unreading where that matters. (Note that 1.1.0 used the default PushbackReader buffer size 1.)

Also, are there any perf implications?

Andy Fingerhut
April 9, 2021, 9:14 PM

Is it clear from the data.json code what the upper limit is on pushback operations, assuming it is somewhere between the default of 1 and the new proposed value in the patch of 64? I’ve dealt with pushback operations in tools.reader, and I know it can be difficult to analyze all of the cases – mainly curious if it was easy to tell in this case what the actual maximum number is, or whether 64 is just “big enough that it will work in more cases”.

Fixed

Assignee

Alex Miller

Reporter

Alex Miller

Patch

Code and Test