Carriage returns in files can cause Pushback buffer overflow exception


With latest master version of tools.reader (1.3.0), some files with carriage returns that are not followed by a formfeed or newline character can cause the default pushback buffer size of 1 to be overflowed, resulting in an exception thrown during reading. The root cause is that peek-char is implemented with a read-char followed by unread for some types of reader objects, and read-char ends with a call to peek-char when such an isolated carriage return is found.

One example to reproduce:

If anyone is experiencing this problem and wants a workaround, if you can replace all carriage returns with newlines in the input file/string/reader, you should avoid this problem, although if you are using a line numbering/indexing reader, that may change the line numbers of lines in the data.




Andy Fingerhut
September 7, 2018, 11:55 PM

Another potential issue I realized while looking into this, which I can create a separate ticket for if you are interested: If you create any kind of reader that is layered over a, as the example cases above do, and the input reader has a long sequence of carriage returns in it, the implementation of read-char and normalize-newline can call each other as deeply nested as there are consecutive carriage return characters.

This is similar to this old issue for reading many consecutive spaces, although many consecutive carriage returns seems quite a bit less likely to crop up in practice:

Nicola Mometto
October 13, 2018, 10:53 PM

Just a heads up, I started looking into this and it's a bit more involved than I originally thought. As you say, the "hacky" solution doesn't really work in the presence of consecutive carriage returns. I think we need a bigger refactor although I'm still not clear what that'd involve. I'm on vacation now tho so I should have time to work on it.

Nicola Mometto
October 17, 2018, 6:37 PM

I've pushed a fix to, could you give it a go before I merge it?

Andy Fingerhut
October 21, 2018, 10:17 PM

Attached patch TRDR-54-part2-v1.patch is on top of the existing changes already committed to the branch of tools.reader.

One of the code changes is a pretty uncontroversial fix, I believe. The one with the additional code for IndexingPushbackReader's unread method is a bit weirder, and I would guess there may be better ways to do it.

Nicola Mometto
October 21, 2018, 11:50 PM

Thanks a lot for the patch and the second set of eyes on this, I think all the changes make sense. I've applied the patch to the branch, I'm going to merge it tomorrow unless I find unexpected issues with the approach



Nicola Mometto


Andy Fingerhut