Memory issue when reading a string

Description

[imported from https://github.com/clojure/tools.reader/issues/14]

Hello,

I came across a really bad behavior of the reader inside the browser, and scratched my head for the last 3 days trying to find a solution which I expose below (but you might have a better alternative, so I'd like feedback before starting the full process of submitting a patch through JIRA).

The context: I'm developing a web app that reads an edn (ajax call), and parses the result with read-string. As it is pure edn, I use the version in edn.cljs, but the problem also affects reader.cljs.
The problem I'm facing: I read a 3M edn file (hash-map containing a lot of strings of roughly 1000 characters). The memory usage for the resulting object blew up to around 90M...

I started exploring the heap and found out that the read-string* function resulted in concatenated strings of one character each, so each character in the strings ended up using 32 bytes! The problem seems to be adding one character at a time to the StringBuffer. While I understand the use of the StringBuffer for performance reasons, I found no way to "force" a conversion to a primitive string once reading the string is over.

On the other hand, using plain clojurescript str does the work, and I didn't notice a significant performance overhead. Memory usage went down to 7M. (In fact, since memory usage went down, I assume there is even less pressure on the GC). Here is my new read-string* in edn.cljs:

For the time being, I ship this patched version with my app, but others might benefit from this, or you might have an idea for a more performant low-level solution. (In fact, I'm not really sure why this works because str also uses a StringBuffer internally).

Another side note: escape-char takes sb as argument, but it is never used in that function...

Thanks in advance for your feedback!

Environment

None

Activity

Show:
Nicola Mometto
July 11, 2019, 2:55 PM
Edited

Thanks for looking into this!

The current impl using StringBuffer is just a straight port from the jvm impl, I don’t have the time nor the knowledge to do any performance assessment of the impact of this change in cljs, so before I can consider this patch I’ll need some benchmarks measuring, at laest:

  • memory usage on multiple JS engines, before and after patch

  • performance on multiple JS engines on small, medium and large sized strings, before and after patch

 

Assignee

Unassigned

Reporter

Nicola Mometto

Labels

None

Approval

None

Patch

None

Priority

Major