[imported from https://github.com/clojure/tools.reader/issues/14]
I came across a really bad behavior of the reader inside the browser, and scratched my head for the last 3 days trying to find a solution which I expose below (but you might have a better alternative, so I'd like feedback before starting the full process of submitting a patch through JIRA).
The context: I'm developing a web app that reads an edn (ajax call), and parses the result with read-string. As it is pure edn, I use the version in edn.cljs, but the problem also affects reader.cljs.
The problem I'm facing: I read a 3M edn file (hash-map containing a lot of strings of roughly 1000 characters). The memory usage for the resulting object blew up to around 90M...
I started exploring the heap and found out that the read-string* function resulted in concatenated strings of one character each, so each character in the strings ended up using 32 bytes! The problem seems to be adding one character at a time to the StringBuffer. While I understand the use of the StringBuffer for performance reasons, I found no way to "force" a conversion to a primitive string once reading the string is over.
On the other hand, using plain clojurescript str does the work, and I didn't notice a significant performance overhead. Memory usage went down to 7M. (In fact, since memory usage went down, I assume there is even less pressure on the GC). Here is my new read-string* in edn.cljs:
For the time being, I ship this patched version with my app, but others might benefit from this, or you might have an idea for a more performant low-level solution. (In fact, I'm not really sure why this works because str also uses a StringBuffer internally).
Another side note: escape-char takes sb as argument, but it is never used in that function...
Thanks in advance for your feedback!
Thanks for looking into this!
The current impl using StringBuffer is just a straight port from the jvm impl, I don’t have the time nor the knowledge to do any performance assessment of the impact of this change in cljs, so before I can consider this patch I’ll need some benchmarks measuring, at laest:
memory usage on multiple JS engines, before and after patch
performance on multiple JS engines on small, medium and large sized strings, before and after patch