pr-str and read-string don't handle @ symbols inside keywords properly


pr-str emits "{:key@other :stuff}", which read-string fails to interpret correctly. Either pr-str needs to escape the @ symbol, or read-string needs to handle the symbol inside a keyword.
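
A minimal sketch of the failure, assuming a stock Clojure REPL. The helper name round-trips? and the sample map are hypothetical, chosen only to mirror the example above; the check treats a reader exception the same as an unequal result.

```clojure
;; Hypothetical repro: the key and value are illustrative only.
(def m {(keyword "key@other") :stuff})

;; The printed form contains the keyword verbatim, with no escaping of @.
(def printed (pr-str m))

(defn round-trips?
  "True if reading the printed form of x yields a value equal to x.
  A reader exception counts as a failed round trip."
  [x]
  (try
    (= x (read-string (pr-str x)))
    (catch Exception _ false)))

(round-trips? {:plain :stuff}) ; an ordinary keyword survives the round trip
(round-trips? m)               ; the keyword containing @ does not
```

The reader treats @ as the start of a deref form, so the keyword token ends at :key and the rest of the map no longer reads as intended.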

Background: I'm passing a map with email addresses as keys through Storm bolts, which require a thrift-serializable form. Using the pr-str/read-string combo fails on these keys, so I've fallen back to JSON serialization.


Ubuntu 12.04 LTS; Java 1.7.0_05 Java HotSpot(TM) Client VM


Andy Fingerhut
September 13, 2012, 8:23 PM

It is a long-standing property of Clojure that it does not throw exceptions for everything that is illegal.

September 17, 2012, 8:16 PM

Comment made by: stevenruppert

Yeah, but read-string clearly does. Is there a good reason that the "keyword" function can't throw an exception? With the other special rules on namespaces within symbol names, the "keyword" function really should be doing validation.

Another solution would be to allow a Ruby-like :"symbol with disallowed characters" literal, but that would also be confusing with how the namespace is handled. !topic/clojure/Ct5v9w0yNAE has some older discussion on this topic.

Andy Fingerhut
September 18, 2012, 1:43 AM

Disclaimer: I'm not a Clojure/core member, just an interested contributor who doesn't know all the design decisions that were made here.

Steven, I think perhaps a couple of concerns are: (1) doing such checks would be slower than not doing them, and (2) implementing such checks means having to update them if/when the rules for legal symbols, keywords, namespace names, etc. change.

Would you be interested in writing strict versions of functions like symbol and keyword for addition to a contrib library? And test suites that try to hit a significant number of corner cases in the rules for what is legal and what is not? I mean those as serious questions, not rhetorical ones. This would permit people that want to use the strict versions of these functions to do so, and at the same time make it easy to measure performance differences between the strict and loose versions.

January 14, 2013, 4:58 AM

Comment made by: stevenruppert

Looking back at this, the root cause of the problem is that the pr function, although it by default "print[s] in a way that objects can be read by the reader" [0], doesn't always do so. Thus, the easiest "fix" is to change its docstring to warn that not all keywords can be read back.

The deeper problem is that keywords don't have a reader form that can represent all actually possible keywords (in this case, those with "@" in them). Restricting the actually-possible keywords to match the reader form, i.e. writing a strict "keyword" function, actually seems like a worse solution overall to me. The better solution would be to somehow extend the keyword reader form to allow it to express all possible keywords, possibly with Ruby's :"keyword" syntax. Plus, that solution would avoid having to keep hypothetical strict keyword/symbol functions in sync with reader operation, write test cases for them, and so on.

Thus, the resolution of this bug comes down to how far we're willing to go. Changing the docstring would be the easiest, but extending the keyword form would be the "best" resolution, IMO.


Andy Fingerhut
August 19, 2013, 4:54 PM

I happened across another ticket yesterday. Its discussion thread shows that the topic of validating the contents of constructed keywords and symbols has arisen before. At the time, a patch was written that modified the functions "symbol" and "keyword" so that the symbol/keyword was constructed as it is now, but then double-checked for readability using the clojure.lang.RT/readString method on the string arg given. It threw an exception if the intern and readString methods returned non-equal symbols (or if readString threw an exception).
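
A rough sketch of that kind of check, written from the description above rather than from the patch itself. The name strict-keyword is hypothetical, and it goes through pr-str and read-string instead of calling clojure.lang.RT/readString on the raw argument, so it should be read as an illustration of the idea only.

```clojure
(defn strict-keyword
  "Hypothetical helper: like `keyword`, but throws if the resulting keyword
  would not print and read back as an equal keyword."
  [s]
  (let [k        (keyword s)
        ;; read-string returns only the first form it reads, so a keyword
        ;; whose printed form splits into several tokens comes back unequal.
        readback (try
                   (read-string (pr-str k))
                   (catch Exception _ ::unreadable))]
    (if (= k readback)
      k
      (throw (IllegalArgumentException.
              (str "Keyword would not be readable: " (pr-str s)))))))

(strict-keyword "plain")        ; returns :plain
;; (strict-keyword "key@other") ; throws IllegalArgumentException
```

Every call pays for a print and a read, which is the sort of run-time overhead the discussion below is concerned with.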

Rich was concerned that this run-time overhead would be too high, and asked if anyone knew a faster way of doing it. Chas Emerick proposed making all symbols readable using a syntax like Common Lisp's #|symbol with whitespace|, with some checks for the common case where the quoting would be unnecessary. Rich was open to the idea of quoting arbitrary symbols, but said that it would belong in a different ticket than that one.

I am not aware of anyone creating a ticket to introduce quoting of arbitrary symbols since then, but I could have missed it. This ticket could become that ticket, but its description would need significant editing, and code changes would be needed in a variety of places in Clojure.








