Clarify and align valid symbol and keyword rules for Clojure (and edn)
Known areas of under-specificity (http://clojure.org/reader#The%20Reader--Reader%20forms):
symbols (and keywords) description do not mention constituent characters that are currently in use by Clojure functions such as <, >, =, $ (for Java inner classes), & (&form and &env in macros), % (stated to be valid in edn spec)
keywords currently accept leading numeric characters which is at odds with the spec - see CLJ-1286
http://dev.clojure.org/pages/viewpage.action?pageId=11862058 - design page for arbitrary symbol support
Comment made by: kunstmusik
Coming to this late, I had mentioned on the user mailing list in:
that # is currently allowed as part of a symbol name, such that:
(let [a# 4 b#a 3] (println a# b#a))
will print "4 3".
is also employed in auto-gensyms and discussed in http://clojure.org/reference/reader#syntax-quote as part of a symbol's name. From the mailing list thread, # was noted as "may be allowed now, but could be changed later". I would appreciate if it is more clearly described as a special case/reserved, and would ask that its use be restricted in the reader to prevent users from using it now and potentially have code break later.
I would say : (and : are syntactic markers and the spec describes the characters following it. But I agree it would be nice for this to be more explicit. The (incorrect) regex in LispReader does not help either.
The tagged literal idea is an interesting alternative to the | | syntax that has been reserved for possible future support for invalid characters in keywords and symbols. But I think the idea is out of scope for this ticket, which is really about clarifying the spec.
Another source of ambiguity I see is that it's unclear whether the first colon of a keyword is the first character of the keyword (and therefore of the symbol) or whether it is something special and the spec really describes what happens from the second character onward. This matters because the specification for a keyword is (in both edn and reader specs) given in terms of differences from symbols. I think many of the strange keyword edge cases (including legality of :1 vs :a/1) stem from this ambiguity, and different tickets/patches seem to choose one or the other underlying assumption. See this comment for more examples.
Possibly we can use tagged literals for keywords and symbols to create or print these forms when they are not readable and simplify the reader spec for their literal forms. E.g. instead of producing complicated parse rules to ensure clojure.core// or :1 are legal, just make the literal form simple and have users write something like #sym["clojure.core" "/"] or #kyw "1" (and have the printer print these) when they hit these edge cases.
The updated reader spec says that a symbol can contain a single / to separate the namespace. It also mentions a bare / to be the division function.
So what about clojure.core//? That still got to be a readable symbol right? So is that an exception to the 'single /' rule?
Will foo.bar// also be readable? What about foo//bar?
Comment made by: adamfrey
Related to this: The Clojure reader will not accept symbols and keywords that contain consecutive colons (See LispReader.java), although that is permitted by the current EDN spec. Here is a GitHub issue regarding consecutive colons. I would like to qualify why consecutive colons are disallowed, and sync up the Clojure Reader and the EDN spec on this.