Clarify and align valid symbol and keyword rules for Clojure (and edn)

Description

Known areas of under-specificity (http://clojure.org/reader#The%20Reader--Reader%20forms):

  • symbols (and keywords) description do not mention constituent characters that are currently in use by Clojure functions such as <, >, =, $ (for Java inner classes), & (&form and &env in macros), % (stated to be valid in edn spec)

  • keywords currently accept leading numeric characters which is at odds with the spec - see CLJ-1286

References:

Environment

None

Activity

Show:
import
November 8, 2016, 5:16 PM

Comment made by: kunstmusik

Coming to this late, I had mentioned on the user mailing list in:

https://groups.google.com/forum/#!topic/clojure/CwZHu1Eszbk

that # is currently allowed as part of a symbol name, such that:

(let [a# 4 b#a 3] (println a# b#a))

will print "4 3".

  1. is also employed in auto-gensyms and discussed in http://clojure.org/reference/reader#syntax-quote as part of a symbol's name. From the mailing list thread, # was noted as "may be allowed now, but could be changed later". I would appreciate if it is more clearly described as a special case/reserved, and would ask that its use be restricted in the reader to prevent users from using it now and potentially have code break later.

Alex Miller
September 10, 2015, 3:44 PM

I would say : (and : are syntactic markers and the spec describes the characters following it. But I agree it would be nice for this to be more explicit. The (incorrect) regex in LispReader does not help either.

The tagged literal idea is an interesting alternative to the | | syntax that has been reserved for possible future support for invalid characters in keywords and symbols. But I think the idea is out of scope for this ticket, which is really about clarifying the spec.

Francis Avila
September 10, 2015, 3:26 PM

Another source of ambiguity I see is that it's unclear whether the first colon of a keyword is the first character of the keyword (and therefore of the symbol) or whether it is something special and the spec really describes what happens from the second character onward. This matters because the specification for a keyword is (in both edn and reader specs) given in terms of differences from symbols. I think many of the strange keyword edge cases (including legality of :1 vs :a/1) stem from this ambiguity, and different tickets/patches seem to choose one or the other underlying assumption. See this comment for more examples.

Possibly we can use tagged literals for keywords and symbols to create or print these forms when they are not readable and simplify the reader spec for their literal forms. E.g. instead of producing complicated parse rules to ensure clojure.core// or :1 are legal, just make the literal form simple and have users write something like #sym["clojure.core" "/"] or #kyw "1" (and have the printer print these) when they hit these edge cases.

Herwig Hochleitner
July 31, 2015, 2:03 PM

The updated reader spec says that a symbol can contain a single / to separate the namespace. It also mentions a bare / to be the division function.
So what about clojure.core//? That still got to be a readable symbol right? So is that an exception to the 'single /' rule?
Will foo.bar// also be readable? What about foo//bar?

import
July 15, 2015, 5:55 PM

Comment made by: adamfrey

Related to this: The Clojure reader will not accept symbols and keywords that contain consecutive colons (See LispReader.java), although that is permitted by the current EDN spec. Here is a GitHub issue regarding consecutive colons. I would like to qualify why consecutive colons are disallowed, and sync up the Clojure Reader and the EDN spec on this.

Your pinned fields
Click on the next to a field label to start pinning.

Assignee

Unassigned

Reporter

Herwig Hochleitner

Labels

Approval

Triaged

Priority

Major