Symbol reader includes whitespace characters

Description

From https://ask.clojure.org/index.php/9762/error-compiler-ignore-isidentifierignorable-characters

Cause: The Clojure symbol reader regex pattern (in LispReader) primarily matches with `.`, which matches any character except line terminators. This includes some characters that print as whitespace but are not treated as whitespace by the reader, such as the zero width space (U+200B, http://www.unicode-symbol.com/u/200B.html). Pattern doc: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html.
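
A minimal Java sketch of the behavior described above, for illustration only. The class name `ZeroWidthDemo` and the helper `dotMatches` are hypothetical, and the single `.` pattern is a simplification, not the actual LispReader pattern. It checks that `.` accepts U+200B while `Character.isWhitespace` does not classify it as whitespace. `Character.isIdentifierIgnorable` (the check named in the linked question) does flag it.

```java
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    // ZERO WIDTH SPACE, written as an escape so it is visible in source.
    static final String ZWSP = "\u200B";

    // True when a single "." (no DOTALL flag) matches the whole input.
    static boolean dotMatches(String s) {
        return Pattern.matches(".", s);
    }

    public static void main(String[] args) {
        // "." accepts the zero width space...
        System.out.println("dot matches ZWSP:        " + dotMatches(ZWSP));
        // ...but rejects a line terminator.
        System.out.println("dot matches newline:     " + dotMatches("\n"));
        // Java does not classify the character as whitespace.
        System.out.println("isWhitespace:            " + Character.isWhitespace('\u200B'));
        // It is a format character, so it counts as identifier-ignorable.
        System.out.println("isIdentifierIgnorable:   " + Character.isIdentifierIgnorable('\u200B'));
    }
}
```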

While symbol reading intentionally accepts a wide set of characters, whitespace characters (especially zero-width characters that don't print) are not useful in symbols and make the resulting error messages hard to interpret.

Proposed: Narrow the symbol reader regex to primarily match `\H` (any character that is not horizontal whitespace) instead of `.`.
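
To make the difference between the two character classes concrete, here is a small Java sketch that applies `.` and `\H` to a few sample code points. The class name `HorizontalWhitespaceDemo` and the sample set are assumptions chosen for illustration, and `\H` requires Java 8 or later. U+200B is included as a case worth checking, because the Pattern documentation defines `\h` by an explicit list of code points.

```java
import java.util.regex.Pattern;

public class HorizontalWhitespaceDemo {
    // Current behavior: any character except line terminators.
    static final Pattern ANY = Pattern.compile(".");
    // Proposed behavior: any character that is not horizontal whitespace.
    static final Pattern NON_HORIZONTAL_WS = Pattern.compile("\\H");

    // True when the pattern matches the single code point as a whole string.
    static boolean matches(Pattern p, int codePoint) {
        return p.matcher(new String(Character.toChars(codePoint))).matches();
    }

    public static void main(String[] args) {
        // Letter, tab, no-break space, em space, zero width space.
        int[] samples = {'a', 0x0009, 0x00A0, 0x2003, 0x200B};
        for (int cp : samples) {
            System.out.printf("U+%04X  dot=%b  nonHorizontalWs=%b%n",
                    cp, matches(ANY, cp), matches(NON_HORIZONTAL_WS, cp));
        }
    }
}
```

Running it against the target JDK shows which of these characters each class accepts, which is a quick way to check the proposal against the specific characters reported.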

Alternative: Expand the whitespace reader to include more Unicode whitespace characters, so that the reader would split a symbol at such a character. I think that would depart from what Java does, though; this was explored in CLJ-2207.

Environment

None

Assignee

Unassigned

Reporter

Alex Miller

Labels

Approval

Triaged

Priority

Minor

Affects versions