We're updating the issue view to help you get more done. 

Clojure reader should treat non-breaking space as whitespace character

Description

Right now, Clojure uses Character.isWhitespace(ch) || ch == ',' as the definition of whitespace in the reader. Character.isWhitespace, however, for obscure reasons (it has been defined long time ago), intentionally excludes U+00A0 (no-break space), U+2007 (figure space), U+202F (narrow no-break space). Logically, though, these characters should be treated as normal whitespace for all reasons except text formatting (e.g. the newer Character.isSpaceChar fixed that and does treat them as space chars).

Why is this important: if non-breaking space is inserted by accident (e.g. by pressing Option+Space on Mac), it'll be very hard to find the source of the error in a otherwise very innocent-looking code.

The attached patch implements Util.isWhitespace method that returns true for all characters treated as whitespace by Character.isWhitespace AND for those 3 exceptions. All cases where reading used Character.isWhitespace was referenced are modified to call new Util.isWhitespace instead.

Patch: clj-2207-nbsp-v3.patch

Prescreened by: Alex Miller

Environment

None

Status

Assignee

Unassigned

Reporter

Nikita Prokopov

Labels

Approval

None

Patch

Code

Priority

Minor