Reader supports poorly defined regexes that break code

Description

I ran into a strange case where CLJS emitted invalid code based on a poorly formatted regex that escaped / incorrectly.

Looking at such a regex, along with two similar but well formed regexes, passing through tools.reader:

But what does

mean here?

Looking at Clojure execution of these regexes:

ie.

behaves exactly like

Things get more unfortunate once CLJS get's involved, it does not expect the "heisen" regex - and the "dangling escape" ends up capturing the forward slash's escape, ie. an prematurely terminating regex is emitted.

Despite Clojure's existing "fortuitous" behaviour, perhaps the correct behaviour is to throw a reader exception for such regexes, as it does for

Alternatively, if

remains supported (for familiarity with users used to /.../ syntax), then the reader should emit

not

as the string value of the literal, ie. this "tolerance" should be part of the reader semantics rather than a concern for emitters.

See http://dev.clojure.org/jira/browse/CLJS-1399

Environment

None

Activity

Show:
Jeff Palentine
August 7, 2015, 2:29 PM

Happy to provide the patch, whichever way you lean for the solution.

Note that JIRA has done some strange formatting on some strings here, not sure how to solve this (perhaps best to read the markup directly)

Nicola Mometto
August 7, 2015, 3:45 PM

I'm not sure I understand what's being proposed here. The implementation of the regex reader in cljs is identical to the one in clj and all the regex examples you posted should be valid regex literals. If cljs doesn't compile the latter that's an issue in the cljs emitter, I can't change the clj syntax to workaround a bug in the cljs emitter.

Nicola Mometto
August 7, 2015, 3:48 PM

Note that it is a feature of the clj reader that

is a valid regex while

is not a valid string..

Jeff Palentine
August 8, 2015, 7:29 AM

Hi Nicola - thanks for taking the time to reformat the issue and consider it.

Initially planned to amend the cljs emitter, but David was of the opinion this should be "fixed" in tools.reader.

The situation where a basic literal, valid in Clojure, will silently break an entire cljs build is a very poor experience. Since it would be simple to avert through a simple reader or emitter change, I feel that one of those is necessary.

When you say that reading the form is a feature, that's what I refering to by tolerance. But if practically

means #"/", it seems more natural to normalize during reading. Perhaps I'm missing an understanding of why Clojure normalizes during emission rather, but to me this seems to blur semantics of the regex's underlying string (4 backslashes is a literal, 2 backslashes is a character escape, but forward slash is not an escapable character in Java or Javascript).

I'll submit the emitter patch back at http://dev.clojure.org/jira/browse/CLJS-1399. If you remain resolute that these regexes must be read in a backwards compatible manner, perhaps you could help make the argument to vett that ticket instead?

Declined

Assignee

Nicola Mometto

Reporter

Jeff Palentine

Labels

None

Approval

None

Patch

None

Priority

Minor