Reader supports poorly defined regexes that break code

Description

I ran into a strange case where CLJS emitted invalid code based on a poorly formatted regex that escaped / incorrectly.

Looking at such a regex, along with two similar but well formed regexes, passing through tools.reader:

(str #"/")   => "/"
(str #"\/")  => "\\/"
(str #"\\/") => "\\\\/"

But what does "\\/" mean here?

Looking at Clojure execution of these regexes:

(re-find #"/" "\\/")   => "/"
(re-find #"\/" "\\/")  => "/"
(re-find #"\\/" "\\/") => "\\/"

i.e. #"\/" behaves exactly like #"/".

Things get more unfortunate once CLJS gets involved: it does not expect the "heisen" regex, and the "dangling escape" ends up capturing the forward slash's escape, i.e. a prematurely terminated regex is emitted.
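To make the failure mode concrete, here is a minimal sketch of an emitter that escapes every forward slash before wrapping the pattern in JavaScript's /.../ delimiters. The function name naive-js-regex-literal is hypothetical and this is not the actual ClojureScript compiler code; it only assumes that the emitter escapes "/" without checking whether it is already escaped.

```clojure
(require '[clojure.string :as string])

;; Hypothetical emitter, NOT the real CLJS implementation: escape each "/"
;; in the pattern source, then wrap the result in JS regex delimiters.
(defn naive-js-regex-literal [pattern-source]
  (str "/" (string/replace pattern-source "/" "\\/") "/"))

;; Pattern source of #"/" is the single character /
(naive-js-regex-literal (str #"/"))   ; emits /\//

;; Pattern source of #"\/" is the two characters \ and /
(naive-js-regex-literal (str #"\/"))  ; emits /\\//
```

In the second case JavaScript would read /\\/ as a complete regex matching one backslash, leaving a stray / behind it, which is the premature termination described above.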

Despite Clojure's existing "fortuitous" behaviour, perhaps the correct behaviour is to throw a reader exception for such regexes, as it does for the string literal "\/".

Alternatively, if #"\/" remains supported (for familiarity with users used to /.../ syntax), then the reader should emit "/", not "\\/", as the string value of the literal, i.e. this "tolerance" should be part of the reader semantics rather than a concern for emitters.
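One way the second option might look is sketched below, assuming the normalisation is applied to the pattern source string after it has been read. The function name normalize-slash-escapes is hypothetical and is not part of tools.reader. It walks escape pairs left to right, so an escaped backslash followed by a slash is left untouched; constructs such as \Q...\E quoting are not handled.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper, not a tools.reader API: drop the redundant backslash
;; in front of "/" while leaving every other escape pair as written.
(defn normalize-slash-escapes [pattern-source]
  (string/replace pattern-source
                  #"\\(.)"                    ; a backslash plus the character it escapes
                  (fn [[whole escaped]]
                    (if (= escaped "/") "/" whole))))

(normalize-slash-escapes (str #"\/"))   ; source becomes the single character /
(normalize-slash-escapes (str #"\\/"))  ; unchanged: escaped backslash, then /
```

With this in place, an emitter would only ever see an unescaped forward slash and could escape it unconditionally.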

See http://dev.clojure.org/jira/browse/CLJS-1399

Environment

None

Status

Assignee

Nicola Mometto

Reporter

Jeff Palentine

Labels

None

Approval

None

Patch

None

Priority

Minor