We're updating the issue view to help you get more done. 

re-matches is incorrect

Description

The re-matches function does not have the correct semantics: it performs a search (not match) against the string and returns nil if the string and matched-string are unequal. This is not the same as true matching, which is like inserting "^" and "$" at the beginning and end of the pattern.

Example in Clojure:

1 2 3 4 user=> (re-find #"0|[1-9][0-9]+|0[xX][0-9a-zA-Z]+" "0x1") "0" user=> (re-matches #"0|[1-9][0-9]+|0[xX][0-9a-zA-Z]+" "0x1") "0x1"

Compare Clojurescript:

1 2 3 4 ClojureScript:cljs.user> (re-find #"0|[1-9][0-9]+|0[xX][0-9a-zA-Z]+" "0x1") "0" ClojureScript:cljs.user> (re-matches #"0|[1-9][0-9]+|0[xX][0-9a-zA-Z]+" "0x1") nil

This bug is (one of the) reasons why CLJS-775.

I'm not completely sure what to do here. My first thought is to have re-matches inspect the -source property of its regex input, wrap the string with "^$", then carefully copy all flags over to a new regexp.

Questions:

  1. Are there any valid patterns where this is not safe? E.g., where we could not put ^ first? Is "^^abc$$" ok?

  2. Can we avoid cloning if ^ and $ are already the first and last chars of the pattern?

  3. How does multiline mode play in to this, if at all?

  4. regexinstance.lastIndex is a piece of mutability on regex instances (or the RegExp global on older browsers) which is used as a string offset for multiple invocations of exec() on the same string. I have no idea what to do if re-* gets a regex with the global flag set. (BTW, this is a very good reason to reject CLJS-150: allowing clojure to accept the global flag makes regular expression objects stateful, and would completely screw up re-seq for example.)

Environment

None

Status

Assignee

Unassigned

Reporter

Francis Avila

Labels

None

Approval

None

Patch

None

Priority

Minor