We're updating the issue view to help you get more done. 

Set literal duplicate check occurs too early.

Description

I cannot use literal syntax to create a set/map with unique members/keys if the elements are generated with an identical form. Examples of such legal forms: (rand), (read), (clojure.core.async/<!!), etc. I will use (rand) in these examples.

1 2 3 4 5 6 user=> #{(rand) (rand)} IllegalArgumentException Duplicate key: (rand) clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:68) user=> {(rand) 1, (rand) 2} IllegalArgumentException Duplicate key: (rand) clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)

It appears that the input is being checked for duplicates before the arguments to the collection constructors are evaluated. However, this doesn't prevent the need to run the check again later.

Note that duplicates are still (correctly) detected, after evaluation, even if duplicates do not appear as literals in the source:

1 2 3 4 5 6 user=> #{(+ 1 1) 2} IllegalArgumentException Duplicate key: 2 clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:56) user=> {(+ 1 1) :a, 2 :b} IllegalArgumentException Duplicate key: 2 clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)

The first duplicate check therefore seems to be both redundant and incorrect.

Note that this eager duplicate-checking seems to have higher precedence even than the syntax-quote reader macro.

1 2 3 4 5 6 7 user=> `#{~(rand) ~(rand)} IllegalArgumentException Duplicate key: (clojure.core/unquote (rand)) clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:68) user=> `{~(rand) 1, ~(rand) 2} IllegalArgumentException Duplicate key: (clojure.core/unquote (rand)) clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)

This is odd – since syntax-quote should not realize a collection at all at read time:

For Lists/Vectors/Sets/Maps, syntax-quote establishes a template of the corresponding data structure. Within the template, unqualified forms behave as if recursively syntax-quoted, but forms can be exempted from such recursive quoting by qualifying them with unquote or unquote-splicing, in which case they will be treated as expressions and be replaced in the template by their value, or sequence of values, respectively. (http://clojure.org/reader)

Definitions aside, based on the apparent expansion of syntax-quote, I would expect the previous to have worked correctly.

If I fake the expected macroexpansion by manually substituting the desired inputs, I get the expected results:

1 2 3 4 5 6 7 8 user=> '`#{~:a ~:b} (clojure.core/apply clojure.core/hash-set (clojure.core/seq (clojure.core/concat (clojure.core/list :b) (clojure.core/list :a)))) user=> (clojure.core/apply clojure.core/hash-set (clojure.core/seq (clojure.core/concat (clojure.core/list (rand)) (clojure.core/list (rand))))) #{0.27341896385866227 0.3051522362827035} user=> '`{~:a 1, ~:b 2} (clojure.core/apply clojure.core/hash-map (clojure.core/seq (clojure.core/concat (clojure.core/list :a) (clojure.core/list 1) (clojure.core/list :b) (clojure.core/list 2)))) user=> (clojure.core/apply clojure.core/hash-map (clojure.core/seq (clojure.core/concat (clojure.core/list (rand)) (clojure.core/list 1) (clojure.core/list (rand)) (clojure.core/list 2)))) {0.12476921225204185 2, 0.5807961046096718 1}

It seems to me that there is a superfluous duplicate check being run before the set/map reader macros evaluate their arguments. This check should seemingly be removed. Even if the check did not catch some false-positive duplicates (as it does), it would be unnecessary since the apparent second post-evaluation check would catch all true duplicates.

All that said, it's unclear that this check should happen at all. If I try to create sets/map with duplicate members/keys, I don't get an error. The duplicates are silently removed or superseded.

1 2 3 4 user=> (set (list 1 1)) #{1} user=> (hash-map 1 2 1 3) {1 3}

It seems it would be most consistent for literals constructed by the reader syntax to do the same.

I can see the argument that a literal representation is not a 'request to construct' but rather an attempt to simulate the printed representation of a literal data object. From that perspective, disallowing 'illegal' printed representations seems reasonable. Unfortunately, the possibility of evaluated forms inside literal vectors, sets, and maps (since lists are evaluated at read time) already breaks this theory. That is, the printed representation of such collections is not an accurately readable form, so read-time duplicate checking still cannot prevent seeming inconsistencies in print/read representations:

1 2 3 4 5 user=> '#{(+ 1 1) 2} #{(+ 1 1) 2} user=> #{(+ 1 1) 2} IllegalArgumentException Duplicate key: 2 clojure.lang.PersistentHashSet.createWithCheck (PersistentHashSet.java:56)

Given that the problem cannot be completely avoided at all, it seems simplest and most consistent to treat reader literal constructors like their run-time counterparts, as syntax quote would in the absence of the spurious duplicate check.

Environment

None

Status

Assignee

Unassigned

Reporter

import

Labels

Approval

None

Patch

None

Affects versions

Release 1.7
Release 1.6

Priority

Minor