empty? is broken for transient collections

Description

I couldn't find whether this was brought up earlier, but the empty? predicate appears to be broken for transient collections.

The workaround is to use a (zero? (count (transient ...))) check instead.

Cause: empty? is based on seqability, which transients don't implement.
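A minimal REPL sketch of the cause and the workaround. The names tv-empty and tv-one are made up for this illustration, and seqable? assumes Clojure 1.9 or later:

```clojure
;; Hypothetical names, chosen only for this illustration.
(def tv-empty (transient []))
(def tv-one   (conj! (transient []) 1))

(seqable? tv-empty)        ; false: a transient cannot be turned into a seq
(counted? tv-empty)        ; true: transients implement clojure.lang.Counted

;; The workaround from the description relies on count only.
(zero? (count tv-empty))   ; true
(zero? (count tv-one))     ; false
```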

Proposed: Add a branch to empty? for counted? colls. Transients implement Counted, so they gain support via this branch. Other colls that are counted get faster. The seq branch continues to work for seqs.
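The proposal could look roughly like the sketch below. This is not the attached patch; empty-sketch? is a hypothetical name used so that clojure.core/empty? is left untouched:

```clojure
;; Sketch of the proposed branching, under a hypothetical name.
(defn empty-sketch?
  "Returns true if coll has no items. Counted colls (including
  transients) are checked via count; everything else via seq."
  [coll]
  (if (counted? coll)
    (zero? (count coll))   ; counted branch: no seq is created
    (not (seq coll))))     ; seq branch: unchanged behavior for seqs, nil, etc.

(empty-sketch? (transient []))        ; true
(empty-sketch? (transient {:a 1}))    ; false
(empty-sketch? [])                    ; true
(empty-sketch? (map inc [1 2 3]))     ; false
(empty-sketch? nil)                   ; true
```

The counted branch comes first so that colls which are both Counted and Seqable, such as the persistent collections, take the cheaper path.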

Perf test:

Results:

coll   before    after     result
p      0.72 ms   0.08 ms   much faster when empty
p1     0.15 ms   0.13 ms   slightly faster when not empty
t      error     0.19 ms   no longer errors
t1     error     0.20 ms   no longer errors

Not sure if the doc string should be tweaked to be more generic, particularly the "same as (not (seq coll))" part, which is now only true for Seqable colls but not for Counted ones. I think the advice to use (seq coll) for seq checks is still good there.

I did a skim for other types that are Counted but not seqs/Seqable and didn't find much other than internal things like ChunkBuffer. Many are both and would thus use the counted path instead (all the persistent colls for example and any kind of IndexedSeq).

I guess another option would be to switch empty? fully to (zero? (bounded-count 1 coll)) and lean on count's polymorphism completely.
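A sketch of that alternative, assuming Clojure 1.9 or later for the two-argument (bounded-count n coll). The name empty-bounded? is hypothetical:

```clojure
;; Hypothetical name; leans entirely on bounded-count.
(defn empty-bounded?
  [coll]
  ;; bounded-count returns (count coll) for counted colls and otherwise
  ;; walks at most 1 element of the seq, so it stops on infinite seqs.
  (zero? (bounded-count 1 coll)))

(empty-bounded? (transient []))      ; true
(empty-bounded? (transient #{:a}))   ; false
(empty-bounded? (iterate inc 0))     ; false, and it terminates
(empty-bounded? nil)                 ; true
```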

Patch: clj-1872.patch

Environment

None

Activity

Alex Miller
December 27, 2015, 3:58 AM

Probably similar to CLJ-700.

Devin Walters
May 9, 2019, 8:52 PM

As mentioned in CLJ-700, this is a different issue.

Devin Walters
May 9, 2019, 11:47 PM

First things first, the original description brings up `(empty? (transient ()))`. Per the documentation at https://clojure.org/reference/transients, there is no benefit to be had for supporting transients on lists.

Current behavior for Java collections:

The same behavior is true of Java arrays.

Over in CLJS-2802, the current patch's approach is to `cond` around the problem in `empty?` by explicitly checking whether it's a TransientCollection, and if so, using `(zero? (count coll))` as the original description mentions as a workaround.
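For comparison, a rough JVM Clojure analogue of that `cond` approach might look like this. It is not the CLJS-2802 patch itself; the function name and the use of clojure.lang.ITransientCollection as the type check are assumptions made for this sketch:

```clojure
;; Hypothetical JVM analogue of the "cond around it" approach.
(defn empty-via-cond?
  [coll]
  (cond
    (nil? coll)
    true

    ;; Explicit transient check, then the workaround from the description.
    (instance? clojure.lang.ITransientCollection coll)
    (zero? (count coll))

    :else
    (not (seq coll))))

(empty-via-cond? (transient []))      ; true
(empty-via-cond? (transient [1]))     ; false
(empty-via-cond? '())                 ; true
```

Compared with a counted? branch, this special-cases transients only and leaves every other coll on the seq path.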

Currently, transient collections do not implement Iterable as the persistent ones do. If Iterable were implemented, I believe RT.seqFrom would work, and by extension, `empty?`.
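The difference in implemented interfaces can be checked at the REPL. The names pv and tv below are made up for this illustration:

```clojure
;; Hypothetical names for a persistent vector and its transient.
(def pv [1 2 3])
(def tv (transient pv))

(instance? Iterable pv)   ; true
(instance? Iterable tv)   ; false
```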

Alex Miller
May 13, 2019, 7:27 PM

I think there are good reasons for transient collections not to be Seqable - seqs imply caching, caching hurts perf, and the whole reason to be using transients is for batch load perf. So that seems counter-productive. Iterators are stateful and again, I suspect that is probably a bad thing to add just for the sake of checking empty?.

An explicit check for emptiness of counted? colls would cover all the transient colls and anything else counted without making a seq. That might be faster for all those cases, and doesn't require anything new of anybody in the impl.

Another option would be to have an IEmptyable interface and/or protocol to indicate explicit empty? check support. Probably overkill.
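To make the protocol idea concrete, here is one way it could be sketched. Nothing like this exists in clojure.core; the method name -empty? and the set of extended types are assumptions made for illustration:

```clojure
;; Hypothetical protocol; not part of Clojure.
(defprotocol IEmptyable
  (-empty? [coll] "Returns true if coll has no items."))

(extend-protocol IEmptyable
  nil
  (-empty? [_] true)

  ;; Covers transients and every other counted coll without making a seq.
  clojure.lang.Counted
  (-empty? [coll] (zero? (count coll)))

  ;; Fallback: the existing seq-based check.
  Object
  (-empty? [coll] (not (seq coll))))

(-empty? (transient {}))       ; true
(-empty? (transient [1]))      ; false
(-empty? (map inc []))         ; true
```

This buys an extension point for third-party types, at the cost of a new abstraction that the counted? branch does not need.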

Assignee: Unassigned

Reporter: import

Labels:

Approval: Vetted

Patch: Code and Test

Affects versions:

Priority: Critical