Potential speed improvements for key lookup of large collections as array-map keys

Description

Demonstration of performance difference of looking up one of these
types of large collections in array-maps vs. hash-maps:

  • set

  • map

  • PersistentQueue

  • primitive vector created via vector-of

hash-maps lookup of these types takes advantage of an identical? (Java ==) fast path for key equality check, but for array-maps they do not.

There is already such an identical fast path in the equiv check for APersistentVector, so there is nothing to improve there in this regard.

 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 (def set10k (set (range 10000))) (def map10k (into {} (zipmap (range 0 20000 2) (range 1 20001 2)))) (def q10k (into clojure.lang.PersistentQueue/EMPTY (range 10000))) (def pv10k (apply vector-of :long (range 10000))) ;; use them as keys in an array-map, and also a hash-map. ;; (= q10k pv10k), so they cannot be used as keys in the same map. (def hm1 (hash-map set10k :set10k map10k :map10k q10k :q10k)) (def hm2 (hash-map set10k :set10k map10k :map10k pv10k :pv10k)) (def am1 (array-map set10k :set10k map10k :map10k q10k :q10k)) (def am2 (array-map set10k :set10k map10k :map10k pv10k :pv10k)) (time (frequencies (map (fn [i] (hm1 set10k)) (range 1000)))) (time (frequencies (map (fn [i] (hm1 map10k)) (range 1000)))) (time (frequencies (map (fn [i] (hm1 q10k)) (range 1000)))) ;; time varies for the above, but on my system always less than 2 msec (time (frequencies (map (fn [i] (hm2 pv10k)) (range 1000)))) ;; The previous one above was slow, e.g. 495 to 567 msec range. Why? ;; Because hasheq method has no caching. Separate ticket for that ;; enhancement. (time (frequencies (map (fn [i] (am1 set10k)) (range 1000)))) ;; time range [733, 773] msec (time (frequencies (map (fn [i] (am1 map10k)) (range 1000)))) ;; time range [1281, 1350] msec (time (frequencies (map (fn [i] (am1 q10k)) (range 1000)))) ;; time range [294, 314] msec (time (frequencies (map (fn [i] (am2 pv10k)) (range 1000)))) ;; time range [273, 378] msec

The difference between looking up these values in a hash-map vs. array-map is that hash-map uses the clojure.lang.Util.equiv(Object, Object) method for key equality checks, which has a fast path check for identical key objects, whereas array-map uses clojure.lang.Util.EquivPred(Object) to determine a method to use for equiv checking while iterating through the array-map keys, and it selects clojure.lang.Util.equivColl(Object), which ends up calling one of:

+ clojure.lang.APersistentMap.equiv(Object) for maps

+ clojure.lang.APersistentSet.equiv(Object) for sets

+ clojure.lang.PersistentQueue.equiv(Object) for PersistentQueue

+ clojure.core.Vec.equiv(Object) for Vec

None of those methods have a fast path check for identical objects.

I realize there are method size limits on what the JVM JITs will in-line, and that is a bigger performance factor in many cases than the use cases that adding an identical fast path check might bring on average to Clojure applications. I can check, but at least for clojure.lang.APersistentMap.equiv(Object), it seems likely that it is already over the default limit for maximum byte code size of an inline-able method.

Environment

None

Status

Assignee

Unassigned

Reporter

Andy Fingerhut

Labels

None

Approval

None

Patch

None

Affects versions

Release 1.10

Priority

Minor
Configure