Uploaded image for project: 'Clojure'
  1. CLJ-1499

Replace seq-based iterators with direct iterators for all non-seq collections that use SeqIterator

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Completed
    • Affects versions: Release 1.7
    • Fix versions: Release 1.7
    • Labels:
      None
    • Approval:
      Ok
    • Patch:
      Code and Test

      Description

      Motivation:

      Currently, the process of reducing over a map or set involves converting them to seqs. This action thus causes more intermediate objects than we'd like. Therefore, we'd like for reduce to be more efficient about the way that it reduces over these types. One way is for reduction to use the types' core collections while another way is to instead use a smart iterator type that abstracts the direct operation.

      Solution

      Add support for direct iterators instead of seq-based iterators for non-seq collections that use SeqIterator.

      Patch adds support for direct iterators on the following (removing use of SeqIterator):

      • PersistentHashMap - new internal iterator (~20% faster)
      • APersistentSet - use internal map impl iterator (~5% faster)
      • PersistentQueue
      • PersistentStructMap
      • records (in core_deftype.clj)

      Patch does not change use of SeqIterator in:

      • LazyTransformer$MultiStepper (not sure if this could be changed)
      • ASeq
      • LazySeq

      In preparation for use in CLJ-1602 Closed , there is also a new IMapIterable interface that allows a map to publish direct Iterators of keys and values (without creating and pulling apart MapEntry objs). PersistentHashMap and PersistentArrayMap implement it right now and PersistentHashSet will check for and use the direct key iterator if possible.

      Performance:

      I re-ran the set of tests from CLJ-1546 Closed for vec improvements which use these new iterators for PersistentArrayMap, PersistentHashMap, and PersistentHashSet. These test the time to run vec on a PHS or PHM of the given size. All timings in nanoseconds.

      coll size 1.7.0-alpha4 1.7.0-alpha5 alpha5 + patch alpha5 + patch / alpha5
      PHS 4 236 406 204 0.86
      PHS 16 857 896 382 0.45
      PHS 64 3637 3832 1723 0.47
      PHS 256 14235 14672 6525 0.46
      PHS 1024 67743 68857 44654 0.66
      PHM 4 112 232 232 2.07
      PHM 16 550 832 442 0.80
      PHM 64 3014 3155 2014 0.67
      PHM 256 11661 12082 7522 0.65
      PHM 1024 57388 59787 44240 0.77

      I also ran a set of tests just on reduce with reduce-kv for comparison (reduce-kv uses a different unaffected reduction path so it's a good baseline for comparison).

      expr a-set / a-map size 1.6.0 1.7.0-alpha5 (ns) 1.7.0-alpha5 + patch (ns) patch / alpha5 patch / 1.6
      (reduce + 0 a-set) 4 171 241 114 0.47 0.67
      (reduce + 0 a-set) 16 663 826 395 0.48 0.60
      (reduce + 0 a-set) 64 3637 4456 2361 0.53 0.65
      (reduce + 0 a-set) 256 14772 16158 8872 0.55 0.60
      (reduce + 0 a-set) 1024 72548 79010 49626 0.63 0.68
      (reduce + 0 a-set) 16384 1197962 1234286 729335 0.59 0.61
      (reduce #(+ %1 (val %2)) 0 a-map) 4 69 103 104 1.01 1.51
      (reduce #(+ %1 (val %2)) 0 a-map) 16 640 752 506 0.67 0.79
      (reduce #(+ %1 (val %2)) 0 a-map) 64 2861 3238 2388 0.74 0.83
      (reduce #(+ %1 (val %2)) 0 a-map) 256 11998 12848 9526 0.74 0.79
      (reduce #(+ %1 (val %2)) 0 a-map) 1024 59650 61913 54717 0.88 0.92
      (reduce #(+ %1 (val %2)) 0 a-map) 16384 982317 969168 837842 0.86 0.85
      (reducekv #(%1 %3) 0 a-map) 4 58 62 60 0.97 1.03
      (reducekv #(%1 %3) 0 a-map) 16 233 250 248 0.99 1.06
      (reducekv #(%1 %3) 0 a-map) 64 1477 1525 1442 0.95 0.98
      (reducekv #(%1 %3) 0 a-map) 256 5470 5258 5201 0.99 0.95
      (reducekv #(%1 %3) 0 a-map) 1024 27344 27442 27410 1.00 1.00
      (reducekv #(%1 %3) 0 a-map) 16384 399591 397090 396267 1.00 0.99

      Patch: clj-1499-v12.patch

      Screened by:

        Attachments

        1. clj-1499-all.diff
          15 kB
        2. clj-1499-v10.patch
          22 kB
        3. clj-1499-v11.patch
          21 kB
        4. clj-1499-v12.patch
          22 kB
        5. clj-1499-v2.diff
          10 kB
        6. clj-1499-v3.diff
          12 kB
        7. clj-1499-v6.diff
          16 kB
        8. clj-1499-v7.diff
          16 kB
        9. clj-1499-v8.diff
          21 kB
        10. clj-1499-v9.patch
          21 kB
        11. defrecord-iterator.patch
          4 kB
        12. defrecord-iterator-v2.diff
          3 kB

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              richhickey Rich Hickey
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: