We're updating the issue view to help you get more done. 

Replace seq-based iterators with direct iterators for all non-seq collections that use SeqIterator

Description

Motivation:

Currently, the process of reducing over a map or set involves converting them to seqs. This action thus causes more intermediate objects than we'd like. Therefore, we'd like for reduce to be more efficient about the way that it reduces over these types. One way is for reduction to use the types' core collections while another way is to instead use a smart iterator type that abstracts the direct operation.

Solution

Add support for direct iterators instead of seq-based iterators for non-seq collections that use SeqIterator.

Patch adds support for direct iterators on the following (removing use of SeqIterator):

  • PersistentHashMap - new internal iterator (~20% faster)

  • APersistentSet - use internal map impl iterator (~5% faster)

  • PersistentQueue

  • PersistentStructMap

  • records (in core_deftype.clj)

Patch does not change use of SeqIterator in:

  • LazyTransformer$MultiStepper (not sure if this could be changed)

  • ASeq

  • LazySeq

In preparation for use in CLJ-1602, there is also a new IMapIterable interface that allows a map to publish direct Iterators of keys and values (without creating and pulling apart MapEntry objs). PersistentHashMap and PersistentArrayMap implement it right now and PersistentHashSet will check for and use the direct key iterator if possible.

Performance:

I re-ran the set of tests from for vec improvements which use these new iterators for PersistentArrayMap, PersistentHashMap, and PersistentHashSet. These test the time to run vec on a PHS or PHM of the given size. All timings in nanoseconds.

coll

size

1.7.0-alpha4

1.7.0-alpha5

alpha5 + patch

alpha5 + patch / alpha5

PHS

4

236

406

204

0.86

PHS

16

857

896

382

0.45

PHS

64

3637

3832

1723

0.47

PHS

256

14235

14672

6525

0.46

PHS

1024

67743

68857

44654

0.66

PHM

4

112

232

232

2.07

PHM

16

550

832

442

0.80

PHM

64

3014

3155

2014

0.67

PHM

256

11661

12082

7522

0.65

PHM

1024

57388

59787

44240

0.77

I also ran a set of tests just on reduce with reduce-kv for comparison (reduce-kv uses a different unaffected reduction path so it's a good baseline for comparison).

expr

a-set / a-map size

1.6.0

1.7.0-alpha5 (ns)

1.7.0-alpha5 + patch (ns)

patch / alpha5

patch / 1.6

(reduce + 0 a-set)

4

171

241

114

0.47

0.67

(reduce + 0 a-set)

16

663

826

395

0.48

0.60

(reduce + 0 a-set)

64

3637

4456

2361

0.53

0.65

(reduce + 0 a-set)

256

14772

16158

8872

0.55

0.60

(reduce + 0 a-set)

1024

72548

79010

49626

0.63

0.68

(reduce + 0 a-set)

16384

1197962

1234286

729335

0.59

0.61

(reduce #(+ %1 (val %2)) 0 a-map)

4

69

103

104

1.01

1.51

(reduce #(+ %1 (val %2)) 0 a-map)

16

640

752

506

0.67

0.79

(reduce #(+ %1 (val %2)) 0 a-map)

64

2861

3238

2388

0.74

0.83

(reduce #(+ %1 (val %2)) 0 a-map)

256

11998

12848

9526

0.74

0.79

(reduce #(+ %1 (val %2)) 0 a-map)

1024

59650

61913

54717

0.88

0.92

(reduce #(+ %1 (val %2)) 0 a-map)

16384

982317

969168

837842

0.86

0.85

(reducekv #(%1 %3) 0 a-map)

4

58

62

60

0.97

1.03

(reducekv #(%1 %3) 0 a-map)

16

233

250

248

0.99

1.06

(reducekv #(%1 %3) 0 a-map)

64

1477

1525

1442

0.95

0.98

(reducekv #(%1 %3) 0 a-map)

256

5470

5258

5201

0.99

0.95

(reducekv #(%1 %3) 0 a-map)

1024

27344

27442

27410

1.00

1.00

(reducekv #(%1 %3) 0 a-map)

16384

399591

397090

396267

1.00

0.99

Patch: clj-1499-v12.patch

Screened by:

Environment

None

Status

Assignee

Unassigned

Reporter

Rich Hickey

Labels

None

Approval

Ok

Patch

Code and Test

Fix versions

Affects versions

Release 1.7

Priority

Major