Improve chunked sequence processing performance

Description

Chunked sequential processing shows a general improvement when invoking reduce on ArrayChunk instead of the current dotimes. The functions affected are map, filter and keep. The following table shows the relevant benchmarks in ms. Numbers in [brackets] are worse than the original (although by a small margin). The best overall improvement is seen on long ranges.

long range

f

before(doall)

after(doall)

before(chunk-last)

after(chunk-last)

(map inc lr)

3.75

2.88

2.15

2.06

(keep identity lr)

2.56

2.16

0.75

0.72

(filter odd? lr)

2.77

2.20

1.53

1.45

range

f

before(doall)

after(doall)

before(chunk-last)

after(chunk-last)

(map inc lr)

3.64

[3.70]

2.32

[2.50]

(keep identity lr)

2.10

1.94

0.56

0.46

(filter odd? lr)

1.95

[1.99]

1.19

[1.66]

vector

f

before(doall)

after(doall)

before(chunk-last)

after(chunk-last)

(map inc lr)

3.81

3.68

2.44

2.15

(keep identity lr)

2.03

[2.16]

0.53

0.46

(filter odd? lr)

2.08

[2.82]

1.67

1.39

gvec

f

before(doall)

after(doall)

before(chunk-last)

after(chunk-last)

(map inc lr)

3.69

[3.83]

1.46

1.35

(keep identity lr)

2.86

2.82

2.44

2.52

(filter odd? lr)

2.95

2.70

2.08

2.07

All benchmarks executed with Criterium "bench" on a freshly started JVM discarding results with big outlier variance. The general bench template used has the form: (let [xs chunked-seq] (bench (doall (f xs)))) where:

  • "chunked-seq" is one of: (range 100000), (range 1e5), (vec (range 1e5)) or (into (vector-of :int) (range 1e5))

  • "doall" is either doall or chunk-last (see below for definition)

  • "f" is one of: (map inc xs), (filter odd? xs) or (keep identity xs).

Observations:

  • The more and larger the chunks the more the benefits. Custom (chunked) sequences with larger chunks (than 32) could additionally benefit from the changes.

  • The same change makes things worse with keep-indexed and map-indexed, so they have not being changed.

  • for macro is also chunk-aware, but it uses an explicit loop to handle :let, :when, :while cases that is difficult to separate from the chunk-buffer changes.

  • chunk-last is a chunk-aware function to access the last element by chunk. Compared to doall (that walks the sequence one by one), chunk-last is more efficient for both before code and after changes code. The function is: (defn chunk-last [xs] (when-let [xs (seq xs)] (if-let [cn (chunk-next xs)] (recur cn) (last xs))))

  • The initial ad-hoc pre-definition of dotimes before the definition of {map} can be removed from core. This is included in the patch.

Patch: clj-2346-3.patch

Prescreened by: Alex Miller

Environment

None

Attachments

3
  • 21 Dec 2020, 08:43 PM
  • 15 May 2018, 11:17 AM
  • 09 Apr 2018, 12:50 AM

Activity

Show:

Alex MillerDecember 21, 2020 at 9:01 PM

And making IChunk extend IReduceInit might be useful but doesn't help here b/c reduce has not yet been defined (we only have reduce1 at this point which doesn't know about IReduceInit).

Alex MillerDecember 21, 2020 at 8:45 PM

Refreshed clj-2346-3.patch to apply to master, attribution retained.

Alex MillerMay 15, 2018 at 10:31 PM

It could be if IChunk extended IReduceInit.

Renzo BorgattiMay 15, 2018 at 6:41 PM

Because it's the reduce on IChunk which is not exposed through normal coll-reduce.

Alex MillerMay 15, 2018 at 2:31 PM

Is there a reason to .reduce vs reduce?

Details

Assignee

Reporter

Approval

Prescreened

Patch

Code

Priority

Affects versions

Created April 9, 2018 at 12:40 AM
Updated December 22, 2020 at 5:24 PM