Improve chunked sequence processing performance

Description

Chunked sequential processing shows a general improvement when invoking reduce on ArrayChunk instead of the current dotimes. The functions affected are map, filter and keep. The following table shows the relevant benchmarks in ms. Numbers in [brackets] are worse than the original (although by a small margin). The best overall improvement is seen on long ranges.

long range

f

before(doall)

after(doall)

before(chunk-last)

after(chunk-last)

(map inc lr)

3.75

2.88

2.15

2.06

(keep identity lr)

2.56

2.16

0.75

0.72

(filter odd? lr)

2.77

2.20

1.53

1.45

range

f

before(doall)

after(doall)

before(chunk-last)

after(chunk-last)

(map inc lr)

3.64

[3.70]

2.32

[2.50]

(keep identity lr)

2.10

1.94

0.56

0.46

(filter odd? lr)

1.95

[1.99]

1.19

[1.66]

vector

f

before(doall)

after(doall)

before(chunk-last)

after(chunk-last)

(map inc lr)

3.81

3.68

2.44

2.15

(keep identity lr)

2.03

[2.16]

0.53

0.46

(filter odd? lr)

2.08

[2.82]

1.67

1.39

gvec

f

before(doall)

after(doall)

before(chunk-last)

after(chunk-last)

(map inc lr)

3.69

[3.83]

1.46

1.35

(keep identity lr)

2.86

2.82

2.44

2.52

(filter odd? lr)

2.95

2.70

2.08

2.07

All benchmarks executed with Criterium "bench" on a freshly started JVM discarding results with big outlier variance. The general bench template used has the form: (let [xs chunked-seq] (bench (doall (f xs)))) where:

  • "chunked-seq" is one of: (range 100000), (range 1e5), (vec (range 1e5)) or (into (vector-of :int) (range 1e5))

  • "doall" is either doall or chunk-last (see below for definition)

  • "f" is one of: (map inc xs), (filter odd? xs) or (keep identity xs).

Observations:

  • The more and larger the chunks the more the benefits. Custom (chunked) sequences with larger chunks (than 32) could additionally benefit from the changes.

  • The same change makes things worse with keep-indexed and map-indexed, so they have not being changed.

  • for macro is also chunk-aware, but it uses an explicit loop to handle :let, :when, :while cases that is difficult to separate from the chunk-buffer changes.

  • chunk-last is a chunk-aware function to access the last element by chunk. Compared to doall (that walks the sequence one by one), chunk-last is more efficient for both before code and after changes code. The function is: (defn chunk-last [xs] (when-let [xs (seq xs)] (if-let [cn (chunk-next xs)] (recur cn) (last xs))))

  • The initial ad-hoc pre-definition of dotimes before the definition of {map} can be removed from core. This is included in the patch.

Patch: clj-2346-3.patch

Prescreened by: Alex Miller

Environment

None

Activity

Show:
Alex Miller
May 15, 2018, 2:31 PM

Is there a reason to .reduce vs reduce?

Renzo Borgatti
May 15, 2018, 6:41 PM

Because it's the reduce on IChunk which is not exposed through normal coll-reduce.

Alex Miller
May 15, 2018, 10:31 PM

It could be if IChunk extended IReduceInit.

Alex Miller
December 21, 2020, 8:45 PM

Refreshed clj-2346-3.patch to apply to master, attribution retained.

Alex Miller
December 21, 2020, 9:01 PM

And making IChunk extend IReduceInit might be useful but doesn't help here b/c reduce has not yet been defined (we only have reduce1 at this point which doesn't know about IReduceInit).

Assignee

Unassigned

Reporter

Renzo Borgatti

Labels

Approval

Prescreened

Patch

Code

Affects versions

Priority

Major