Move LazyTransformer to an iterator strategy, extend eduction capabilities

Description

  • LazyTransformer does a lot of work to be a seq. Instead, switch to creating a transforming iterator.

  • Change sequence to wrap iterator-seq around the transforming iterator.

  • Change the iterator-seq implementation to be chunked. IteratorSeq will no longer be used but is left in case of regressions for now.

  • Change Eduction to provide iteration directly via the transforming iterator.

  • Extend eduction to support multiple xforms.

Performance:

Compared:

  • alpha5 == before any change

  • alpha6 == after clj-1669-6.patch was applied

  • beta3 == latest, includes range enhancement, expanding mapcat enhancement, etc

;; using java 1.8
(use 'criterium.core)
(def s (range 1000))
(def v (vec s))
(def s50 (range 50))
(def v50 (vec s50))

expr

alpha5 s

alpha5 v

alpha6 s

alpha6 v

beta3 s

beta3 v

non-chunking transform

(into [] (->> s (interpose 5) (partition-all 2)))

432 us

437 us

413 us

411 us

353 us

414 us

(into [] (->> s (eduction (interpose 5) (partition-all 2)))) *

117 us

118 us

117 us

113 us

116 us

113 us

1 chunking transform

(into [] (map inc s))

43 us

45 us

35 us

31 us

32 us

36 us

(into [] (map inc) s)

19 us

19 us

18 us

18 us

18 us

16 us

(into [] (sequence (map inc) s))

100 us

54 us

97 us

65 us

66 us

64 us

(into [] (eduction (map inc) s))

24 us

19 us

24 us

20 us

20 us

19 us

(doall (map inc (eduction (map inc) s)))

219 us

203 us

98 us

78 us

93 us

89 us

2 chunking transforms

(into [] (map inc (map inc s)))

53 us

56 us

53 us

54 us

61 us

58 us

(into [] (comp (map inc) (map inc)) s)

13 us

26 us

30 us

26 us

31 us

31 us

(into [] (sequence (comp (map inc) (map inc)) s))

111 us

64 us

98 us

73 us

83 us

80 us

(into [] (eduction (map inc) (map inc) s)) *

58 us

31 us

58 us

31 us

30 us

31 us

(doall (map inc (eduction (map inc) (map inc) s))) *

240 us

212 us

114 us

93 us

105 us

102 us

expand transform

(into [] (mapcat range (map inc s50)))

74 us

73 us

67 us

68 us

37 us

39 us

(into [] (sequence (comp (map inc) (mapcat range)) s50))

111 us

102 us

166 us

159 us

99 us

98 us

(into [] (eduction (map inc) (mapcat range) s50)) *

65 us

64 us

57 us

56 us

27 us

27 us

materialized eduction

(sort (eduction (map inc) s))

ERR

ERR

99 us

77 us

77 us

77 us

(->> s (filter odd?) (map str) (sort-by last))

1.10 ms

1.25 ms

1.15 ms

1.19 ms

1.14 ms

1.13 ms

(->> s (eduction (filter odd?) (map str)) (sort-by last))

ERR

ERR

1.18 ms

1.15 ms

1.13 ms

1.13 ms

  • used comp to combine xforms as eduction only took one in the before case

Patch: clj-1669-6.patch

Screened by:

Environment

None

Activity

Show:
Alex Miller
May 18, 2015, 3:47 PM

updated for beta3 numbers

Alex Miller
March 27, 2015, 4:15 PM

clj-1669-6 is identical to clj-1669-5 but removes two commented out debugging lines that were inadvertently included.

Rich Hickey
March 27, 2015, 3:49 PM

push as is but leave unresolved, for perf tweaks

Alex Miller
March 21, 2015, 2:45 PM

The -5 patch is same -3 except all uses of IteratorSeq have been replaced with a ChunkedCons that is effectively a chunked version of the old IteratorSeq. While no one calls it, I left IteratorSeq in the code base in case of regression.

Generally, the chunked iterator seq reduces the cost in a number of the worst cases and also is a clear benefit in making seqs over a result of eduction or sequence faster to traverse (as they are now chunked).

I think the one potential issue is that seqs over iterators are now chunked when they were not before which could change programs that expect their stateful iterator to be traversed one at a time. This change could be isolated to just to sequence and seq-iterator and mitigated by not changing RT.seqFrom() and seq-iterator to use the new chunking behavior only in sequence and/or with a new chunked-iterator-seq to make it more explicit. The sequence over xf is new so no possible regression there, everything else would just be opt-in.

Alex Miller
March 20, 2015, 4:08 PM

Look at perf for:

  • ->> eduction transformation

  • transformation comparison that doesn't support chunking

  • more into vector iteration case

Completed
Your pinned fields
Click on the next to a field label to start pinning.

Assignee

Alex Miller

Reporter

Alex Miller

Labels

Approval

Ok

Patch

Code and Test

Priority

Major

Affects versions

Fix versions