Significantly improve compile time by reducing calls to Class.forName

Description

Compilation speed has been a real problem for a number of my projects, especially Aleph [1], which in 1.6 takes 18 seconds to load. Recently I realized that Class.forName is being called repeatedly on symbols which are lexically bound. Hits on Class.forName are cached, but misses are allowed to go through each time, which translates into tens of thousands of calls after calling `(use 'aleph.http)`.

Proposed: Avoid calling Class.forName() on non-namespaced symbols that do not contain "." or start with "[", don't map to a Class in the ns, and are names in the current local env. Also, adjust the ordering of checks in tagToClass to check for hints before checking for class.

[Note that the latest variant of the patch moves the check from the original patch farther down in the logic to avoid changing the semantics. This still yields virtually all of the performance gains. See comments for details.]

Patch: clj-1529-no-cache-2.diff

Screened by: Stu Halloway. Note that for this change the patch ended up being so small it is easier follow the code than the prose description.

[1] https://github.com/ztellman/aleph

Environment

None

Attachments

7

Activity

Show:

Alex MillerNovember 5, 2014 at 5:40 PM

Talked to Rich, focusing on no-cache patch. Added new version that fixes tabbing and restores Zach's name to the patch, which seems appropriate.

Alex MillerOctober 27, 2014 at 10:41 PM

I've done more testing and made mods to both patches and moved them closer together.

On the maybe-class-cache patch (new version = clj-1529-with-cache.diff):
1) I found that adding a final else branch that returned null was an improvement - this avoids caching things that will never hit in the future (Cons, PersistentList, Symbols with namespaces, etc). That's both a perf improvement (avoids hashing those things) and a memory improvement.
2) The tagToClass improvement from Zach's patch is orthogonal and also valid here so I added it.
3) I added Zach's check, but moved the placement lower so that it doesn't alter semantics. It helps avoid caching locals that aren't classes.

On the class-for-name patch (new version = clj-1529-no-cache.diff):
1) Same change as #3 above - moved check lower to avoid semantics change.

With these changes, both patches have tagToClass and local checks, neither patch changes semantics, and the only difference is whether to keep or remove the cache.

aleph timings (for "lein test"):

  • 1.7 master = 25.415 s

  • 1.7 + clj-1529-with-cache.diff = 14.329 s

  • 1.7 + clj-1529-no-cache.diff = 14.808 s

lamina timings (for "lein test"):

  • 1.7 master = 37.340 s

  • 1.7 + clj-1529-with-cache.diff = 28.680 s

  • 1.7 + clj-1529-no-cache.diff = 28.759 s

The cache helps slightly in both cases, but it does not seem worth adding the new dynamic var and unbounded memory use inherent in the cache.

Alex MillerOctober 24, 2014 at 9:01 PM

maybe-class-cache-2.patch removes some debugging stuff

importOctober 11, 2014 at 12:43 AM

Comment made by: deepakg

So repeating the experiment several times does in fact behave as you suspected...apologies for my previous LOLDATA.

importOctober 11, 2014 at 12:08 AM

Comment made by: deepakg

Hi Alex, the tests I ran did 20-iteration loops, and I took the mean (though it was pretty consistent between restarts). I can redo stuff and upload the raw data for you if that will help.

Completed

Details

Assignee

Reporter

Approval

Ok

Patch

Code

Priority

Affects versions

Fix versions

Created September 21, 2014 at 8:08 PM
Updated November 14, 2014 at 4:08 PM
Resolved November 14, 2014 at 4:08 PM