Make sorted maps and sets implement j.u.NavigableMap and NavigableSet interfaces


Since Clojure 1.5 will probably no longer target JVM 5, add support for the (Concurrent)?Navigable(Map|Set) interfaces. This should involve adding functions to PersistentTreeMap that descend the red-black tree to select the closest items.




Michał Marczyk
February 19, 2015, 3:52 AM

Just wanted to note that data.avl offers this capability, although with a Clojure interface. (Implementing descendingMap would be straightforward, but it would likely complicate the implementation of some features which I think may be more compelling. I can go into more detail if desired, though perhaps it'd be best to discuss that on data.avl's tracker.)

Implementing the entirety of NavigableMap in PersistentTreeMap would be trickier than it might seem, since presumably we'd want to maintain the property of count being O(1) on the resulting structure. (It's possible to do away with this requirement, of course, but see below.)

data.avl collections store rank information in nodes; this enables rank queries and the navigable map operations with O(1) count on their return values, but it comes at a cost in node size. I think it's useful to have the compact PTM and the more feature-packed, but node-heavy data.avl available for different use cases, particularly since data.avl is very fast for basic operations (as in, completely on a par with PTM for access to a node through a path of the same shape; some individual tree operations may be a little slower, but then AVL trees tend to be shallower).

Andy Fingerhut
February 19, 2015, 6:04 PM

I haven't checked in detail, but it seems like it may be true that NavigableMap and NavigableSet interfaces could be implemented using the existing clojure.core/subseq and clojure.core/rsubseq functions (plus first, seq, etc.)

Michał Marczyk
February 20, 2015, 6:43 AM

subseq and rsubseq + first solve the "neightbour lookup" part of NavigableMap, but not the submap methods.

Subsetting red-black trees in logarithmic time is entirely possible, but, as mentioned in my previous comment, not if you want to maintain the property that count on the resulting structure is, if not O(1), then at least significantly better than O(n). This is because there is no way of knowing how many nodes are contained in any subrange of a BST without actually walking it or storing annotations in (at least some of) the nodes ahead of time.

I have described a possible workaround in my Conj talk last November and I have a sketch lying around somewhere, but it's not going to offer the type of convenience that data.avl does – the limitation described above is fundamental; it could be workable for use cases where count is (almost) never called.

As a side note, data.avl uses an implementation of neighbour lookup based on subseq et al. in its test suite, as a sanity check on the direct implementation used by There's also an implementation of subrange (the universal subsetting function for data.avl collections) based on subseq in there; as you'd expect, it's extremely slow for all but the smallest output sizes.




Jim Blomo







Affects versions