clojure.data.csv memory usage can be reduced ~4x for retained/in-memory sequences

Description

After answering some questions about memory usage in data.csv, I realized that naive string pooling, which I have done in other libraries, could be applied fairly trivially. Without string pooling, parsing and retaining reasonably large files in memory is infeasible. For example, on a ~268 MB test file with 4 GB of heap, the default data.csv operation fails once it exhausts the heap. This is surprising to some users, given comparable CSV parsing facilities in other languages.

I ported a small, resizable string pool, backed by a concurrent map, into the existing data.csv namespace. It pools string references so that equal field values share a single instance. Minor changes make the pool's size optionally configurable and allow an upper bound that determines when pooled references are flushed. Proposed patch is attached.
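
For readers unfamiliar with the technique, here is a minimal sketch of a concurrent-map based string pool with a flush threshold, written in Java against the JVM classes such a pool would typically sit on. It is not the code from the attached patch: the class name `StringPool`, the method names, the constructor parameters, and the "clear everything once the bound is reached" flush policy are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a bounded string pool; not the attached patch.
final class StringPool {
    private final ConcurrentHashMap<String, String> pool;
    private final int maxEntries; // assumed upper bound that triggers a flush

    StringPool(int initialCapacity, int maxEntries) {
        this.pool = new ConcurrentHashMap<>(initialCapacity);
        this.maxEntries = maxEntries;
    }

    // Returns the pooled instance equal to s, registering s if none exists.
    String intern(String s) {
        if (s == null) return null;
        String existing = pool.get(s);
        if (existing != null) return existing;
        // Assumed flush policy: drop all pooled references at the bound.
        // Strings already handed out stay valid; they are just no longer shared
        // with strings pooled later. Under contention the size check is
        // approximate, so the pool may briefly exceed the bound.
        if (pool.size() >= maxEntries) {
            pool.clear();
        }
        existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringPool p = new StringPool(16, 1000);
        // Two distinct String objects with equal contents, as a parser would produce.
        String a = p.intern(new String("NY"));
        String b = p.intern(new String("NY"));
        System.out.println(a == b); // prints "true": both refer to one instance
    }
}
```

The savings come from columns with many repeated values: each distinct value is retained once instead of once per row. Files whose fields are mostly unique would see little benefit and would pay the lookup cost.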

Environment

None

Assignee

Unassigned

Reporter

Alex Miller

Labels

None

Approval

None

Patch

None

Priority

Major