Check out assoc-in, get-in, and update-in. They make working with
nested maps a breeze. Here's a rewrite of your code:

(ns user
  (:require [clojure.string :as str]
            [clojure.java.io :as io]))

;; Fold each "postcode,region" line of the file into one nested map.
(def postcode-trie
  (with-open [r (io/reader "/path/to/data.csv")]
    (reduce
      (fn [trie line]
        (let [[postcode region] (str/split line #",")
              postcode (str/replace postcode " " "")]
          ;; A string path is walked one character at a time.
          (assoc-in trie postcode region)))
      {}
      (line-seq r))))

(get-in postcode-trie "SW1A2")
;; => "20"

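assoc-in and get-in treat a string path as a sequence of characters, so
this stores the keys of the tree (trie) as characters instead of
one-character strings. That lets you pass the postcode string straight
to get-in.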
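
To see the character keys for yourself, here is a minimal REPL sketch using the two sample rows from your mail. The name trie is just for illustration, and the update-in call that parses the region into a number is my own assumption about what you might want, since your example trees hold 10 and 20 as numbers.

```clojure
;; A string used as a key path is treated as a sequence of characters.
(def trie
  (-> {}
      (assoc-in "SW1A1" "10")
      (assoc-in "SW1A2" "20")))

trie
;; => {\S {\W {\1 {\A {\1 "10", \2 "20"}}}}}

(get-in trie "SW1A2")
;; => "20"

;; A prefix of a postcode returns the subtree below it.
(get-in trie "SW1A")
;; => {\1 "10", \2 "20"}

;; update-in accepts the same kind of path; here it parses one leaf.
(get-in (update-in trie "SW1A1" #(Long/parseLong %)) "SW1A1")
;; => 10
```

Note that the keys are characters such as \S, not strings such as "S", so a lookup with a vector of strings like ["S" "W"] would not find anything in this trie.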
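
line-seq reads the file lazily, and reduce consumes it one line at a
time, so the whole file never has to sit in memory alongside the trie.
That might help mitigate unnecessary memory usage, though as Andy
mentions, Java objects just carry a lot of baggage.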
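
If you want to try the approach on a small sample before pointing it at the 1.5m row file, something like the following should work. It is only a sketch: build-trie and sample-csv are names I made up, and the StringReader is an in-memory stand-in for your CSV file.

```clojure
(require '[clojure.string :as str])

(defn build-trie
  "Folds each \"postcode,region\" line read from r into a nested map.
  reduce pulls lines from line-seq one at a time."
  [^java.io.BufferedReader r]
  (reduce (fn [trie line]
            (let [[postcode region] (str/split line #",")]
              (assoc-in trie (str/replace postcode " " "") region)))
          {}
          (line-seq r)))

;; Hypothetical sample data standing in for the real file.
(def sample-csv "SW1A 1,10\nSW1A 2,20\n")

(with-open [r (java.io.BufferedReader. (java.io.StringReader. sample-csv))]
  (build-trie r))
;; => {\S {\W {\1 {\A {\1 "10", \2 "20"}}}}}
```

The reduce finishes inside with-open, so the reader is still open while the lines are being consumed.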
Justin
On Nov 3, 12:22 pm, Paul Ingles wrote:
> Hi,
>
> I've been playing around with breaking apart a list of postal codes to
> be stored in a tree with leaf nodes containing information about that
> area. I have some code that works with medium-ish size inputs but
> fails with a GC Overhead error with larger input sets (1.5m rows) and
> would really appreciate anyone being able to point me in the right
> direction.
>
> The full code is up as a gist here: https://gist.github.com/661278
>
> My input file contains something like:
>
> SW1A 1,10
> SW1A 2,20
> ...
>
> Which are then mapped to 2 trees:
>
> {"S" {"W" {"1" {"A" {"1" 10}}}}}
> {"S" {"W" {"1" {"A" {"2" 20}}}}}
>
> I then want to continually fold those trees into a master tree. For
> the 2 maps above the merged tree would be:
>
> {"S" {"W" {"1" {"A" {"1" 10 "2" 20}}}}}
>
> I'm sure I'm missing all kinds of awesome core/contrib functions to
> make it more concise and would appreciate anyone pointing out
> alternatives also.
>
> The main problem is that it fails when my input data gets sufficiently
> large. On my MacBook Pro it falls over with an input set of 1.5m
> records (although a lot of these would be branches from the first few
> levels). It reports GC Overhead limit exceeded, although it also ran out
> of heap space before I upped that.
>
> I assume this is because during the tree reduction it's still
> retaining references to nodes eventually causing it to build
> continually larger structures?
>
> I've included the reduce function (and how that gets called to produce
> results) inline:
>
> (defn merge-tree
>   [tree other]
>   (if (not (map? other))
>     tree
>     (merge-with (fn [x y] (merge-tree x y))
>                 tree other)))
>
> (def results (reduce merge-tree
>                      {}
>                      (map record-to-tree
>                           postcodes-from-file)))
>
> All help much appreciated,
> Paul