Interesting. It seems to me like locals-clearing should take care of
this for you, by preparing a call to trampoline, then setting the
locals to nil, then calling trampoline. But you can solve this easily
enough yourself, in this particular case, by splitting the strings up
into chunks before you open any files (you can do this lazily so it's
not a head-holding issue). Then inside the loop body, you open up a
part file, write all the strings you planned to write, close the file,
and recur with the next chunk of strings. Such a chunking function
would look a bit like:

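(defn chunk-strings [size strs]
  ((fn chunk [pending strs written]
     (lazy-seq
      (if (>= written size)
        ;; this chunk is full: emit it and start a fresh one
        (cons pending (chunk [], strs, 0))
        (if-let [ss (seq strs)]
          (let [s (first ss)
                len (count s)]
            (chunk (conj pending s)
                   (rest ss)
                   (+ len written)))
          ;; input exhausted: emit the last, partial chunk (if any)
          ;; instead of silently dropping it
          (when (seq pending)
            (list pending))))))
   [], strs, 0))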
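
The loop described above (open a part file, write one chunk, close it,
recur with the next chunk) might then look roughly like this. It's only
a sketch: write-parts! is a name I made up, and it assumes each chunk is
a collection of strings like the ones chunk-strings produces.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: writes each chunk to its own file,
;; named path.0, path.1, and so on.
(defn write-parts! [path chunks]
  (loop [part 0, cs (seq chunks)]
    (when cs
      ;; with-open closes the writer before we move on, so only
      ;; one part file is ever open at a time
      (with-open [f (io/writer (str path "." part))]
        (doseq [s (first cs)]
          (.write ^java.io.Writer f ^String s)))
      ;; recur sits outside with-open, so it is a real tail call
      (recur (inc part) (next cs)))))
```

You'd call it as (write-parts! "foo" (chunk-strings size strs)). No
trampoline is needed, since each file is finished before the loop
moves to the next chunk.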

I'm sure the chunking can be done more cleanly with reductions, adding
up length as you go, but I had trouble holding that in my head, so
primitive recursion won out.
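
For what it's worth, here is one way a reductions version could go. It's
a rough sketch rather than a tested replacement, and the name
chunk-strings-reductions is made up. The idea is to carry a
[written chunk-index] pair along the seq, tag each string with its
chunk index, and then group by that index.

```clojure
(defn chunk-strings-reductions [size strs]
  (let [;; state is [bytes-written-in-current-chunk chunk-index]
        step (fn [[written idx] s]
               (if (>= written size)
                 [(count s) (inc idx)]          ; s starts a new chunk
                 [(+ written (count s)) idx]))  ; s joins the current one
        ;; rest drops the initial [0 0], leaving one state per string
        states (rest (reductions step [0 0] strs))]
    (->> (map (fn [s [_ idx]] [idx s]) strs states)
         (partition-by first)
         (map #(mapv second %)))))

(chunk-strings-reductions 5 ["ab" "cd" "ef" "g"])
;; => (["ab" "cd" "ef"] ["g"])
```

As in the recursive version, a chunk is closed once its total length
reaches size, so the string that crosses the limit stays in the chunk
it was added to.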

On Nov 26, 5:59 pm, Gerrard McNulty <[email protected]> wrote:
> Hi,
>
> I've a head holding problem that I believe is a bug in clojure 1.3.  I
> wrote the following function to split a lazy seq of strings across
> files of x size:
>
> (defn split-file
>   ([path strs size]
>      (trampoline split-file path (seq strs) size 0))
>   ([path strs size part]
>      (with-open [f (clojure.java.io/writer (str path "." part))]
>        (loop [written 0, ss strs]
>          (when ss
>            (if (>= written size)
>              #(split-file path ss size (inc part))
>              (let [s (first ss)]
>                (.write f s)
>                (recur (+ written (.length s)) (next ss)))))))))
>
> If I call the 3 arg version of the function:
> (split-file "foo" (repeat 100000000 "blah blah blah") 100000000)
>
> I see memory usage increases as I'm writing each file with the usual
> gc slow down, then memory usage goes back down again as I get to a new
> split file.
>
> Memory usage is fine if I call the 4 arg version (which only writes
> one part of the split file):
> (split-file "foo" (repeat 100000000 "blah blah blah") 100000000 0)
>
> I can also avoid the head holding problem by removing trampoline and
> recursively calling split-file directly, but then those recursive
> calls use up stack and don't close files until all calls complete
