I need to process large binary files, e.g. to remove ^M (carriage
return) characters. Let's assume the files are about 50 MB: small
enough to be processed in memory (but not with a naive implementation).
The following code works, except that it throws an OutOfMemoryError for
files as small as 6 MB:

(defn read-bin-file [file]
  (to-byte-array (as-file file)))

(defn remove-cr-from-file [file]
  (let [dirty-bytes (read-bin-file file)
        clean-bytes (filter #(not (= 13 %)) dirty-bytes)
        changed?    (< (count clean-bytes) (alength dirty-bytes))] ; OutOfMemoryError
    (if changed?
      (write-bin-file file clean-bytes) ; writing works fine
      nil)))

How can I force 'filter' to be efficient, i.e. to produce another array
instead of a memory-hungry list?
More generally, what is the recommended way to process large binary
data in Clojure?
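
To make the question concrete, here is a minimal sketch of the array-to-array behaviour I am after. It assumes the whole file already fits in a byte array; `remove-cr` is a hypothetical helper name of my own, not a library function. It copies the non-CR bytes into a second primitive array with loop/recur, so no per-byte sequence is built:

```clojure
(defn remove-cr
  "Returns a new byte array holding the bytes of `dirty` with every
  CR (13) byte dropped. Hypothetical helper, for illustration only."
  ^bytes [^bytes dirty]
  (let [n            (alength dirty)
        ^bytes clean (byte-array n)          ; worst case: nothing is removed
        kept         (loop [i 0, j 0]
                       (if (< i n)
                         (let [b (aget dirty i)]
                           (if (== b 13)
                             (recur (inc i) j)            ; skip the CR
                             (do (aset clean j b)         ; keep this byte
                                 (recur (inc i) (inc j)))))
                         j))]
    ;; Trim the scratch array down to the number of bytes actually kept.
    (java.util.Arrays/copyOf clean (int kept))))
```

With something like this, `changed?` could be computed by comparing `(alength clean-bytes)` with `(alength dirty-bytes)`. Peak memory is still a few times the file size (input, scratch array, trimmed copy), so the JVM heap limit has to allow for that.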