Neat blog by the way.
On Monday, February 10, 2014 8:41:51 PM UTC-5, Rob Buhler wrote:
>
> Hi,
>
> I'm learning Clojure and I wrote a word-frequencies function that relies
> heavily on clojure.core/frequencies (plus a little filtering):
>
> (ns topwords.core
>   (:require [clojure.java.io :as io]
>             [clojure.string :as str]))
>
> (def stop-words
>   #{"other" "still" "again" "where" "could" "there"
>     "their" "these" "those" "after" "while" "almost" "before" "through"
>     "every" "being" "never" "should" "might" "thing" "among"
>     "which" "would" "though" "about"})
>
> (defn get-words [line]
>   (re-seq #"\p{Alpha}+" line))
>
> (defn min-length [word]
>   (< 4 (count word)))
>
> (defn ignore-words [word]
>   (if-not (contains? stop-words word) word))
>
> (defn word-frequencies [filename]
>   (with-open [rdr (io/reader filename)]
>     (let [lines (line-seq rdr)
>           words (comp get-words str/lower-case)
>           preds (every-pred min-length ignore-words)]
>       (frequencies (filter preds (mapcat words lines))))))
>
>
> It works (you can see some output from it on my blog if you want -
> http://robbuhler.blogspot.com/2014/02/word-frequencies-from-file.html)
>
> Anyway, my questions are:
>
>
> 1) Why do I not need a doall on the line-seq? What is forcing the
> evaluation here?
>
> 2) I'm assuming this is still reading the entire file into memory at once?
> If so, how would I count the frequencies of a really large file without
> consuming so much memory?
>
> I've thought about using doseq and, for each line, updating an atom that
> holds a map, but I'm not sure if I'm on the right track here.
>
> I'm just thinking of something like this (in Python):
>
>     d = {}
>     for i in xrange(100):
>         key = i % 10
>         if key in d:
>             d[key] += 1
>         else:
>             d[key] = 1
>
> Can I somehow count all of the frequencies line by line and not use an
> atom (or another ref type)?
>
> I'm not looking for the ultimate performance code, just something that
> would be considered idiomatic Clojure
>
>
> Thanks,
>
> Rob
>
>
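On the two questions: as far as I know, frequencies is itself built on reduce, so it walks the whole line-seq eagerly while the reader is still open, which would explain why no doall is needed. For the same reason it should not need the whole file in memory at once, as long as nothing else holds on to the head of the sequence; the map of counts still grows with the number of distinct words. If you want the line-by-line accumulation spelled out without an atom, a plain reduce is the usual shape. Below is a minimal sketch, not a drop-in replacement: the namespace topwords.sketch and the names count-words and word-frequencies-reduce are made up for illustration, and the stop-word and length filtering from your version is left out to keep it short.

```clojure
(ns topwords.sketch
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

;; Hypothetical helper: folds the words of one line into the running map.
;; (fnil inc 0) treats a missing key as 0, like the if/else in the Python loop.
(defn count-words [counts line]
  (reduce (fn [m word] (update-in m [word] (fnil inc 0)))
          counts
          (re-seq #"\p{Alpha}+" (str/lower-case line))))

;; Hypothetical entry point: `source` is anything io/reader accepts,
;; e.g. a filename or a java.io.Reader.
(defn word-frequencies-reduce [source]
  (with-open [rdr (io/reader source)]
    ;; reduce is eager, so every line is consumed before the reader closes;
    ;; the only state carried from line to line is the map of counts.
    (reduce count-words {} (line-seq rdr))))
```

Calling (word-frequencies-reduce "some-file.txt") would return an ordinary map from word to count, the same kind of value frequencies gives you. The accumulator is just the argument threaded through reduce, so there is no mutable reference involved.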
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en