Re: Exercise: words frequency ranking

lpetit Fri, 26 Dec 2008 08:00:15 -0800

What do you think of this style of coding?
- The rationale is to separate functions that deal with system
"boundaries" from "core algorithmic functions".
So you should have at least two functions: one that handles the
input/output formats, and one that does not and only deals with
Clojure/Java data structures.
- Don't expose helper functions "too early" when they exist only to
simplify the algorithm: you can already use defn- to make them
private, but you can also embed them in the main function as inner
functions bound with let.
- I also tried to write the "core algorithmic function" as
"functional" as I could.
Do you think the functional version is more or less "obfuscated"?
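To make the second point concrete, here is a small sketch of the two ways to keep a helper out of the public API. The names count-words-v1, count-words-v2 and add-word are hypothetical, and it assumes a modern Clojure (1.7+ for update, 1.2+ for fnil), not the 2008 release this thread was written against.

```clojure
;; Variant 1: the helper is a private top-level function (defn-).
;; Other namespaces cannot call it, but everything in this namespace can.
(defn- add-word [counts word]
  (update counts word (fnil inc 0)))   ; missing word -> count starts at 0

(defn count-words-v1 [words]
  (reduce add-word {} words))

;; Variant 2: the same helper, bound with let inside the only function
;; that uses it. The local add-word shadows the private var above.
(defn count-words-v2 [words]
  (let [add-word (fn [counts word] (update counts word (fnil inc 0)))]
    (reduce add-word {} words)))
```

Both return the same map; the second keeps the helper invisible even to the rest of the namespace.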


Here is what the "core function" would look like (it takes a string
as input and returns the sorted sequence of ["word" 2] vectors):

(defn topwords
  "Takes a string as input and returns a sequence of
  [word nb-of-word-occurrences] pairs, most frequent first."
  [s]
  (let [words     #(re-seq #"\w+" (.toLowerCase %))              ; string -> lowercased words
        freqs     (partial reduce #(merge-with + %1 {%2 1}) {})  ; words -> {word count}
        sort-desc (partial sort-by (comp - val))]                ; {word count} -> pairs, highest count first
    (-> s words freqs sort-desc)))
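For the other side of the split, a sketch of what the "boundary" function could look like. Assumptions: format-ranking and write-ranking are hypothetical names, the output layout copies the ranking shown in the original question, and the core step uses clojure.core/frequencies, which was only added in Clojure 1.2 (after this thread), in place of the reduce/merge-with idiom. topwords is redefined so the block runs on its own.

```clojure
(require '[clojure.string :as str])

;; Core: pure data in, pure data out, no I/O.
(defn topwords
  "Returns [word count] pairs from string s, most frequent first."
  [s]
  (->> (re-seq #"\w+" (str/lower-case s))
       frequencies                 ; {word count}
       (sort-by (comp - val))))    ; highest count first

;; Still pure: turns the ranking into report text, one line per word.
(defn format-ranking [ranking]
  (str/join (for [[word n] ranking]
              (format "%20s : %5d%n" word n))))   ; %n = platform line separator

;; Boundary: the only function that touches the file system.
(defn write-ranking [in-path out-path]
  (spit out-path (format-ranking (topwords (slurp in-path)))))
```

Only write-ranking needs a file to test; topwords and format-ranking can be checked at the REPL with plain strings.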


HTH,
--
Laurent

On Dec 26, 4:37 pm, lpetit <[email protected]> wrote:
> Instead of #(- (val %)), one could also use function composition:
> (comp - val)
>
> My 0,02 EURO,
>
> --
> Laurent
>
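A quick check that the two key functions are interchangeable, plus one more option not mentioned in the thread: keep the count itself as the key and pass a descending comparator to sort-by. The counts map is made-up sample data with distinct counts, so the ordering is unambiguous.

```clojure
(def counts {"the" 3, "and" 1, "of" 2})

;; Both key functions negate the count, so sort-by's ascending
;; order puts the most frequent word first.
(def by-lambda (sort-by #(- (val %)) counts))
(def by-comp   (sort-by (comp - val) counts))

;; Alternative: sort on the count with > as the comparator.
(def by-comparator (sort-by val > counts))
```

The comparator form avoids negation, which also means it would keep working if the values were not numbers that - accepts.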
> On Dec 25, 4:58 pm, Mibu <[email protected]> wrote:
>
> > My version:
>
> > (defn top-words [input-filename result-filename]
> >   (spit result-filename
> >         (apply str
> >                (map #(format "%s : %d\n" (first %) (second %))
> >                     (sort-by #(-(val %))
> >                              (reduce #(conj %1 { %2 (inc (%1 %2 0)) }) {}
> >                                      (map #(.toLowerCase %)
> >                                           (re-seq #"\w+"
> >                                                   (slurp input-filename)))))))))
>
> > Mibu
>
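The three messages count words with three different reduce steps. A sketch that isolates them (the function names are hypothetical) shows they build the same map: Mibu's version conj-es a one-entry map, which merges it into the accumulator; the merge-with version sums on collision; Piotr's version assoc-es the incremented count directly.

```clojure
(def words ["to" "be" "or" "not" "to" "be"])

;; Mibu: look up the current count (default 0) by calling the map as a
;; function, then conj a one-entry map, which overwrites that key.
(defn count-conj [ws]
  (reduce #(conj %1 {%2 (inc (%1 %2 0))}) {} ws))

;; lpetit: merge a {word 1} map, adding values when the key already exists.
(defn count-merge [ws]
  (reduce #(merge-with + %1 {%2 1}) {} ws))

;; Piotr: assoc the incremented count directly.
(defn count-assoc [ws]
  (reduce (fn [m w] (assoc m w (inc (get m w 0)))) {} ws))
```

All three are linear in the number of words; merge-with allocates a small map per word, which is the price of its more declarative reading.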
> > On Dec 25, 2:16 pm, Piotr 'Qertoip' Włodarek <[email protected]> wrote:
>
> > > Given an input text file, the program should write a ranking of
> > > words sorted by frequency to disk, like:
>
> > >                  the : 52483
> > >                  and : 32558
> > >                   of : 23477
> > >                    a : 22486
> > >                   to : 21993
>
> > > My first implementation:
>
> > > (defn topwords [in-filepath, out-filepath]
> > >   (def words (.split (.toLowerCase (slurp in-filepath)) "\\s+"))
>
> > >   (spit out-filepath
> > >         (apply  str
> > >                 (concat
> > >                   (map (fn [pair] (format "%20s : %5d \r\n" (key pair) (val pair)))
> > >                        (sort-by #( -(val %) )
> > >                                 (reduce
> > >                                   (fn [counted-words word]
> > >                                       ( assoc counted-words
> > >                                               word
> > >                                               (inc (get counted-words word 0)) ))
> > >                                   {}
> > >                                   words)))
> > >                   ["\r\n"]))))
>
> > > Somehow I feel it's far from optimal. Could you please advise and
> > > improve? What is the best, most idiomatic implementation of this
> > > simple problem?
>
> > > regards,
> > > Piotrek
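The first version above needs fewer changes than it may seem. A minimal sketch that keeps its algorithm and output format (including the \r\n line endings and the trailing blank line) and changes only two things: a local let binding instead of an inner def, which always creates or re-binds a namespace-level var each time the function runs, and clojure.string/join instead of (apply str (concat ...)). It assumes clojure.string is available (Clojure 1.2+).

```clojure
(require '[clojure.string :as str])

(defn topwords [in-filepath out-filepath]
  (let [;; local binding instead of (def words ...)
        words  (.split (.toLowerCase (slurp in-filepath)) "\\s+")
        ;; same counting step as before
        counts (reduce (fn [counted-words word]
                         (assoc counted-words word (inc (get counted-words word 0))))
                       {}
                       words)
        ;; most frequent first
        ranked (sort-by #(- (val %)) counts)]
    (spit out-filepath
          (str (str/join (map (fn [[word n]] (format "%20s : %5d \r\n" word n))
                              ranked))
               "\r\n"))))   ; trailing blank line, as in the original
```

Splitting on "\\s+" still yields an empty-string "word" when the file starts with whitespace; the re-seq #"\w+" approach in the replies avoids that.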
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---