I created a gist of your code for better readability; I hope you don't mind.
https://gist.github.com/muhuk/7c4a2b8db63886e2a9cd

On Mon, May 5, 2014 at 12:36 PM, Andrew Chambers <[email protected]> wrote:

> I've been trying to make a tokenizer/lexer for a project of mine and came
> up with the following code.
> I've modelled the stream of characters as a seq/lazy-seq of chars, which is
> then converted to a lazy-seq of token objects.
> I'm relatively happy with how idiomatic and functional the code seems;
> however, when benchmarked, the code takes about 30 seconds on Clojure (after
> I increase the heap to 1 gig) to process a 30 meg file, and over
> 1 minute 30 seconds with ClojureScript.
> This is in contrast to about 0.1 to 0.5 seconds or less in C. Is
> there any idiomatic way to process the file without being a factor of 100
> times slower than C?
>
> Also, is there a tool for Clojure similar to gprof for C?
>
>
> Each function takes in a char seq and returns both a token and the seq
> after it's been advanced.
>
> (defn match-ident
>   [cs]
>   (let [start (first cs)]
>     (if (ident-first-char? start)
>       (let [identseq (cons start (take-while ident-tail-char? (rest cs)))
>             ^String ident (apply str identseq)]
>         [(drop (.length ident) cs) [:ident ident]]))))
>
> (defn match-num
>   [cs]
>   (if (digit? (first cs))
>     (let [numseq (take-while digit? cs)
>           ^String numstr (apply str numseq)
>           retseq (drop (.length numstr) cs)]
>       (if (= (first retseq) \.)
>         nil
>         [retseq [:number numstr]]))))
>
> (defn match-ws
>   [cs]
>   (if (whitespace-char? (first cs))
>     (let [wsseq (take-while whitespace-char? cs)
>           ^String wsstr (apply str wsseq)
>           retseq (drop (.length wsstr) cs)]
>       [retseq [:ws wsstr]])))
>
>
> ...
>
> (defn next-token
>   [cs]
>   (or (match-ident cs)
>       (match-ws cs)
>       (match-punct cs)
>       (match-num cs)
>       (match-eof cs)
>       (match-unknown cs)))
>
> ;; Here I build the lazy seq of tokens.
>
> (defn token-seq
>   [cs]
>   (let [[newcs tok] (next-token cs)]
>     (lazy-seq (cons tok (token-seq newcs)))))
>
>
> Cheers,
> Andrew Chambers
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

--
Kind Regards,
Atamert Ölçgen

-+-
--+
+++

www.muhuk.com
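The quoted question asks whether the "char seq in, token plus advanced seq out" contract can be kept without so much overhead. One direction that might help, sketched here under assumptions, is to keep the same contract but advance an integer index into a String instead of a seq of chars, so each token is cut out with `subs` rather than rebuilt with `take-while`, `apply str` and `drop`. The predicates below are hypothetical stand-ins (the originals are not shown in the post), `scan-while`, `next-token-at` and `tokenize` are made-up names, and the punctuation, EOF and "number followed by a dot" cases from the original are left out. This has not been benchmarked against the 30 meg file.

```clojure
;; Hypothetical stand-ins for the predicates that the original post does not show.
(defn ident-first-char? [c] (or (Character/isLetter (char c)) (= c \_)))
(defn ident-tail-char?  [c] (or (Character/isLetterOrDigit (char c)) (= c \_)))
(defn digit?            [c] (Character/isDigit (char c)))
(defn whitespace-char?  [c] (Character/isWhitespace (char c)))

(defn scan-while
  "Returns the index of the first char at or after `start` that fails `pred`
  (or the length of `s` if every remaining char passes)."
  [pred ^String s start]
  (let [n (.length s)]
    (loop [i (long start)]
      (if (and (< i n) (pred (.charAt s i)))
        (recur (inc i))
        i))))

(defn next-token-at
  "Returns [next-index token] for the token that starts at index `i`.
  Mirrors the [rest-of-input token] shape of the original matchers,
  with an index standing in for the advanced seq."
  [^String s i]
  (let [c (.charAt s i)]
    (cond
      (ident-first-char? c)
      (let [end (scan-while ident-tail-char? s (inc i))]
        [end [:ident (subs s i end)]])

      (digit? c)
      (let [end (scan-while digit? s i)]
        [end [:number (subs s i end)]])

      (whitespace-char? c)
      (let [end (scan-while whitespace-char? s i)]
        [end [:ws (subs s i end)]])

      ;; Anything else becomes a one-char :unknown token in this sketch.
      :else
      [(inc i) [:unknown (str c)]])))

(defn tokenize
  "Eagerly tokenizes `s` into a vector of [type text] pairs."
  [^String s]
  (let [n (.length s)]
    (loop [i 0
           acc (transient [])]
      (if (< i n)
        (let [[j tok] (next-token-at s i)]
          (recur (long j) (conj! acc tok)))
        (persistent! acc)))))

(tokenize "foo 42")
;; => [[:ident "foo"] [:ws " "] [:number "42"]]
```

This version is eager and needs the whole input in memory as a String, which is a different trade-off from the lazy seq in the original. If laziness matters, `next-token-at` could be wrapped in a `lazy-seq` in the same way `token-seq` wraps `next-token`.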
