raym - Thanks so much for the code snippet. That was just what I needed to
get unstuck and start playing around with the parser. I really appreciate
the help.
- Dave
On Sunday, October 21, 2012 6:10:51 AM UTC-5, raym wrote:
>
> As Dave says, you can do this using line-seq, but you'll have to
> accumulate some state as you read the lines so you can return all the
> lines for a given thread's ReqStart to ReqEnd. Once you've returned
> that block, you can delete the state for that thread-id, so your
> accumulated state will only contain the 'active' requests. If you're
> processing a very large file, you're best returning a lazy sequence of
> the data.
>
> Something like this should get you started:
>
> (require '[clojure.java.io :as io])
>
> ;; Split a log line into [thread-id tag value].
> (defn parse-line
>   [s]
>   (let [[_ thread_id tag value] (re-find #"^(\d+)\s+(\S+)\s+(.+)$" s)]
>     [thread_id tag value]))
>
> ;; Lazily emit one map of tag -> [values] per thread each time that
> ;; thread's ReqEnd is seen; state holds only the 'active' requests.
> (defn parse-lines
>   ([lines]
>    (parse-lines lines {}))
>   ([lines state]
>    (lazy-seq
>     (when (seq lines)
>       (let [[thread-id tag value] (parse-line (first lines))
>             state (assoc-in state [thread-id tag]
>                             (conj (get-in state [thread-id tag] [])
>                                   value))]
>         (if (= tag "ReqEnd")
>           (cons (get state thread-id) (parse-lines (rest lines)
>                                                    (dissoc state thread-id)))
>           (parse-lines (rest lines) state)))))))
>
> ;; doall realizes the result before with-open closes the reader.
> (defn parse-log-file
>   [file]
>   (with-open [logfile (io/reader file)]
>     (doall
>      (filter #(and (get % "ReqStart") (get % "ReqEnd"))
>              (parse-lines (line-seq logfile))))))
>
>
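One caveat for the multi-gigabyte case: doall in parse-log-file realizes
every completed request before with-open closes the reader, so the full
result ends up in memory. Below is a minimal sketch of an alternative that
consumes the requests one at a time while the reader is still open. The
names process-log, parse and handle are hypothetical and not part of the
code above; parse stands for any function from a seq of lines to a lazy seq
of requests, such as the one-argument arity of parse-lines.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper, not from the original reply.
;; source: anything io/reader accepts (file name, File, Reader, ...)
;; parse:  function from a seq of lines to a lazy seq of requests
;; handle: side-effecting function called once per request
(defn process-log
  [source parse handle]
  (with-open [rdr (io/reader source)]
    ;; doseq walks the lazy seq inside with-open and does not keep
    ;; the already-handled requests around.
    (doseq [request (parse (line-seq rdr))]
      (handle request))))
```

With the definitions quoted above loaded, (process-log "varnish.log"
parse-lines println) would print each completed request as soon as its
ReqEnd line is read ("varnish.log" is a made-up file name). The call
returns nil, so anything worth keeping has to be written out or
accumulated by handle.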
> On 21 October 2012 02:54, Dave <[email protected]> wrote:
> > Clojurists - I'm fairly new to Clojure and didn't realize how broken I've
> > become using imperative languages all my life. I'm stumped as to how to
> > parse a Varnish (www.varnish-cache.org) log file using Clojure. The main
> > problem is that for a single request a varnish log file generates multiple
> > log lines and each line is interspersed with lines from other threads.
> > These log files can be several gigabytes in size (so using a stable sort of
> > the entire log by thread id is out of the question).
> >
> > Below I've included a small example log file and an example output Clojure
> > data structure. Let me thank everyone in advance for any hints / help they
> > can provide on this seemingly simple problem.
> >
> > Rules of the Varnish Log File
> >
> > - The first number on each line is the thread id (not unique and gets
> >   reused frequently).
> > - Each ReqStart marks the start of a request, and the last number on the
> >   line is the unique transaction id (e.g. 118591777).
> > - ReqEnd denotes the end of the processing of the request by the thread.
> > - Each line is written atomically, but many threads generate log lines,
> >   so the lines of one request are interspersed with those of other
> >   requests (threads).
> > - These log files can be VERY large (10+ gigabytes in the case of my
> >   application), so using a stable sort by thread id or anything that
> >   loads the entire file into memory is out of the question.
> >
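The first two rules can be sketched in code. This is only an illustration
under assumptions: the names parse-varnish-line and transaction-id are
made up, and the single-character third column (c or - in the example
below) is kept as :marker without interpreting it, since the rules above
don't say what it means.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: split one log line into its columns.
;; Returns nil for lines that don't have the
;; "<thread id> <tag> <marker> <value>" shape.
(defn parse-varnish-line
  [line]
  (when-let [[_ thread-id tag marker value]
             (re-matches #"(\d+)\s+(\S+)\s+(\S)\s+(.*)" line)]
    {:thread-id thread-id :tag tag :marker marker :value value}))

;; Hypothetical helper: the transaction id is the last
;; whitespace-separated field of a ReqStart value.
(defn transaction-id
  [req-start-value]
  (last (str/split req-start-value #"\s+")))
```

For example, (parse-varnish-line "15 RxURL c /json/engagement") yields
{:thread-id "15" :tag "RxURL" :marker "c" :value "/json/engagement"}.
Returning nil for unrecognized lines lets a caller skip them instead of
accumulating state under a nil thread id.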
> >
> > Example Varnish Log file
> > 40 ReqEnd c 118591771 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200
> > 15 ReqStart c 10.102.41.121 4187 118591777
> > 15 RxRequest c GET
> > 15 RxURL c /json/engagement
> > 15 RxHeader c host: www.example.com
> > 30 ReqStart c 10.102.41.121 3906 118591802
> > 15 RxHeader c Accept: application/json
> > 30 RxRequest c GET
> > 30 RxURL c /ws/boxtops/user/
> > 30 RxHeader c host: www.example.com
> > 15 ReqEnd c 118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200
> > 30 RxHeader c Accept: application/xml
> > 30 ReqEnd c 118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432
> > 15 ReqStart c 10.102.41.121 4187 118591808
> > 15 RxRequest c GET
> > 15 RxURL c /ws/boxtops/user/
> > 30 ReqStart c 10.102.41.121 3906 118591810
> > 15 RxHeader c host: www.example.com
> > 15 RxHeader c Accept: application/xml
> > 30 RxRequest c GET
> > 30 RxURL c /registration/success
> > 30 RxHeader c host: www.example.com
> > 46 TxRequest - GET
> > 30 RxHeader c Accept: text/html
> > 46 TxURL - /registration/success
> > 15 ReqEnd c 118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.000036955
> > 30 ReqEnd c 118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.000087023
> >
> > Desired Output
> > {
> > 118591802
> > { :ReqStart ["10.102.41.121 3906 118591802"]
> > :RxRequest ["GET"]
> > :RxURL ["/ws/boxtops/user/"]
> > :RxHeader ["host: www.example.com" "Accept: application/xml"]
> > ;; or better yet:
> > ;; :RxHeader {:host "www.example.com" :Accept "application/xml"}
> > :ReqEnd ["118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432"] }
> > 118591777
> > { :ReqStart ["10.102.41.121 4187 118591777"]
> > :RxRequest ["GET"]
> > :RxURL ["/json/engagement"]
> > :RxHeader ["host: www.example.com" "Accept: application/json"]
> > :ReqEnd ["118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200"] }
> > 118591808
> > { :ReqStart ["10.102.41.121 4187 118591808"]
> > :RxRequest ["GET"]
> > :RxURL ["/ws/boxtops/user/"]
> > :RxHeader ["host: www.example.com" "Accept: application/xml"]
> > :ReqEnd ["118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.000036955"] }
> > 118591810
> > { :ReqStart ["10.102.41.121 3906 118591810"]
> > :RxRequest ["GET"]
> > :RxURL ["/registration/success"]
> > :RxHeader ["host: www.example.com" "Accept: text/html"]
> > :ReqEnd ["118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.000087023"] }
> > }
> >
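A rough sketch of getting from one completed request to the shape above,
including the "better yet" map form of :RxHeader. It assumes each request
arrives as a map of tag string to vector of values with the marker column
already stripped (for example {"ReqStart" ["10.102.41.121 3906 118591802"]
...}); header-map and reshape-request are made-up names.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: ["host: www.example.com" ...] -> {:host "www.example.com" ...}
;; If a header name repeats, the last value wins.
(defn header-map
  [headers]
  (into {}
        (for [h headers
              :let [[k v] (str/split h #":\s*" 2)]]
          [(keyword k) v])))

;; Hypothetical helper: returns [transaction-id request] where the
;; transaction id is the last field of the ReqStart value, the tags are
;; keywords, and :RxHeader (if present) is a map.
(defn reshape-request
  [request]
  (let [req-start (first (get request "ReqStart"))
        xid       (Long/parseLong (last (str/split req-start #"\s+")))
        keyed     (into {} (for [[tag values] request]
                             [(keyword tag) values]))]
    [xid (if (contains? keyed :RxHeader)
           (update-in keyed [:RxHeader] header-map)
           keyed)]))
```

For a small input, (into {} (map reshape-request requests)) builds the
whole transaction-id-keyed map. For the 10+ gigabyte files it is probably
better to consume the [transaction-id request] pairs one at a time rather
than collect them.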
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure" group.
> > To post to this group, send email to [email protected]
> > Note that posts from new members are moderated - please be patient with
> > your first post.
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/clojure?hl=en
>