Me too, starting in October. I still need to get up to speed with Clojure, however.

On Sun, Jun 5, 2011 at 11:04 PM, Andreas Kostler <[email protected]> wrote:
> There's a Java library called HtmlCleaner. You might wanna give that a
> shot.
> Btw, I'm working on quite a similar project, so if you like, email me and we
> can maybe join forces.
> Andreas
>
> On 06/06/2011, at 11:01 AM, Base wrote:
>
> > Hi all,
> >
> > I am working on an app that will parse web pages to do some NLP and
> > statistics. I am able to parse the HTML using several different tools
> > (enlive, HTML parser, etc). However, I would like to discard all the
> > rest of the junk in the web page that is not pertinent (i.e., ads).
> > Does anyone have any experience doing this? Any tips on how to do
> > this - or even better, tools that you can recommend? I have been
> > digging around on this for a while now and am stuck!
> >
> > Thanks!
> >
> > Base

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
