Me too, starting in October. I still need to get up to speed with Clojure
however.

On Sun, Jun 5, 2011 at 11:04 PM, Andreas Kostler <
[email protected]> wrote:

> There's a Java library called HtmlCleaner. You might wanna give that a
> shot.
> Btw, I'm working on quite a similar project, so if you like, email me and
> maybe we can join forces.
> Andreas
>
> On 06/06/2011, at 11:01 AM, Base wrote:
>
> > Hi all,
> >
> > I am working on an app that will parse web pages to do some NLP and
> > statistics.  I am able to parse the HTML using several different tools
> > (enlive, HTML parser, etc.).  However, I would like to discard all the
> > rest of the junk in the web page that is not pertinent (i.e., ads).
> > Does anyone have any experience doing this?  Any tips on how to do
> > this - or even better, tools that you can recommend?  I have been
> > digging around on this for a while now and am stuck!
> >
> > Thanks!
> >
> > Base
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure" group.
> > To post to this group, send email to [email protected]
> > Note that posts from new members are moderated - please be patient with
> > your first post.
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/clojure?hl=en
>
