Re: filtering or getting accurate crawling results

Erick Erickson Sat, 13 Nov 2010 04:27:55 -0800

My Nutch knowledge is pretty sparse; you'd probably get
more knowledgeable answers on the Nutch mailing list.


Best
Erick

On Fri, Nov 12, 2010 at 11:52 PM, Dennis Gearon <[email protected]> wrote:

> Actually, can Nutch be used for SCRAPING, not crawling?
>
> I don't just want the URL; I want the data assigned to specific fields, no
> matter what site or format it is coming from.
>
> I've done scraping, but it had to be custom-tailored for each target.
>
>
>
>  Dennis Gearon
>
>
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>
>
> ----- Original Message ----
> From: Dennis Gearon <[email protected]>
> To: [email protected]
> Sent: Fri, November 12, 2010 8:46:31 PM
> Subject: filtering or getting accurate crawling results
>
> How easy is it to get good results from the Lucene crawling software?
>
> Let's say, for example, I wanted only information about a general subject
> and nothing else. (Sorry, not ready to say what exactly at this point.) Is it
> like tuning Solr, or IS it tuning Solr to just not accept what does not fit
> the desired results?
>
> The amount of information that I'd want is LARGE, but a drop in the bucket
> compared to Google itself.
>
>
>
> Dennis Gearon
>
