I do use Nutch as my crawler, but just as my crawler, so I hadn't thought to
look for an answer there. I will do so. Thank you.
Chip
From: Alexandre Rafalovitch
Sent: Wednesday, September 19, 2018 2:05:41 PM
To: solr-user
Subject: Re: Seeking a simple way to test my index.
Have you looked at Apache Nutch? Seems like the direct match for your
- growing - requirements and it does integrate with Solr. Or one of
the other solutions, like http://stormcrawler.net/ or
http://www.norconex.com/collectors/
Otherwise, this does not really feel like a Solr question.
Regards,
A
I've got a Solr instance which crawls roughly 3,500 seed pages, depth of 1, at
240 institutions, all but 1 of which I don't control. I recrawl once a month or
so. Naturally if one of the sites I crawl changes, then I need to know to
update my seed URLs. I've been checking this by hand, which was