Nutch 1.4 has a separate tool (bin/nutch solrclean) that removes 404 and
redirected documents from your index based on your CrawlDB. Trunk's SolrIndexer
can add and remove documents in one run based on segment data.

On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
> I'm running Nutch, so it's updating the documents, but I'm wanting to
> remove ones that are no longer available.  So in that case, there's no
> update possible.
> 
> On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk <mav.p...@holidaylettings.co.uk> wrote:
> > Not sure if there is an automatic way, but we do it via a delete query and,
> > where possible, we update the doc under the same id to avoid deletes.
> > 
> > On 01/05/2012 13:43, "Bai Shen" <baishen.li...@gmail.com> wrote:
> > >What is the best method to remove old documents?  Things that now
> > >generate 404 errors, etc.
> > >
> > >Is there an automatic method or do I have to do it manually?
> > >
> > >Thanks.
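
For the manual delete-query route mentioned in the quoted reply, here is a
rough Python sketch of what such a request could look like. It assumes Solr's
XML update handler is reachable at /solr/update; the host, port, and the
document id in the example are made up for illustration, and the helper names
are my own, not part of Nutch or Solr.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_delete_payload(query: str) -> bytes:
    """Build the XML body of a Solr delete-by-query update message."""
    # escape() protects &, < and > so the query cannot break the XML.
    return f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")


def delete_by_query(update_url: str, query: str) -> int:
    """POST the delete message to Solr and commit; return the HTTP status."""
    request = urllib.request.Request(
        update_url + "?commit=true",  # commit so the delete becomes visible
        data=build_delete_payload(query),  # supplying data makes this a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical Solr URL and document id; adjust both to your setup.
    status = delete_by_query(
        "http://localhost:8983/solr/update",
        'id:"http://example.com/page-that-now-404s.html"',
    )
    print(status)
```

This only covers the Solr side. Deciding which ids are gone is still up to
you, which is the part the Nutch tooling takes care of.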

-- 
Markus Jelsma - CTO - Openindex
