Re: Removing old documents

Lance Norskog Tue, 01 May 2012 14:57:43 -0700

Maybe this is the HTTP caching feature? Solr comes with HTTP caching
turned on by default, so after you change the index your browser can keep
showing cached query results instead of fetching the changed documents.
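
One way to rule the browser cache in or out is to query Solr from a script
that asks for a fresh response. The following is only a minimal sketch, not
anything shipped with Nutch or Solr: the function names are made up for
illustration, the base URL is the one from the command quoted below, and it
assumes the default /select handler with the JSON response writer.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the solrclean command quoted in this thread.
SOLR_BASE = "http://127.0.0.1:8983/solr"


def build_select_request(query, base=SOLR_BASE):
    """Build a GET request for /select that asks caches not to reuse old results.

    Hypothetical helper: rows=0 because only the match count is needed.
    """
    params = urllib.parse.urlencode({"q": query, "wt": "json", "rows": 0})
    url = f"{base}/select?{params}"
    headers = {"Cache-Control": "no-cache", "Pragma": "no-cache"}
    return urllib.request.Request(url, headers=headers)


def count_matches(query, base=SOLR_BASE):
    """Send the request and return numFound from the JSON response."""
    with urllib.request.urlopen(build_select_request(query, base)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]["numFound"]
```

Calling count_matches("*:*") before and after solrclean should show whether
the document count really changed. If the count drops here but the browser
still lists the old documents, the cache is the likely culprit.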


On Tue, May 1, 2012 at 11:53 AM,  <[email protected]> wrote:
> Hello,
>
> I ran bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/
>
> both with and without -noCommit, and restarted the Solr server.
>
> The log shows that 5 documents were removed, but they are still in the search
> results.
> Is this a bug, or is something missing?
> I use Nutch 1.4 and Solr 3.5.
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: Markus Jelsma <[email protected]>
> To: solr-user <[email protected]>
> Sent: Tue, May 1, 2012 7:41 am
> Subject: Re: Removing old documents
>
>
> Nutch 1.4 has a separate tool to remove 404 and redirect documents from your
> index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents
> in one run based on segment data.
>
> On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
>> I'm running Nutch, so it's updating the documents, but I'm wanting to
>> remove ones that are no longer available.  So in that case, there's no
>> update possible.
>>
>> On Tue, May 1, 2012 at 8:47 AM, [email protected] <[email protected]> wrote:
>> > Not sure if there is an automatic way but we do it via a delete query and
>> > where possible we update doc under same id to avoid deletes.
>> >
>> > On 01/05/2012 13:43, "Bai Shen" <[email protected]> wrote:
>> > >What is the best method to remove old documents?  Things that now
>> > >generate 404 errors, etc.
>> > >
>> > >Is there an automatic method or do I have to do it manually?
>> > >
>> > >Thanks.
>
> --
> Markus Jelsma - CTO - Openindex
>
>
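
For the delete-query approach mentioned in the quoted thread, here is a rough
sketch of what such a request could look like against Solr's XML update
handler. The helper name and the example query are hypothetical, and the
update URL is assumed from the Solr address used above, so adjust the field
names to your own schema.

```python
import urllib.request
from xml.sax.saxutils import escape

# Assumed update endpoint, derived from the Solr URL used in this thread.
SOLR_UPDATE_URL = "http://127.0.0.1:8983/solr/update"


def build_delete_request(query, url=SOLR_UPDATE_URL, commit=True):
    """Build a POST request carrying a <delete><query>...</query></delete> message.

    Hypothetical helper: the query is XML-escaped, and commit=true is appended
    so the deletion becomes visible to searches once the request succeeds.
    """
    body = f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")
    target = url + "?commit=true" if commit else url
    headers = {"Content-Type": "text/xml; charset=utf-8"}
    return urllib.request.Request(target, data=body, headers=headers)


def delete_by_query(query, url=SOLR_UPDATE_URL):
    """Send the delete request and return the HTTP status code."""
    with urllib.request.urlopen(build_delete_request(query, url)) as resp:
        return resp.status
```

Without a commit the deleted documents keep showing up in search results, so
it is worth checking that one actually happens after the delete.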



-- 
Lance Norskog
[email protected]
