You can simplify things a bit by indexing a "batch number" guaranteed
to be different between two runs for the same keyField. In fact I'd
make sure it was unique amongst all my runs. Simplest is a timestamp
(assuming you don't start two batches within a millisecond!). So it
looks like this.

Get a new timestamp.
Add it to _every_ doc in my current run and index those docs.
Issue a delete-by-query like 'q=keyField:A AND timestamp:[* TO <new timestamp>}'.
Commit.

As Shawn says, you have to very carefully control the commits. And
also note that the curly brace at the end is NOT a typo, it excludes
the endpoint.
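
For what it's worth, here is a minimal Python sketch of the steps above, not a tested production recipe. The URL, the collection name, and the helper names (stamp_docs, stale_docs_query, replace_group) are all made up for illustration. It assumes "timestamp" is a numeric (long) field in your schema and that nothing else issues commits while the run is in progress.

```python
import json
import time
import urllib.request

# Hypothetical endpoint; adjust host and collection to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def new_batch_timestamp():
    """Milliseconds since the epoch, used as the batch number."""
    return int(time.time() * 1000)


def stamp_docs(docs, batch_ts):
    """Return copies of docs with the batch timestamp added to every one."""
    return [dict(doc, timestamp=batch_ts) for doc in docs]


def stale_docs_query(key_value, batch_ts):
    """Match docs for key_value that belong to any earlier batch.

    '[' includes the lower bound and '}' excludes the upper bound, so
    the docs just indexed with batch_ts are not matched.
    Assumes key_value contains no characters that need escaping.
    """
    return 'keyField:"%s" AND timestamp:[* TO %d}' % (key_value, batch_ts)


def post_json(payload, url=SOLR_UPDATE_URL):
    """POST one JSON update message to Solr and return the raw response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def replace_group(key_value, docs, send=post_json):
    """Index the new docs, delete the older batches, then commit once."""
    batch_ts = new_batch_timestamp()
    send(stamp_docs(docs, batch_ts))  # add the new docs
    send({"delete": {"query": stale_docs_query(key_value, batch_ts)}})
    send({"commit": {}})  # the only commit in the run
    return batch_ts
```

Calling replace_group("A", new_docs) sends three update messages in order: the stamped docs, the delete-by-query, and a single commit. The send parameter is only there so the sequence can be checked without a running Solr.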

Best,
Erick

On Fri, Mar 27, 2015 at 7:01 AM, Russell Taylor
<russell.tay...@interactivedata.com> wrote:
> Yes that works and now I have a better understanding of the soft and hard 
> commits to boot.
>
> Thanks again Shawn.
>
>
> Russ.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: 27 March 2015 13:22
> To: solr-user@lucene.apache.org
> Subject: Re: Replacing a group of documents (Delete/Insert) without a query 
> on the index ever showing an empty list (Docs)
>
> On 3/27/2015 7:07 AM, Russell Taylor wrote:
>> Hi Shawn, thanks for the quick reply.
>>
>> I've looked at both methods and I think that they won't work for a number of 
>> reasons:
>>
>> 1)
>> uniqueKey:
>>  I could use the uniqueKey and overwrite the original document but I
>> need to remove the documents which are not on my new input list and the 
>> issue with the uniqueKey method is I don't know what to delete.
>>
>> Documents on the index:
>> "docs": [
>> {
>> "id":"1",
>> "keyField":"A"
>> },{
>> "id":"2",
>> "keyField":"A"
>> },{
>> "id":"3",
>> "keyField":"B"
>> }
>> ]
>> New documents to go on the index:
>> "docs": [
>> {
>> "id":"1",
>> "keyField":"A"
>> },{
>> "id":"3",
>> "keyField":"B"
>> }
>> ]
>> I would never know that id:2 should be deleted (on some new document lists 
>> the delete list could be in the millions).
>>
>> 2)
>> openSearcher:
>> My openSearcher is set to false and I've also commented out autoSoftCommit 
>> so I don't get a partial list being returned on a query.
>> <!--
>> <autoSoftCommit>
>>        <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
>> </autoSoftCommit>
>> -->
>>
>>
>> So is there another way to keep the original set of documents until the new 
>> set has been added to the index?
>
> If you are 100% in control of when commits with openSearcher=true are sent, 
> which it sounds like you probably are, then you can do anything you want from 
> the start of indexing until commit time, and the user will never see any of 
> it, until the commit happens.  That allows the following relatively simple 
> paradigm:
>
> 1) Delete LOTS of stuff, or perhaps everything in the index with a 
> deleteByQuery of *:* (for all documents).
>
> 2) Index everything you need to index.
>
> 3) Commit.
>
> Thanks,
> Shawn
