The basic assumption is that any search sees a "snapshot" of the index through the currently open searcher. The streaming code should be no different: indexing while reading from a stream should _not_ make documents visible that weren't in the index (and visible) when the query was submitted, regardless of how long it takes to stream all the results out and regardless of how the index changes while that's happening.
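To make the point-in-time idea concrete, here's a toy model in plain Java. This is NOT Solr or Lucene code; the `ToyIndex`/`ToySearcher` names are made up for illustration. It just mimics the mechanism: segments are immutable once written, and opening a "searcher" captures the segment list as of that moment, so later indexing is invisible to an already-open searcher.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Solr/Lucene code) of point-in-time searcher semantics.
class ToyIndex {
    // Segments are immutable once written; "commits" publish a new list.
    private volatile List<List<String>> segments = new ArrayList<>();

    // Indexing builds a new segment list rather than mutating the old one.
    void addSegment(List<String> docs) {
        List<List<String>> next = new ArrayList<>(segments);
        next.add(List.copyOf(docs));
        segments = next; // atomically publish the new "commit point"
    }

    // Opening a searcher snapshots the current segment list.
    ToySearcher openSearcher() {
        return new ToySearcher(segments);
    }
}

class ToySearcher {
    private final List<List<String>> snapshot;

    ToySearcher(List<List<String>> snapshot) { this.snapshot = snapshot; }

    // Streaming only ever reads the snapshot, however long it takes.
    List<String> streamAll() {
        List<String> out = new ArrayList<>();
        for (List<String> seg : snapshot) out.addAll(seg);
        return out;
    }
}

public class SnapshotDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addSegment(List.of("doc1", "doc2"));

        ToySearcher searcher = index.openSearcher(); // point-in-time view

        // Indexing that happens after the searcher was opened...
        index.addSegment(List.of("doc3"));

        // ...is not visible to the already-open searcher,
        System.out.println(searcher.streamAll());             // [doc1, doc2]
        // but a freshly opened searcher sees it.
        System.out.println(index.openSearcher().streamAll()); // [doc1, doc2, doc3]
    }
}
```

The real machinery (Lucene's `IndexSearcher` over a `DirectoryReader`) is of course far more involved, but the invariant is the same: readers hold a fixed view, and writers only ever add new files.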
As for segments being merged while the stream is writing out: it shouldn't matter. Segment files are read-only, so the fact that a background merge reads a segment while docs from that segment are being streamed out is harmless. And if a background merge does happen, the merged-away segments won't be deleted until the query is complete, in this case until all documents that satisfy the search (regardless of how many segments they span) have been streamed.

HTH,
Erick

On Sun, Jul 19, 2015 at 11:38 AM, mihaela olteanu
<mihaela...@yahoo.com.invalid> wrote:
> Hello,
> I am iterating through a whole collection in SolrCloud using CloudSolrStream.
> While I am doing this operation, new data gets indexed into the collection.
> Does CloudSolrStream pick up the newly added values? Is it negatively
> impacted by this operation, or what is the impact of the collection traversal
> on indexing?
> I am not sure how CloudSolrStream works. Does it simply read segment
> files, so that it might be slowed down by segment merges caused by the newly
> indexed data?
> Could someone explain this process to me, and what is the relation between
> these two operations?
>
> Thank you in advance.
> Mihaela