The basic assumption is that any search sees a "snapshot" of the index through the currently open searcher. The streaming code should be no different: indexing while reading from a stream should _not_ make documents visible that weren't in the index (and visible) when the query was submitted, regardless of how long it takes to stream all the results out and regardless of how the index changes while that's happening.
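To make the point-in-time idea concrete, here's a toy model in plain Java. This is NOT Solr or Lucene code; the `ToyIndex`/`ToySearcher` names are made up for illustration. It just mimics the mechanism: segments are immutable once written, and opening a "searcher" captures the segment list as of that moment, so later indexing is invisible to an already-open searcher.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Solr/Lucene code) of point-in-time searcher semantics.
class ToyIndex {
    // Segments are immutable once written; "commits" publish a new list.
    private volatile List<List<String>> segments = new ArrayList<>();

    // Indexing builds a new segment list rather than mutating the old one.
    void addSegment(List<String> docs) {
        List<List<String>> next = new ArrayList<>(segments);
        next.add(List.copyOf(docs));
        segments = next; // atomically publish the new "commit point"
    }

    // Opening a searcher snapshots the current segment list.
    ToySearcher openSearcher() {
        return new ToySearcher(segments);
    }
}

class ToySearcher {
    private final List<List<String>> snapshot;

    ToySearcher(List<List<String>> snapshot) { this.snapshot = snapshot; }

    // Streaming only ever reads the snapshot, however long it takes.
    List<String> streamAll() {
        List<String> out = new ArrayList<>();
        for (List<String> seg : snapshot) out.addAll(seg);
        return out;
    }
}

public class SnapshotDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addSegment(List.of("doc1", "doc2"));

        ToySearcher searcher = index.openSearcher(); // point-in-time view

        // Indexing that happens after the searcher was opened...
        index.addSegment(List.of("doc3"));

        // ...is not visible to the already-open searcher,
        System.out.println(searcher.streamAll());             // [doc1, doc2]
        // but a freshly opened searcher sees it.
        System.out.println(index.openSearcher().streamAll()); // [doc1, doc2, doc3]
    }
}
```

The real machinery (Lucene's `IndexSearcher` over a `DirectoryReader`) is of course far more involved, but the invariant is the same: readers hold a fixed view, and writers only ever add new files.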
As for segments being merged while the stream is writing out: it shouldn't matter. Segment files are read-only, so the fact that a background merge reads a segment while docs from that segment are being streamed out is harmless. And if a background merge does happen, the merged-away segments won't be deleted until the query is complete, in this case until all documents that satisfy the search (regardless of how many segments they span) have been streamed.

HTH,
Erick

On Sun, Jul 19, 2015 at 11:38 AM, mihaela olteanu
<mihaela...@yahoo.com.invalid> wrote:
> Hello,
> I am iterating through a whole collection in SolrCloud using CloudSolrStream.
> While I am doing this operation, new data gets indexed into the collection.
> Does CloudSolrStream pick up the newly added values? Is it negatively
> impacted by this operation, or what is the impact of the collection traversal
> on indexing?
> I am not sure how CloudSolrStream works. Does it simply read segment
> files, so that it might be slowed down by segment merges caused by the newly
> indexed data?
> Could someone explain this process to me, and what is the relation between
> these two operations?
>
> Thank you in advance.
> Mihaela