On 5/5/2015 7:29 AM, Rishi Easwaran wrote:
> Worrying about data loss makes sense.  If I understand the way Solr
> behaves, the new directory should only have missing/changed segments.
> I guess that since our application is extremely write-heavy, with lots
> of inserts and deletes, almost every segment is touched even during a
> short window, so for our deployment it appears that every segment is
> copied over when replicas get out of sync.

Once a segment is written, it is *NEVER* updated again.  This aspect of
Lucene indexes makes Solr replication more efficient.  The ids of
deleted documents are written to separate files specifically for
tracking deletes.  Those files are typically quite small compared to the
index segments.  Any new documents are inserted into new segments.
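To make that concrete, here's a minimal Lucene sketch (the index path
and field name are arbitrary) showing the write-once behavior -- after
a delete, the original segment files are untouched and only a tiny
live-docs file is added alongside them:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;
    import java.util.Arrays;

    public class SegmentDemo {
      public static void main(String[] args) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/demo-index"));
             IndexWriter w = new IndexWriter(dir,
                 new IndexWriterConfig(new StandardAnalyzer()))) {
          for (String id : new String[] {"1", "2"}) {
            Document doc = new Document();
            doc.add(new StringField("id", id, Field.Store.YES));
            w.addDocument(doc);
          }
          w.commit();  // flushes a new segment (_0)
          System.out.println(Arrays.toString(dir.listAll()));

          w.deleteDocuments(new Term("id", "1"));
          w.commit();  // the _0 files are unchanged; a small _0_N.liv
                       // file now records which docs in _0 are deleted
          System.out.println(Arrays.toString(dir.listAll()));
        }
      }
    }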

When older segments are merged, the information in all of those segments
is copied to a single new segment (minus documents marked as deleted),
and then the old segments are deleted.  Optimizing rewrites the entire
index into a single new segment, so after an optimize each replica's
copy of the index is made up of completely different files; an index
recovery that happens after optimization may therefore copy the whole
thing.
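For what it's worth, Solr's optimize is Lucene's forceMerge under the
hood.  A minimal sketch (again with an arbitrary index path):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;

    public class OptimizeDemo {
      public static void main(String[] args) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/demo-index"));
             IndexWriter w = new IndexWriter(dir,
                 new IndexWriterConfig(new StandardAnalyzer()))) {
          // Copy every live document into one brand-new segment and
          // delete the old segment files.  After this, the index files
          // on each replica have nothing in common with each other.
          w.forceMerge(1);
          w.commit();
        }
      }
    }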

If you are seeing a lot of index recoveries during normal operation,
chances are that your Solr servers do not have enough resources, and the
resource that has the most impact on performance is memory.  The amount
of memory required for good Solr performance is higher than most people
expect.  It's a normal expectation that a program requires memory to
run, but Solr has an additional memory requirement that often surprises
people -- the need for a significant OS disk cache:

http://wiki.apache.org/solr/SolrPerformanceProblems
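If you want a rough sense of how your index size compares to the memory
available for caching, a sketch like this can help (the index path is
hypothetical -- point it at a core's data/index directory; also note
that on Linux, memory already in use as disk cache is not counted as
"free"):

    import com.sun.management.OperatingSystemMXBean;
    import java.lang.management.ManagementFactory;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class CacheHeadroom {
      public static void main(String[] args) throws Exception {
        // Hypothetical location -- adjust for your installation.
        Path indexDir = Paths.get("/var/solr/data/collection1/data/index");
        long indexBytes;
        try (Stream<Path> files = Files.list(indexDir)) {
          indexBytes = files.mapToLong(p -> p.toFile().length()).sum();
        }
        OperatingSystemMXBean os = (OperatingSystemMXBean)
            ManagementFactory.getOperatingSystemMXBean();
        System.out.printf("index: %d MB, free RAM: %d MB%n",
            indexBytes >> 20, os.getFreePhysicalMemorySize() >> 20);
      }
    }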

Thanks,
Shawn
