Hi Shawn,

Thanks for clarifying Lucene segment behaviour. We don't trigger optimize externally; could it be an internal Solr optimize? Is there a setting or knob to control when optimize occurs?
Thanks for pointing it out; we will monitor memory closely. Though I doubt memory is an issue: these are top-tier machines with 144GB RAM supporting 12 x 4GB JVMs, 9 of which are running in cloud mode writing to SSD, which should leave enough memory for the OS cache.

The behaviour we see is multiple huge directories for the same core. Till we figure out what's going on, the only option we are left with is to clean up the entire index to free up disk space and allow a replica to sync from scratch.

Thanks,
Rishi.

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org>
To: solr-user <solr-user@lucene.apache.org>
Sent: Tue, May 5, 2015 10:55 am
Subject: Re: Multiple index.timestamp directories using up disk space

On 5/5/2015 7:29 AM, Rishi Easwaran wrote:
> Worried about data loss makes sense. If I get the way Solr behaves, the new directory should only have missing/changed segments.
> I guess since our application is extremely write heavy, with lots of inserts and deletes, almost every segment is touched even during a short window, so it appears that for our deployment every segment is copied over when replicas get out of sync.

Once a segment is written, it is *NEVER* updated again. This aspect of Lucene indexes makes Solr replication more efficient. The ids of deleted documents are written to separate files specifically for tracking deletes; those files are typically quite small compared to the index segments. Any new documents are inserted into new segments.

When older segments are merged, the information in all of those segments is copied to a single new segment (minus documents marked as deleted), and then the old segments are erased. Optimizing replaces the entire index, and each replica of the index would be considered different, so an index recovery that happens after optimization might copy the whole thing.
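The segment lifecycle Shawn describes (immutable segments, deletes tracked in small side files, merges copying live documents into one new segment and erasing the old ones) can be sketched as a toy Python model. This is an illustration of the lifecycle only, not Lucene's actual on-disk implementation; all class and method names here are made up for the sketch:

```python
# Toy model of Lucene's segment lifecycle (illustration only; real Lucene
# stores segments as on-disk files with far more structure).

class Segment:
    """A segment is immutable once written; deletes are tracked separately."""
    def __init__(self, docs):
        self.docs = tuple(docs)   # never modified after creation
        self.deleted = set()      # analogous to the small "deleted ids" files

    def live_docs(self):
        return [d for d in self.docs if d not in self.deleted]


class Index:
    def __init__(self):
        self.segments = []

    def add_docs(self, docs):
        # New documents always go into a brand-new segment.
        self.segments.append(Segment(docs))

    def delete_doc(self, doc_id):
        # The segment itself is untouched; only the delete list grows.
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)

    def merge(self):
        # Merging copies live docs from the old segments into a single new
        # segment, then erases the old ones (optimize merges everything).
        live = [d for seg in self.segments for d in seg.live_docs()]
        self.segments = [Segment(live)]


idx = Index()
idx.add_docs(["a", "b"])
idx.add_docs(["c", "d"])
idx.delete_doc("b")
print(len(idx.segments))        # 2 segments before merge
idx.merge()
print(len(idx.segments))        # 1 segment after merge
print(idx.segments[0].docs)     # ('a', 'c', 'd') -- 'b' was dropped
```

This also illustrates why a write-heavy workload touches so many segments: every insert creates new segment data and every merge replaces old files wholesale, so a replica that falls behind can find almost nothing in common with the leader's directory.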
If you are seeing a lot of index recoveries during normal operation, chances are that your Solr servers do not have enough resources, and the resource that has the most impact on performance is memory. The amount of memory required for good Solr performance is higher than most people expect. It's a normal expectation that programs require memory to run, but Solr has an additional memory requirement that often surprises people -- the need for a significant OS disk cache:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
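As a rough sanity check on the numbers mentioned earlier in the thread (144GB RAM, 12 x 4GB heaps), here is a back-of-envelope sketch of the headroom left for the OS disk cache. Note the heap figures are the configured sizes; actual JVM process footprint runs somewhat higher, and whether the leftover is "enough" depends on the total on-disk index size across the cloud-mode JVMs:

```python
# Back-of-envelope estimate: RAM not claimed by JVM heaps is roughly what
# the OS can use for its disk cache. (JVM processes use somewhat more than
# their configured heap, so the real figure is a bit lower.)
total_ram_gb = 144
jvm_count = 12
heap_gb_each = 4

heap_total_gb = jvm_count * heap_gb_each
cache_headroom_gb = total_ram_gb - heap_total_gb
print(f"{heap_total_gb} GB in heaps, ~{cache_headroom_gb} GB left for OS cache")
# -> 48 GB in heaps, ~96 GB left for OS cache
```

Per the SolrPerformanceProblems wiki page linked above, the ideal is enough free RAM to cache the hot portion of the index; if the combined indexes on the box are much larger than that headroom, heavy recovery activity is one of the expected symptoms.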