Only skimmed your email, but purge every 4 hours jumped out at me. Would it
make sense to have time-based indices that can be periodically dropped
instead of being purged?
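
A rough sketch of the time-based idea, assuming one index per day named
events_YYYYMMDD. The prefix, the daily granularity, and the helper names are
all hypothetical, not anything from the setup described below. The point is
that expiring old data becomes "drop whole indices older than N days", which
frees their files immediately without waiting for a merge:

```python
from datetime import date, datetime, timedelta

PREFIX = "events_"  # hypothetical naming scheme: events_YYYYMMDD


def index_name(day: date) -> str:
    """Name of the index that would hold documents for the given day."""
    return PREFIX + day.strftime("%Y%m%d")


def expired_indices(names, today: date, keep_days: int):
    """Return the index names whose day is older than the retention window.

    Names that do not match the assumed scheme are ignored rather than dropped.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(PREFIX):
            continue
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y%m%d").date()
        except ValueError:
            continue  # not a date suffix; leave this index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

How each expired index is actually dropped depends on the deployment, so
that step is left out here.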

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Oct 23, 2013 10:33 AM, "Scott Lundgren" <scott.lundg...@carbonblack.com>
wrote:

> *Background:*
>
> - Our use case is to use SOLR as a massive FIFO queue.
>
> - Document additions and updates happen continuously.
>
>     - Documents are being added at a sustained rate of 50 - 100 documents
> per second.
>
>     - About 50% of these documents are updates to existing docs, indexed
> using atomic updates: the original doc is thus deleted and re-added.
>
> - There is a separate purge operation running every four hours that deletes
> the oldest docs, if required based on a number of unrelated configuration
> parameters.
>
> - At some time in the past, a manual force merge / optimize with
> maxSegments=2 was run to troubleshoot high disk i/o and remove "too many
> segments" as a potential variable.  Currently, the largest .fdt files are
> 74G and 43G.  There are 47 segments in total; the next-largest are all
> around 2G.
>
> - Merge policies are all at Solr 4 defaults. Index size is currently ~50M
> maxDocs, ~35M numDocs, 276GB.
>
> *Issue:*
>
> The background purge operation is deleting docs on schedule, but the disk
> space is not being recovered.
>
> *Presumptions:*
> I presume, but have not confirmed (how?), that the 15M deleted documents are
> predominately in the two large segments.  Because they are largely in the
> two large segments, and those large segments still have (some/many) live
> documents, the segment backing files are not deleted.
>
> *Questions:*
>
> - When will those segments get merged and the deleted documents' space
> recovered?  Does it happen when _all_ the documents in those segments are
> deleted?  Or when some percentage of the segment is filled with deleted
> documents?
> - Is there a way to do it right now vs. just waiting?
> - In some cases, the purge delete conditional is _just_ free disk space:
>  when index > free space, delete oldest.  Those setups are now in scenarios
> where index >> free space, and getting worse.  How does low disk space
> affect the above two questions?
> - Is there a way for me to determine stats on a per-segment basis?
>    - for example, how many deleted documents in a particular segment?
> - On the flip side, can I determine in what segment a particular document
> is located?
>
> Thank you,
>
> Scott
>
> --
> Scott Lundgren
> Director of Engineering
> Carbon Black, Inc.
> (210) 204-0483 | scott.lundg...@carbonblack.com
>
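
For reference, the figures quoted above (~50M maxDocs, ~35M numDocs) can be
turned into a deleted-document count and ratio, and the "do it right now"
question is usually approached with a commit that sets expungeDeletes=true.
The sketch below only does the arithmetic and builds the request URL; the
host, port, and core name are placeholders, and nothing is sent to Solr.
Note that expunging deletes rewrites the affected segments, so it needs
temporary disk space of its own, which matters on the low-disk setups
described above.

```python
from urllib.parse import urlencode


def deleted_stats(max_docs: int, num_docs: int):
    """Return (deleted_count, deleted_fraction) from index-level counters."""
    deleted = max_docs - num_docs
    return deleted, deleted / max_docs


def expunge_deletes_url(base_url: str, core: str) -> str:
    """Build an update URL asking Solr to commit and expunge deleted docs.

    base_url and core are placeholders for illustration only.
    """
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url}/{core}/update?{params}"


if __name__ == "__main__":
    deleted, fraction = deleted_stats(50_000_000, 35_000_000)
    print(f"{deleted} deleted docs, {fraction:.0%} of maxDocs")
    print(expunge_deletes_url("http://localhost:8983/solr", "core0"))
```

With the numbers from the message, this reports 15,000,000 deleted
documents, or 30% of maxDocs.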
