*Background:*

- Our use case is to use Solr as a massive FIFO queue.

- Document additions and updates happen continuously.

    - Documents are being added at a sustained rate of 50 - 100 documents
per second.

    - About 50% of these documents are updates to existing docs, indexed
using atomic updates: the original doc is thus deleted and re-added.

- There is a separate purge operation running every four hours that deletes
the oldest docs, if required based on a number of unrelated configuration
parameters.

- At some time in the past, a manual force merge / optimize with
maxSegments=2 was run to troubleshoot high disk i/o and remove "too many
segments" as a potential variable.  Currently, the two largest .fdt files
are 74G and 43G.  There are 47 segments in total; the next-largest are all
around 2G.

- Merge policies are all at Solr 4 defaults. Index size is currently ~50M
maxDocs, ~35M numDocs, 276GB.
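
For reference, the deleted-document count implied by those figures can be
worked out as below. This is only a minimal arithmetic sketch using the
rounded numbers quoted above; the function name is hypothetical and not part
of any Solr API.

```python
def deleted_stats(max_docs, num_docs):
    """Return (deleted doc count, deleted fraction of maxDocs).

    maxDocs counts every document slot, including deleted ones;
    numDocs counts only live documents.
    """
    deleted = max_docs - num_docs
    return deleted, deleted / max_docs


# Rounded figures from this post: ~50M maxDocs, ~35M numDocs.
deleted, fraction = deleted_stats(50_000_000, 35_000_000)
print(f"{deleted:,} deleted docs, {fraction:.0%} of maxDocs")
```

So roughly 15M documents, or about 30% of maxDocs, are marked deleted but
still occupy space on disk.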

*Issue:*

The background purge operation is deleting docs on schedule, but the disk
space is not being recovered.

*Presumptions:*
I presume, but have not confirmed (how?), that the 15M deleted documents are
predominantly in the two large segments.  Because they are largely in the
two large segments, and those large segments still have (some/many) live
documents, the segment backing files are not deleted.
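
To make the presumption concrete, here is a toy model of it. It is not
Lucene's or Solr's actual logic: the Segment class, the segment names, and
all per-segment counts are made up for illustration. It only encodes the
assumption stated above, that a segment's files can go away without a merge
only once it holds no live documents.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str      # hypothetical segment name
    max_doc: int   # all docs ever written to the segment, deleted or not
    deleted: int   # docs marked deleted but still stored on disk

    @property
    def live(self):
        return self.max_doc - self.deleted

    @property
    def deleted_fraction(self):
        return self.deleted / self.max_doc


def droppable_without_merge(segments):
    """Names of segments with no live docs left (the presumption above)."""
    return [s.name for s in segments if s.live == 0]


# Made-up numbers: two large segments holding most of the deletes,
# plus two small ones.
segments = [
    Segment("big_1", max_doc=25_000_000, deleted=9_000_000),
    Segment("big_2", max_doc=15_000_000, deleted=5_500_000),
    Segment("small_1", max_doc=400_000, deleted=400_000),
    Segment("small_2", max_doc=400_000, deleted=10_000),
]

for s in segments:
    print(f"{s.name}: {s.deleted_fraction:.0%} deleted, {s.live:,} live")
print("droppable without a merge:", droppable_without_merge(segments))
```

Under that assumption, the two large segments keep all of their disk space
however many of their documents are deleted, as long as any live documents
remain in them.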

*Questions:*

- When will those segments get merged and the space held by deleted
documents recovered?  Does it happen when _all_ the documents in those
segments are deleted?  Or when some percentage of the segment is filled with
deleted documents?
- Is there a way to do it right now vs. just waiting?
- In some cases, the purge delete condition is _just_ free disk space:
 when index > free space, delete oldest.  Those setups are now in scenarios
where index >> free space, and getting worse.  How does low disk space
affect the two questions above?
- Is there a way for me to determine stats on a per-segment basis?
   - for example, how many deleted documents in a particular segment?
- On the flip side, can I determine in what segment a particular document
is located?

Thank you,

Scott

-- 
Scott Lundgren
Director of Engineering
Carbon Black, Inc.
(210) 204-0483 | scott.lundg...@carbonblack.com
