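Only skimmed your email, but the "purge every 4 hours" jumped out at me. Would it make sense to have time-based indices that can be periodically dropped instead of being purged? Dropping a whole index frees its disk space right away, whereas deleted documents only give space back once the segments holding them get merged.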
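Here is a minimal sketch of what time-based indices could look like. It assumes one index (core or collection) per 4-hour window, mirroring the current purge cadence. The `docs_` prefix, the `YYYYMMDDHH` naming scheme, and all function names are hypothetical. The step that actually deletes an expired index from Solr is left out: the sketch only decides which index a document goes to and which indices have aged out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical settings: one index per 4-hour window (the window length must
# divide 24), named like "docs_2013102308" for the window starting 08:00 UTC.
BUCKET_HOURS = 4
PREFIX = "docs"


def bucket_start(ts: datetime, hours: int = BUCKET_HOURS) -> datetime:
    """Round a timezone-aware timestamp down to the start of its UTC window."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    ts = ts.astimezone(timezone.utc)
    return ts.replace(hour=(ts.hour // hours) * hours,
                      minute=0, second=0, microsecond=0)


def index_name(ts: datetime, hours: int = BUCKET_HOURS, prefix: str = PREFIX) -> str:
    """Name of the index that a document timestamped `ts` would be written to."""
    return f"{prefix}_{bucket_start(ts, hours):%Y%m%d%H}"


def parse_index_name(name: str, prefix: str = PREFIX) -> datetime:
    """Inverse of index_name: recover the UTC start of the index's window."""
    stamp = name[len(prefix) + 1:]
    return datetime.strptime(stamp, "%Y%m%d%H").replace(tzinfo=timezone.utc)


def indices_to_drop(existing, now, retention, hours=BUCKET_HOURS, prefix=PREFIX):
    """Return the indices whose whole window ended at or before now - retention.

    Every document in such an index is older than the retention period,
    so dropping the index never removes a document early.
    """
    cutoff = now - retention
    return sorted(
        name for name in existing
        if parse_index_name(name, prefix) + timedelta(hours=hours) <= cutoff
    )
```

For example, with `now` at 10:33 UTC on Oct 23, 2013, a new document goes to `docs_2013102308`. With an 8-hour retention, only an index whose window ended at midnight or earlier (such as `docs_2013102220`) would be dropped. Because expiry works on whole windows, disk usage shrinks in index-sized steps rather than waiting on merges.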
Otis
Solr & ElasticSearch Support
http://sematext.com/

On Oct 23, 2013 10:33 AM, "Scott Lundgren" <scott.lundg...@carbonblack.com> wrote:
> *Background:*
>
> - Our use case is to use SOLR as a massive FIFO queue.
>
> - Document additions and updates happen continuously.
>
> - Documents are being added at a sustained rate of 50 - 100 documents
> per second.
>
> - About 50% of these documents are updates to existing docs, indexed
> using atomic updates: the original doc is thus deleted and re-added.
>
> - There is a separate purge operation running every four hours that deletes
> the oldest docs, if required based on a number of unrelated configuration
> parameters.
>
> - At some time in the past, a manual force merge / optimize with
> maxSegments=2 was run to troubleshoot high disk i/o and remove "too many
> segments" as a potential variable. Currently, the largest fdts are 74G and
> 43G. There are 47 segments in total; the largest of the others are all
> around 2G.
>
> - Merge policies are all at Solr 4 defaults. Index size is currently ~50M
> maxDocs, ~35M numDocs, 276GB.
>
> *Issue:*
>
> The background purge operation is deleting docs on schedule, but the disk
> space is not being recovered.
>
> *Presumptions:*
> I presume, but have not confirmed (how?), that the 15M deleted documents are
> predominantly in the two large segments. Because they are largely in the
> two large segments, and those large segments still have (some/many) live
> documents, the segment backing files are not deleted.
>
> *Questions:*
>
> - When will those segments get merged and documents recovered? Does it
> happen when _all_ the documents in those segments are deleted? When some
> percentage of the segment is filled with deleted documents?
> - Is there a way to do it right now vs. just waiting?
> - In some cases, the purge delete conditional is _just_ free disk space:
> when index > free space, delete oldest. Those setups are now in scenarios
> where index >> free space, and getting worse. How does low disk space
> affect the above two questions?
> - Is there a way for me to determine stats on a per-segment basis?
>   - for example, how many deleted documents are in a particular segment?
> - On the flip side, can I determine in what segment a particular document
> is located?
>
> Thank you,
>
> Scott
>
> --
> Scott Lundgren
> Director of Engineering
> Carbon Black, Inc.
> (210) 204-0483 | scott.lundg...@carbonblack.com