Otis,

Thank you for your response.
Could you elaborate a bit more on what you have in mind when you say
"time-based" indices?

Gun

---
Senior Software Engineer
Carbon Black, Inc.
gun.ak...@carbonblack.com


On Thu, Oct 24, 2013 at 11:56 PM, Otis Gospodnetic
<otis.gospodne...@gmail.com> wrote:

> Only skimmed your email, but purge every 4 hours jumped out at me. Would it
> make sense to have time-based indices that can be periodically dropped
> instead of being purged?
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Oct 23, 2013 10:33 AM, "Scott Lundgren" <scott.lundg...@carbonblack.com>
> wrote:
>
> > *Background:*
> >
> > - Our use case is to use SOLR as a massive FIFO queue.
> >
> > - Document additions and updates happen continuously.
> >
> > - Documents are being added at a sustained rate of 50 - 100 documents
> >   per second.
> >
> > - About 50% of these documents are updates to existing docs, indexed
> >   using atomic updates: the original doc is thus deleted and re-added.
> >
> > - There is a separate purge operation running every four hours that
> >   deletes the oldest docs, if required based on a number of unrelated
> >   configuration parameters.
> >
> > - At some time in the past, a manual force merge / optimize with
> >   maxSegments=2 was run to troubleshoot high disk i/o and remove "too
> >   many segments" as a potential variable. Currently, the largest fdts
> >   are 74G and 43G. There are 47 total segments, the largest other sizes
> >   are all around 2G.
> >
> > - Merge policies are all at Solr 4 defaults. Index size is currently
> >   ~50M maxDocs, ~35M numDocs, 276GB.
> >
> > *Issue:*
> >
> > The background purge operation is deleting docs on schedule, but the
> > disk space is not being recovered.
> >
> > *Presumptions:*
> >
> > I presume, but have not confirmed (how?) the 15M deleted documents are
> > predominately in the two large segments. Because they are largely in
> > the two large segments, and those large segments still have (some/many)
> > live documents, the segment backing files are not deleted.
> >
> > *Questions:*
> >
> > - When will those segments get merged and documents recovered? Does it
> >   happen when _all_ the documents in those segments are deleted? Some
> >   percentage of the segment is filled with deleted documents?
> > - Is there a way to do it right now vs. just waiting?
> > - In some cases, the purge delete conditional is _just_ free disk space:
> >   when index > free space, delete oldest. Those setups are now in
> >   scenarios where index >> free space, and getting worse. How does low
> >   disk space affect the above two questions?
> > - Is there a way for me to determine stats on a per-segment basis?
> >   - for example, how many deleted documents in a particular segment?
> > - On the flip side, can I determine in what segment a particular
> >   document is located?
> >
> > Thank you,
> >
> > Scott
> >
> > --
> > Scott Lundgren
> > Director of Engineering
> > Carbon Black, Inc.
> > (210) 204-0483 | scott.lundg...@carbonblack.com
> >
>
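
For reference, here is a minimal sketch of one possible reading of the
"time-based indices" idea quoted above: each document goes to an index named
after the day it belongs to, and whole indices are dropped once they fall
outside the retention window, instead of deleting individual docs. The daily
granularity, the "events" prefix, and both function names are hypothetical
and not from this thread. Actually creating or dropping a Solr core or
collection is left out on purpose.

```python
from datetime import datetime, timedelta, timezone


def bucket_name(ts, prefix="events"):
    """Map a document timestamp to the name of the daily index holding it.

    The prefix and the one-index-per-day scheme are assumptions for
    illustration only.
    """
    return f"{prefix}_{ts:%Y%m%d}"


def expired_buckets(existing, now, retention_days, prefix="events"):
    """Return index names whose day is strictly older than the cutoff day."""
    cutoff = bucket_name(now - timedelta(days=retention_days), prefix)
    # The zero-padded YYYYMMDD suffix makes names sort chronologically,
    # so a plain string comparison against the cutoff name is enough.
    return sorted(
        name
        for name in existing
        if name.startswith(prefix + "_") and name < cutoff
    )


if __name__ == "__main__":
    now = datetime(2013, 10, 24, 12, 0, tzinfo=timezone.utc)
    existing = ["events_20131020", "events_20131021", "events_20131024"]
    print(bucket_name(now))
    print(expired_buckets(existing, now, retention_days=3))
```

Dropping a whole index removes its files outright, so freeing disk space
would no longer depend on merges. One caveat under this scheme: an atomic
update has to be sent to the index that holds the original doc, so with
roughly 50% updates the routing of updates would need some thought.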