Tried to increase the memory to 24G, but that wasn't enough either. Agree that the index has now grown too large - should have monitored this and taken action much earlier.

The search operations seem to run OK with 16G - mainly because the bulk of the data that we are trying to delete is not getting searched. So we are now basically in salvage mode. Does the number of documents deleted at a time have any impact? If I 'trickle delete' - say, 50K documents at a time - would that make a difference? When I delete, does Solr try to bring the whole index into memory? Trying to understand what happens under the hood.
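To be concrete, this is roughly what I have in mind by 'trickle delete' - just a sketch against SolrJ 4.x; the 'timestamp' field, the cutoff date, and the 50K batch size are placeholders for our actual setup:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class TrickleDelete {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
        int batchSize = 50000;  // assumed batch size
        while (true) {
            // Fetch the next batch of IDs in the range we want to purge.
            // 'timestamp' is a placeholder for our actual date field.
            SolrQuery q = new SolrQuery("timestamp:[* TO 2013-01-01T00:00:00Z]");
            q.setFields("id");
            q.setRows(batchSize);
            QueryResponse rsp = server.query(q);
            if (rsp.getResults().isEmpty()) {
                break;  // nothing left to delete
            }
            List<String> ids = new ArrayList<String>();
            for (SolrDocument doc : rsp.getResults()) {
                ids.add((String) doc.getFieldValue("id"));
            }
            // Delete this batch by ID and commit before fetching the next,
            // so each pass only touches a bounded number of documents.
            server.deleteById(ids);
            server.commit();
        }
        server.shutdown();
    }
}

The idea being to fetch a batch of IDs, delete by ID, and hard commit before fetching the next batch - instead of one big delete-by-query.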
Thanks
Vinay

On 11 April 2014 13:53, Erick Erickson <erickerick...@gmail.com> wrote:

> Using 16G for a 360G index is probably pushing things. A lot. I'm
> actually a bit surprised that the problem only occurs when you delete
> docs....
>
> The simplest thing would be to increase the JVM memory. You should be
> looking at your index to see how big it is, be sure to subtract out
> the *.fdt and *.fdx files, those are used for verbatim copies of the
> raw data and don't really count towards the memory requirements.
>
> I suspect you're just not giving enough memory to your JVM and this is
> just the first OOM you've hit. Look on the Solr admin page and see how
> much is being reported, if it's near the limit of your 16G that's the
> "smoking gun"...
>
> Best,
> Erick
>
> On Fri, Apr 11, 2014 at 7:45 AM, Vinay Pothnis <poth...@gmail.com> wrote:
> > Sorry - yes, I meant to say leader.
> > Each JVM has 16G of memory.
> >
> >
> > On 10 April 2014 20:54, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> First, there is no "master" node, just leaders and replicas. But that's
> >> a nit.
> >>
> >> No real clue why you would be going out of memory. Deleting a
> >> document, even by query, should just mark the docs as deleted, a pretty
> >> low-cost operation.
> >>
> >> How much memory are you giving the JVM?
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis <poth...@gmail.com> wrote:
> >> > [Solr version 4.3.1]
> >> >
> >> > Hello,
> >> >
> >> > I have a Solr cloud (4 nodes - 2 shards) with a fairly large amount of
> >> > documents (~360G of index per shard). Now, a major portion of the
> >> > data is not required and I need to delete those documents. I would
> >> > need to delete around 75% of the data.
> >> >
> >> > One of the solutions could be to drop the index completely and
> >> > re-index. But this is not an option at the moment.
> >> >
> >> > We tried to delete the data through a query - say, 1 day's/month's
> >> > worth of data at a time. But after deleting just 1 month's worth of
> >> > data, the master node goes out of memory - heap space.
> >> >
> >> > Wondering if there is any way to incrementally delete the data
> >> > without affecting the cluster adversely.
> >> >
> >> > Thanks!
> >> > Vinay