Thank you Erick! Yes - I am using the expungeDeletes option. Thanks for the note on disk space for the optimize command - I should have enough space for that. What about the heap space requirement? I hope Solr can complete the optimize with the memory that is currently allocated to it.
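
For reference, this is roughly how I am planning to trigger the expungeDeletes commit and the optimize from the command line, going by the UpdateXmlMessages wiki page (host, port and collection name are the same placeholders as in the delete query below) - please correct me if the syntax is off:

# commit that also merges away segments containing deleted documents
curl -H 'Content-Type: text/xml' --data '<commit expungeDeletes="true"/>' \
  'http://host:port/solr/coll-name1/update'

# explicit optimize - rewrites the index into fewer segments and drops the .del files
curl -H 'Content-Type: text/xml' --data '<optimize waitSearcher="false"/>' \
  'http://host:port/solr/coll-name1/update'
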
Thanks
Vinay


On 16 April 2014 04:52, Erick Erickson <erickerick...@gmail.com> wrote:
> The optimize should, indeed, reduce the index size. Be aware that it
> may consume 2x the disk space. You may also try expungeDeletes, see
> here: https://wiki.apache.org/solr/UpdateXmlMessages
>
> Best,
> Erick
>
> On Wed, Apr 16, 2014 at 12:47 AM, Vinay Pothnis <poth...@gmail.com> wrote:
> > Another update:
> >
> > I removed the replicas - to avoid the replication doing a full copy. I am
> > able to delete sizeable chunks of data. But the overall index size remains
> > the same even after the deletes. It does not seem to go down.
> >
> > I understand that Solr would do this in the background - but I don't see
> > the decrease in overall index size even after 1-2 hours. I can see a bunch
> > of ".del" files in the index directory, but they do not seem to get
> > cleaned up. Is there any way to monitor/follow the progress of index
> > compaction?
> >
> > Also, does triggering "optimize" from the admin UI help to compact the
> > index size on disk?
> >
> > Thanks
> > Vinay
> >
> >
> > On 14 April 2014 12:19, Vinay Pothnis <poth...@gmail.com> wrote:
> >
> >> Some update:
> >>
> >> I removed the auto-warm configurations for the various caches and reduced
> >> the cache sizes. I then issued a call to delete a day's worth of data
> >> (800K documents).
> >>
> >> There was no out of memory this time - but some of the nodes went into
> >> recovery mode. I was able to catch some logs this time around and this is
> >> what I see:
> >>
> >> ****************
> >> WARN [2014-04-14 18:11:00.381] [org.apache.solr.update.PeerSync]
> >> PeerSync: core=core1_shard1_replica2 url=http://host1:8983/solr too many
> >> updates received since start - startingUpdates no longer overlaps with
> >> our currentUpdates
> >> INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy]
> >> PeerSync Recovery was not successful - trying replication.
> >> core=core1_shard1_replica2
> >> INFO [2014-04-14 18:11:00.476] [org.apache.solr.cloud.RecoveryStrategy]
> >> Starting Replication Recovery. core=core1_shard1_replica2
> >> INFO [2014-04-14 18:11:00.535] [org.apache.solr.cloud.RecoveryStrategy]
> >> Begin buffering updates. core=core1_shard1_replica2
> >> INFO [2014-04-14 18:11:00.536] [org.apache.solr.cloud.RecoveryStrategy]
> >> Attempting to replicate from http://host2:8983/solr/core1_shard1_replica1/.
> >> core=core1_shard1_replica2
> >> INFO [2014-04-14 18:11:00.536]
> >> [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http
> >> client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> >> INFO [2014-04-14 18:11:01.964]
> >> [org.apache.solr.client.solrj.impl.HttpClientUtil] Creating new http
> >> client, config:connTimeout=5000&socketTimeout=20000&allowCompression=false&maxConnections=10000&maxConnectionsPerHost=10000
> >> INFO [2014-04-14 18:11:01.969] [org.apache.solr.handler.SnapPuller] No
> >> value set for 'pollInterval'. Timer Task not started.
> >> INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
> >> Master's generation: 1108645
> >> INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
> >> Slave's generation: 1108627
> >> INFO [2014-04-14 18:11:01.973] [org.apache.solr.handler.SnapPuller]
> >> Starting replication process
> >> INFO [2014-04-14 18:11:02.007] [org.apache.solr.handler.SnapPuller]
> >> Number of files in latest index in master: 814
> >> INFO [2014-04-14 18:11:02.007]
> >> [org.apache.solr.core.CachingDirectoryFactory] return new directory for
> >> /opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
> >> INFO [2014-04-14 18:11:02.008] [org.apache.solr.handler.SnapPuller]
> >> Starting download to
> >> NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/data/solr/core1_shard1_replica2/data/index.20140414181102007
> >> lockFactory=org.apache.lucene.store.NativeFSLockFactory@5f6570fe;
> >> maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true
> >> ****************
> >>
> >> So, it looks like the number of updates is too large for peer sync, and
> >> recovery then falls back to a full copy of the index. And since our index
> >> is very large (350G), this causes the cluster to stay in recovery mode
> >> forever - trying to copy that huge index.
> >>
> >> I also read in this thread
> >> http://lucene.472066.n3.nabble.com/Recovery-too-many-updates-received-since-start-td3935281.html
> >> that there is a limit of 100 documents.
> >>
> >> I wonder if this has been made configurable since that thread. If not, the
> >> only option I see is to do a "trickle" delete of 100 documents per second
> >> or something.
> >>
> >> Also - the other suggestion of using "distributed=false" might not help,
> >> because the issue currently is that the replication is going to "full
> >> copy".
> >>
> >> Any thoughts?
> >>
> >> Thanks
> >> Vinay
> >>
> >>
> >> On 14 April 2014 07:54, Vinay Pothnis <poth...@gmail.com> wrote:
> >>
> >>> Yes, that is our approach. We did try deleting a day's worth of data at a
> >>> time, and that resulted in OOM as well.
> >>>
> >>> Thanks
> >>> Vinay
> >>>
> >>>
> >>> On 14 April 2014 00:27, Furkan KAMACI <furkankam...@gmail.com> wrote:
> >>>
> >>>> Hi;
> >>>>
> >>>> I mean you can divide the range (i.e. one week at each delete instead of
> >>>> one month) and try to check whether you still get an OOM or not.
> >>>>
> >>>> Thanks;
> >>>> Furkan KAMACI
> >>>>
> >>>>
> >>>> 2014-04-14 7:09 GMT+03:00 Vinay Pothnis <poth...@gmail.com>:
> >>>>
> >>>> > Aman,
> >>>> > Yes - will do!
> >>>> >
> >>>> > Furkan,
> >>>> > What do you mean by 'bulk delete'?
> >>>> >
> >>>> > -Thanks
> >>>> > Vinay
> >>>> >
> >>>> >
> >>>> > On 12 April 2014 14:49, Furkan KAMACI <furkankam...@gmail.com> wrote:
> >>>> >
> >>>> > > Hi;
> >>>> > >
> >>>> > > Do you get any problems when you index your data? On the other hand,
> >>>> > > deleting in bulks and reducing the size of the documents may help you
> >>>> > > not to hit OOM.
> >>>> > >
> >>>> > > Thanks;
> >>>> > > Furkan KAMACI
> >>>> > >
> >>>> > >
> >>>> > > 2014-04-12 8:22 GMT+03:00 Aman Tandon <amantandon...@gmail.com>:
> >>>> > >
> >>>> > > > Vinay, please share your experience after trying this solution.
> >>>> > > >
> >>>> > > >
> >>>> > > > On Sat, Apr 12, 2014 at 4:12 AM, Vinay Pothnis <poth...@gmail.com> wrote:
> >>>> > > >
> >>>> > > > > The query is something like this:
> >>>> > > > >
> >>>> > > > > curl -H 'Content-Type: text/xml' --data
> >>>> > > > > '<delete><query>param1:(val1 OR val2) AND -param2:(val3 OR val4)
> >>>> > > > > AND date_param:[1383955200000 TO 1385164800000]</query></delete>'
> >>>> > > > > 'http://host:port/solr/coll-name1/update?commit=true'
> >>>> > > > >
> >>>> > > > > I am trying to restrict the number of documents deleted via the
> >>>> > > > > date parameter.
> >>>> > > > >
> >>>> > > > > I had not tried the "distrib=false" option - I could give that a
> >>>> > > > > try. Thanks for the link! I will check on the cache sizes and
> >>>> > > > > autowarm values, and will try disabling the caches while I am
> >>>> > > > > deleting.
> >>>> > > > >
> >>>> > > > > Thanks Erick and Shawn for your inputs!
> >>>> > > > >
> >>>> > > > > -Vinay
> >>>> > > > >
> >>>> > > > >
> >>>> > > > > On 11 April 2014 15:28, Shawn Heisey <s...@elyograg.org> wrote:
> >>>> > > > >
> >>>> > > > > > On 4/10/2014 7:25 PM, Vinay Pothnis wrote:
> >>>> > > > > >
> >>>> > > > > >> We tried to delete the data through a query - say 1 day's or
> >>>> > > > > >> 1 month's worth of data. But after deleting just 1 month's
> >>>> > > > > >> worth of data, the master node is going out of memory - heap
> >>>> > > > > >> space.
> >>>> > > > > >>
> >>>> > > > > >> Wondering if there is any way to incrementally delete the data
> >>>> > > > > >> without affecting the cluster adversely.
> >>>> > > > > >>
> >>>> > > > > >
> >>>> > > > > > I'm curious about the actual query being used here. Can you
> >>>> > > > > > share it, or a redacted version of it? Perhaps there might be a
> >>>> > > > > > clue there?
> >>>> > > > > >
> >>>> > > > > > Is this a fully distributed delete request? One thing you might
> >>>> > > > > > try, assuming Solr even supports it, is sending the same delete
> >>>> > > > > > request directly to each shard core with distrib=false.
> >>>> > > > > >
> >>>> > > > > > Here's a very incomplete list about how you can reduce Solr
> >>>> > > > > > heap requirements:
> >>>> > > > > >
> >>>> > > > > > http://wiki.apache.org/solr/SolrPerformanceProblems#Reducing_heap_requirements
> >>>> > > > > >
> >>>> > > > > > Thanks,
> >>>> > > > > > Shawn
> >>>> > > > > >
> >>>> > > > >
> >>>> > > >
> >>>> > > > --
> >>>> > > > With Regards
> >>>> > > > Aman Tandon
> >>>> > >
> >>>>
> >>>
> >>
> >
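
P.S. In case it is useful, the "trickle" delete I mentioned above would be something along these lines - an untested sketch that walks the same date range in small windows with a pause between chunks, so that each delete stays small enough for the replicas to recover via peer sync (field names and values are the same redacted placeholders as in the query above):

#!/bin/bash
# Untested sketch: delete in one-hour windows instead of one huge range.
START=1383955200000          # same epoch-millis range as the original delete query
END=1385164800000
STEP=$((3600 * 1000))        # one hour per chunk; tune so each chunk stays small

for ((from=START; from<END; from+=STEP)); do
  to=$((from + STEP))
  curl -s -H 'Content-Type: text/xml' \
       --data "<delete><query>param1:(val1 OR val2) AND -param2:(val3 OR val4) AND date_param:[${from} TO ${to}]</query></delete>" \
       'http://host:port/solr/coll-name1/update?commit=true'
  sleep 1                    # throttle between chunks
done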