Hi Shawn,

Thanks for the reply.
It was a single delete with a date-range query (roughly as sketched at the end of this message). We have 8 machines, each with 35GB of memory, of which 10GB is allocated to the JVM. Garbage collection has always been a problem for us, with the heap not clearing on a full garbage collection; I don't know what is being held in memory that refuses to be collected. I have seen your Java heap configuration in previous posts and it's very like ours, except that we are not currently using LargePages (I don't know how much difference that has made to your memory usage). We have tried various configurations around Java, including the G1 collector (which was awful), but all settings seem to leave the old generation at least 50% full, so it quickly fills up again.

-Xms10240M -Xmx10240M
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled
-XX:NewRatio=2
-XX:+CMSScavengeBeforeRemark
-XX:CMSWaitDuration=5000
-XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=80
-XX:+UseCMSInitiatingOccupancyOnly

If I could only figure out what keeps the heap at its current level, I feel we would be in a better place with Solr.

Thanks.

On 1 May 2013 14:40, Shawn Heisey <s...@elyograg.org> wrote:

> On 5/1/2013 3:39 AM, Annette Newton wrote:
> > We have a 4 shard - 2 replica solr cloud setup, each with about 26GB of
> > index, a total of 24,000,000 documents. We issued a rather large delete
> > yesterday morning to reduce that size by about half. This resulted in the
> > loss of all shards while the delete was taking place, but when it had
> > apparently finished, as soon as we started writing again we continued to
> > lose shards.
> >
> > We have also issued much smaller deletes and lost shards, but before they
> > have always come back ok. This time we couldn't keep them online. We
> > ended up rebuilding our cloud setup and switching over to it.
> >
> > Is there a better process for deleting documents? Is this expected
> > behaviour?
>
> How was the delete composed? Was it a single request with a simple
> query, or was it a huge list of IDs or a huge query? Was it millions
> of individual delete queries? All of those should be fine, but the last
> option is the hardest on Solr, especially if you are doing a lot of
> commits at the same time. You might need to increase the zkTimeout
> value on your startup command line or in solr.xml.
>
> How many machines do your eight SolrCloud replicas live on? How much RAM
> do they have? How much of that memory is allocated to the Java heap?
>
> Assuming that your SolrCloud is living on eight separate machines that
> each have a 26GB index, I hope that you have 16 to 32 GB of RAM on each
> of those machines, and that a large chunk of that RAM is not allocated
> to Java or any other program. If you don't, then it will be very
> difficult to get good performance out of Solr, especially for index
> commits. If you have multiple 26GB shards per machine, you'll need even
> more free memory. The free memory is used to cache your index files.
>
> Another possible problem here is Java garbage collection pauses. If you
> have a large max heap and don't have a tuned GC configuration, then the
> only way to fix this is to reduce your heap and/or to tune Java's
> garbage collection.
>
> Thanks,
> Shawn
>

--
Annette Newton
Database Administrator
ServiceTick Ltd

T: +44 (0)1603 618326

Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ
www.servicetick.com
*www.sessioncam.com*
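P.S. For clarity on the shape of the delete: it was issued as a single delete-by-query covering the whole date range, followed by one commit. A minimal SolrJ (Solr 4.x era) sketch of that kind of request is below; the ZooKeeper hosts, collection name and "timestamp" field are illustrative placeholders, not our real values.

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class DateRangeDelete {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster via ZooKeeper (hosts are placeholders)
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");
        try {
            // One delete-by-query over the date range, then a single explicit commit
            server.deleteByQuery("timestamp:[* TO 2013-04-30T00:00:00Z]");
            server.commit();
        } finally {
            server.shutdown();
        }
    }
}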