It looks like the disk space check is the problem. I am no Java developer, but 
the attached patch skips the check when you are using the link method for 
splitting. It is based on the commit for 7.7.2, d4c30fc285. The modified 
version only has to be run on the overseer machine, so there is that at 
least.

________________________________
From: Andrew Kettmann
Sent: Tuesday, June 18, 2019 11:32:43 AM
To: solr-user@lucene.apache.org
Subject: Solr 7.7.2 - SolrCloud - SPLITSHARD - Using LINK method fails on disk 
usage checks


I am using the Solr 7.7.2 Docker image to test some of the new autoscale 
features, and I am a huge fan so far. I tested a split with the link method on 
a 2GB core and found that it took less than 1MB of additional space. I then 
filled the core quite a bit more, to 12GB of a 20GB PVC, and now splitting the 
shard fails with the following error message on my overseer:


2019-06-18 16:27:41.754 ERROR 
(OverseerThreadFactory-49-thread-5-processing-n:10.0.192.74:8983_solr) 
[c:test_autoscale s:shard1  ] o.a.s.c.a.c.OverseerCollectionMessageHandler 
Collection: test_autoscale operation: splitshard 
failed:org.apache.solr.common.SolrException: not enough free disk space to 
perform index split on node 10.0.193.23:8983_solr, required: 23.35038321465254, 
available: 7.811378479003906
    at org.apache.solr.cloud.api.collections.SplitShardCmd.checkDiskSpace(SplitShardCmd.java:567)
    at org.apache.solr.cloud.api.collections.SplitShardCmd.split(SplitShardCmd.java:138)
    at org.apache.solr.cloud.api.collections.SplitShardCmd.call(SplitShardCmd.java:94)
    at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:294)
    at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
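
The figures in this error are consistent with a check that asks for roughly 
twice the current index size in free space, regardless of split method. That 
factor of two is an inference from the numbers here (a core of about 12GB on a 
20GB PVC), not something confirmed from the Solr source, and the GB unit is 
likewise assumed. A minimal sketch of the arithmetic:

```python
# Figures copied from the error message above; the unit is assumed to be GB.
required_gb = 23.35038321465254
available_gb = 7.811378479003906

# Assumption (inferred, not confirmed): the check requires free space equal
# to twice the index size.
ASSUMED_FACTOR = 2.0

# Index size implied by the "required" figure under that assumption.
implied_index_gb = required_gb / ASSUMED_FACTOR

# How far short of the requirement the node is.
shortfall_gb = required_gb - available_gb

print(f"implied index size: {implied_index_gb:.2f} GB")
print(f"shortfall: {shortfall_gb:.2f} GB")
```

Under that assumption the implied index size lines up with the roughly 12GB 
core, and the requirement exceeds the whole 20GB volume, so the check cannot 
pass at this core size however little space the link method actually uses.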



I tried sending the request directly to the node to see if it behaved any 
differently, but no luck. My parameters are as follows (in Python formatting, 
as that is my language of choice):



splitparams = {'action':'SPLITSHARD',
               'collection':'test_autoscale',
               'shard':'shard1',
               'splitMethod':'link',
               'timing':'true',
               'async':'shardsplitasync'}
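
For readers who want to reproduce the request, here is a minimal sketch of how 
these parameters map onto a Collections API URL, using only the Python 
standard library. The base URL and the helper name `collections_url` are 
hypothetical; the `/admin/collections` path matches the node log quoted in 
this message. Because the request is asynchronous, the sketch also builds a 
`REQUESTSTATUS` URL for polling the result.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the address of a node in your cluster.
BASE_URL = "http://localhost:8983/solr"


def collections_url(params, base_url=BASE_URL):
    """Build a Collections API URL from a dict of query parameters."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


splitparams = {'action': 'SPLITSHARD',
               'collection': 'test_autoscale',
               'shard': 'shard1',
               'splitMethod': 'link',
               'timing': 'true',
               'async': 'shardsplitasync'}

# An async request returns immediately; its outcome is polled by request id.
statusparams = {'action': 'REQUESTSTATUS',
                'requestid': splitparams['async']}

split_url = collections_url(splitparams)
status_url = collections_url(statusparams)

print(split_url)
print(status_url)
```

Against a live cluster, either URL could be fetched with 
`urllib.request.urlopen`; the sketch stops at building them so that it runs 
without a Solr instance.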


And this is confirmed by the log message from the node itself:


2019-06-18 16:27:41.730 INFO  (qtp1107530534-16) [c:test_autoscale   ] 
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections 
params={async=shardsplitasync&timing=true&action=SPLITSHARD&collection=test_autoscale&shard=shard1&splitMethod=link}
 status=0 QTime=20


While it is true that I would not have enough space for the rewrite method, 
the link method on a 2GB core used less than 1MB of additional space. Is 
there something I am missing here? Is there an option I need to pass to 
disable the disk space check? I can't find anything in the documentation at 
this point.


Andrew Kettmann
DevOps Engineer
P: 1.314.596.2836

diff --git a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
index 24a52eaf97..e018f8a42f 100644
--- a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
+++ b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
@@ -135,7 +135,9 @@ public class SplitShardCmd implements OverseerCollectionMessageHandler.Cmd {
     }
 
     RTimerTree t = timings.sub("checkDiskSpace");
-    checkDiskSpace(collectionName, slice.get(), parentShardLeader);
+    if (splitMethod != SolrIndexSplitter.SplitMethod.LINK) {
+      checkDiskSpace(collectionName, slice.get(), parentShardLeader);
+    }
     t.stop();
 
     // let's record the ephemeralOwner of the parent leader node
