First, I generally prefer to construct the CloudSolrClient using the ZooKeeper ensemble string rather than URLs, although that's probably not a cure for your problem.
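A minimal sketch of what that looks like, assuming SolrJ 7.x and placeholder ZooKeeper host:port values (adjust to your ensemble and chroot, if any):

import java.util.Arrays;
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;

// Placeholder ZooKeeper ensemble; substitute your real host:port list.
List<String> zkHosts = Arrays.asList("localhost:2181", "localhost:2182", "localhost:2183");

CloudSolrClient solrClient = new CloudSolrClient.Builder(zkHosts, Optional.empty())
        .withConnectionTimeout(10000)
        .withSocketTimeout(60000)
        .build();
solrClient.setDefaultCollection("de_wiki_man");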
Here's what I _think_ is happening. If you're slamming Solr with a lot of updates, you're doing a lot of merging. At some point, when there are a lot of merges going on, incoming updates block until one or more merge threads are done. At that point, I suspect your client is timing out. And (perhaps) if you used the ZooKeeper ensemble instead of HTTP, the cluster state fetch would go away; I suspect another issue would come up instead, but....

It's also possible this would all go away if you increased your timeouts significantly. That's still a "set it and hope" approach rather than a totally robust solution, though.

Let's assume the above works and you start getting timeouts. You can back off the indexing rate at that point, or just go to sleep for a while. This isn't what you'd like as a permanent solution, but it may let you get by.

There's work afoot to separate out update thread pools from query thread pools so _querying_ doesn't suffer when indexing is heavy, but that hasn't been implemented yet. This could also address your cluster state fetch error.

BTW, you will get significantly better throughput if you batch your docs and use client.add(list_of_documents).

Another possibility is to use the new metrics API (since Solr 6.4). It exposes over 200 metrics you can query, and it's quite possible they'd help your clients know when to self-throttle, but AFAIK there's nothing built in to help you there.
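As a rough sketch of what polling the metrics API from SolrJ could look like (the node URL, the "jvm" group, and the idea of keying a throttle off those numbers are my assumptions, not a recipe):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.client.solrj.response.SimpleSolrResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

// Metrics are reported per node, so ask one node directly (placeholder URL).
try (SolrClient node = new HttpSolrClient.Builder("http://localhost:9999/solr").build()) {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("group", "jvm");   // e.g. heap/GC figures; pick whatever group/prefix you care about

    GenericSolrRequest req = new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/metrics", params);
    SimpleSolrResponse rsp = req.process(node);

    // Inspect the returned NamedList and decide whether the indexing threads should slow down.
    System.out.println(rsp.getResponse());
}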
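And to illustrate the batching (plus the back-off-and-sleep idea from above), a minimal sketch; the class name, batch size, and wait times are invented for illustration and would need tuning:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

// One instance per indexing thread; collects documents and sends them in batches.
public class BatchingIndexer {
    private static final int BATCH_SIZE = 500;          // a guess; tune against your hardware
    private final CloudSolrClient solrClient;
    private final List<SolrInputDocument> batch = new ArrayList<>();

    public BatchingIndexer(CloudSolrClient solrClient) {
        this.solrClient = solrClient;
    }

    // Call this instead of solrClient.add(doc) for each document.
    public void add(SolrInputDocument doc) throws InterruptedException {
        batch.add(doc);
        if (batch.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // One request for the whole batch; sleep and retry with a growing pause if Solr pushes back.
    public void flush() throws InterruptedException {
        long wait = 1000;
        while (!batch.isEmpty()) {
            try {
                solrClient.add(batch);
                batch.clear();
            } catch (SolrServerException | IOException e) {
                Thread.sleep(wait);                      // back off before retrying
                wait = Math.min(wait * 2, 60_000);
            }
        }
    }
}

Each thread would call flush() one last time when it runs out of files.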
Best,
Erick

On Wed, Jul 4, 2018 at 2:32 AM, Arturas Mazeika <maze...@gmail.com> wrote:
> Hi Solr Folk,
>
> I am trying to push Solr to the limit and sometimes I succeed. The
> question is how to not go over it, e.g., avoid:
>
> java.lang.RuntimeException: Tried fetching cluster state using the node
> names we knew of, i.e. [192.168.56.1:9998_solr, 192.168.56.1:9997_solr,
> 192.168.56.1:9999_solr, 192.168.56.1:9996_solr]. However, succeeded in
> obtaining the cluster state from none of them. If you think your Solr
> cluster is up and is accessible, you could try re-creating a new
> CloudSolrClient using working solrUrl(s) or zkHost(s).
>     at org.apache.solr.client.solrj.impl.HttpClusterStateProvider.getState(HttpClusterStateProvider.java:109)
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1113)
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:845)
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:818)
>     at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>     at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
>     at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
>     at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
>     at com.asc.InsertDEWikiSimple$SimpleThread.run(InsertDEWikiSimple.java:132)
>
> Details:
>
> I am benchmarking a SolrCloud setup on a single machine (an Intel i7 with
> 8 "CPU cores", an SSD as well as an HDD) using the German Wikipedia
> collection. I created a 4-node, 4-shard, replication-factor-2 cluster on
> the same machine (and managed to push the CPU or the SSD to the hardware
> limits, i.e., ~200MB/s, ~100% CPU). Now I wanted to see what happens if I
> push the HDD to the limits. Indexing the files from the SSD (I am able to
> scan the collection at an actual rate of 400-500MB/s) with 16 threads, I
> tried to send them to the Solr cluster with all indexes on the HDD.
>
> Clearly Solr needs to deal with a very slow hard drive (10-20MB/s actual
> rate). If the cluster is not touched, SolrJ may start losing connections
> after a few hours. If one checks the status of the cluster, it may happen
> sooner. After the connection is lost, the cluster calms down with writing
> after half a dozen minutes.
>
> What would be a reasonable way to push to the limit without going over?
>
> The exact parameters are:
>
> - 4 cores running 2 GB RAM
> - Schema:
>
> <fieldType name="ft_wiki_de" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.GermanMinimalStemFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <fieldType name="ft_url" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
> <field name="id" type="uuid" indexed="true" stored="true" required="true"/>
> <field name="_root_" type="uuid" indexed="true" stored="false" docValues="false" />
>
> <field name="size" type="pint" indexed="true" stored="true"/>
> <field name="time" type="pdate" indexed="true" stored="true"/>
> <field name="content" type="ft_wiki_de" indexed="true" stored="true"/>
> <field name="url" type="ft_url" indexed="true" stored="true"/>
>
> <field name="_version_" type="plong" indexed="false" stored="false"/>
>
> I SolrJ-connect once:
>
> ArrayList<String> urls = new ArrayList<>();
> urls.add("http://localhost:9999/solr");
> urls.add("http://localhost:9998/solr");
> urls.add("http://localhost:9997/solr");
> urls.add("http://localhost:9996/solr");
>
> solrClient = new CloudSolrClient.Builder(urls)
>         .withConnectionTimeout(10000)
>         .withSocketTimeout(60000)
>         .build();
> solrClient.setDefaultCollection("de_wiki_man");
>
> and then execute in 16 threads while there's anything to execute:
>
> Path p = getJobPath();
> String content = new String(Files.readAllBytes(p));
> UUID id = UUID.randomUUID();
> SolrInputDocument doc = new SolrInputDocument();
>
> BasicFileAttributes attr = Files.readAttributes(p, BasicFileAttributes.class);
>
> doc.addField("id", id.toString());
> doc.addField("content", content);
> doc.addField("time", attr.creationTime().toString());
> doc.addField("size", content.length());
> doc.addField("url", p.getFileName().toAbsolutePath().toString());
> solrClient.add(doc);
>
> to go through all the wiki html files.
>
> Cheers,
> Arturas