Hi Solr Folk,

I am trying to push Solr to the limit and sometimes I succeed. The question is how not to go over it, i.e., how to avoid:
java.lang.RuntimeException: Tried fetching cluster state using the node names we knew of, i.e. [192.168.56.1:9998_solr, 192.168.56.1:9997_solr, 192.168.56.1:9999_solr, 192.168.56.1:9996_solr]. However, succeeded in obtaining the cluster state from none of them. If you think your Solr cluster is up and is accessible, you could try re-creating a new CloudSolrClient using working solrUrl(s) or zkHost(s).
    at org.apache.solr.client.solrj.impl.HttpClusterStateProvider.getState(HttpClusterStateProvider.java:109)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1113)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:845)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:818)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
    at com.asc.InsertDEWikiSimple$SimpleThread.run(InsertDEWikiSimple.java:132)

Details: I am benchmarking a SolrCloud setup on a single machine (Intel i7 with 8 "CPU cores", an SSD as well as an HDD) using the German Wikipedia collection. I created a 4-node, 4-shard, replication factor 2 cluster on that machine, and managed to push the CPU or the SSD to the hardware limits (~200 MB/s, ~100% CPU). Now I wanted to see what happens if I push the HDD to its limits. Reading the files from the SSD with 16 threads (I can scan the collection at an actual rate of 400-500 MB/s), I send them to the Solr cluster, which keeps all its indexes on the HDD. Solr therefore has to deal with a very slow drive (10-20 MB/s actual rate).

If the cluster is left untouched, SolrJ may start losing connections after a few hours; if one checks the status of the cluster in the meantime, it may happen sooner. After the connection is lost, the cluster's writing calms down within half a dozen minutes.

What would be a reasonable way to push to the limit without going over?
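One suspicion: since I build the client from Solr URLs, the cluster state is fetched from the nodes themselves (HttpClusterStateProvider, as in the trace above), and when all four nodes are busy flushing to the HDD that fetch fails. Would building the client from the ZooKeeper address instead make this more robust? A minimal sketch of what I mean, assuming a ZooKeeper reachable at localhost:2181 (a placeholder, not my actual setup):

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    // Sketch: read the cluster state from ZooKeeper rather than from the
    // (possibly overloaded) Solr nodes. "localhost:2181" is a placeholder.
    CloudSolrClient zkClient = new CloudSolrClient.Builder(
            Collections.singletonList("localhost:2181"), Optional.empty())
        .withConnectionTimeout(10000)
        .withSocketTimeout(60000)
        .build();
    zkClient.setDefaultCollection("de_wiki_man");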
The exact parameters are:

- 4 cores, each running with 2 GB RAM
- Schema:

    <fieldType name="ft_wiki_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.GermanMinimalStemFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="ft_url" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>

    <field name="id" type="uuid" indexed="true" stored="true" required="true"/>
    <field name="_root_" type="uuid" indexed="true" stored="false" docValues="false"/>
    <field name="size" type="pint" indexed="true" stored="true"/>
    <field name="time" type="pdate" indexed="true" stored="true"/>
    <field name="content" type="ft_wiki_de" indexed="true" stored="true"/>
    <field name="url" type="ft_url" indexed="true" stored="true"/>
    <field name="_version_" type="plong" indexed="false" stored="false"/>

I connect via SolrJ once:

    ArrayList<String> urls = new ArrayList<>();
    urls.add("http://localhost:9999/solr");
    urls.add("http://localhost:9998/solr");
    urls.add("http://localhost:9997/solr");
    urls.add("http://localhost:9996/solr");
    solrClient = new CloudSolrClient.Builder(urls)
        .withConnectionTimeout(10000)
        .withSocketTimeout(60000)
        .build();
    solrClient.setDefaultCollection("de_wiki_man");

and then 16 threads each execute the following, as long as there are jobs left, to go through all the wiki HTML files:

    Path p = getJobPath();
    String content = new String(Files.readAllBytes(p));
    UUID id = UUID.randomUUID();
    SolrInputDocument doc = new SolrInputDocument();
    BasicFileAttributes attr = Files.readAttributes(p, BasicFileAttributes.class);
    doc.addField("id", id.toString());
    doc.addField("content", content);
    doc.addField("time", attr.creationTime().toString());
    doc.addField("size", content.length());
    doc.addField("url", p.toAbsolutePath().toString());
    solrClient.add(doc);

Cheers,
Arturas
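P.S. One idea I have been toying with on the client side: wrap the add() call in a retry loop with exponential backoff, so the indexer backs off while the cluster flushes instead of failing fast. Below is a rough sketch (addWithBackoff is a hypothetical helper of mine, not something I have tested at scale; note the broad catch, since the cluster-state failure above surfaces as a plain RuntimeException):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    // Hypothetical helper: retry add() with exponential backoff instead of
    // failing fast. The cluster-state error above is a RuntimeException, so
    // catching SolrServerException/IOException alone would not be enough.
    static void addWithBackoff(SolrClient client, SolrInputDocument doc)
            throws InterruptedException {
        long delayMs = 1000;                             // initial backoff
        for (int attempt = 1; attempt <= 10; attempt++) {
            try {
                client.add(doc);
                return;                                  // success
            } catch (Exception e) {
                Thread.sleep(delayMs);                   // let the cluster catch up
                delayMs = Math.min(delayMs * 2, 60_000); // cap at one minute
            }
        }
        throw new RuntimeException("giving up after 10 failed add() attempts");
    }

Would something like that be a reasonable way to stay under the limit, or should the backpressure come from the server side?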