Thanks Amrit. Created a bug: https://issues.apache.org/jira/browse/SOLR-13481
Regards,
Rajeswari

On 5/19/19, 3:44 PM, "Amrit Sarkar" <sarkaramr...@gmail.com> wrote:

Sounds legit to me. Can you create a Jira and list down the problem
statement and design solution there? I am confident it will attract
committers' attention, and they can review the design and provide feedback.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, May 20, 2019 at 3:59 AM Natarajan, Rajeswari <rajeswari.natara...@sap.com> wrote:

> Thanks Amrit for creating a patch. But the code in LBHttpSolrClient.java
> needs to be fixed too, for the for loop to work as intended.
>
> Regards
> Rajeswari
>
> public Rsp request(Req req) throws SolrServerException, IOException {
>   Rsp rsp = new Rsp();
>   Exception ex = null;
>   boolean isNonRetryable = req.request instanceof IsUpdateRequest ||
>       ADMIN_PATHS.contains(req.request.getPath());
>   List<ServerWrapper> skipped = null;
>
>   final Integer numServersToTry = req.getNumServersToTry();
>   int numServersTried = 0;
>
>   boolean timeAllowedExceeded = false;
>   long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
>   long timeOutTime = System.nanoTime() + timeAllowedNano;
>   for (String serverStr : req.getServers()) {
>     if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
>       break;
>     }
>
>     serverStr = normalize(serverStr);
>     // if the server is currently a zombie, just skip to the next one
>     ServerWrapper wrapper = zombieServers.get(serverStr);
>     if (wrapper != null) {
>       // System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
>       final int numDeadServersToTry = req.getNumDeadServersToTry();
>       if (numDeadServersToTry > 0) {
>         if (skipped == null) {
>           skipped = new ArrayList<>(numDeadServersToTry);
>           skipped.add(wrapper);
>         } else if (skipped.size() < numDeadServersToTry) {
>           skipped.add(wrapper);
>         }
>       }
>       continue;
>     }
>     try {
>       MDC.put("LBHttpSolrClient.url", serverStr);
>
>       if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
>         break;
>       }
>
>       HttpSolrClient client = makeSolrClient(serverStr);
>
>       ++numServersTried;
>       ex = doRequest(client, req, rsp, isNonRetryable, false, null);
>       if (ex == null) {
>         return rsp; // SUCCESS
>       }
>     } finally {
>       MDC.remove("LBHttpSolrClient.url");
>     }
>   }
>
>   // try the servers we previously skipped
>   if (skipped != null) {
>     for (ServerWrapper wrapper : skipped) {
>       if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
>         break;
>       }
>
>       if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
>         break;
>       }
>
>       try {
>         MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
>         ++numServersTried;
>         ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true, wrapper.getKey());
>         if (ex == null) {
>           return rsp; // SUCCESS
>         }
>       } finally {
>         MDC.remove("LBHttpSolrClient.url");
>       }
>     }
>   }
>
>   final String solrServerExceptionMessage;
>   if (timeAllowedExceeded) {
>     solrServerExceptionMessage = "Time allowed to handle this request exceeded";
>   } else {
>     if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
>       solrServerExceptionMessage = "No live SolrServers available to handle this request:"
>           + " numServersTried=" + numServersTried
>           + " numServersToTry=" + numServersToTry.intValue();
>     } else {
>       solrServerExceptionMessage = "No live SolrServers available to handle this request";
>     }
>   }
>   if (ex == null) {
>     throw new SolrServerException(solrServerExceptionMessage);
>   } else {
>     throw new SolrServerException(solrServerExceptionMessage + ":" + zombieServers.keySet(), ex);
>   }
> }
>
> On 5/19/19, 3:12 PM, "Amrit Sarkar" <sarkaramr...@gmail.com> wrote:
>
> > Thanks Natarajan,
> >
> > Solid analysis. I saw the issue being reported by multiple users in the
> > past few months, and unfortunately I baked incomplete code.
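[Editorial note: one observable quirk of the quoted loop is its `numServersTried > numServersToTry` guard. The standalone model below is my own sketch, not the real SolrJ class; `serversTried` and its arguments are hypothetical names. It shows that the `>` comparison, as quoted, attempts one more server than the requested limit, while a `>=` comparison would stop exactly at it.]

```java
// Standalone model of the numServersToTry guard in the quoted request() loop.
// This is an illustrative sketch, not LBHttpSolrClient itself.
public class LbRetrySketch {

  // Counts how many "servers" the loop attempts with the guard as quoted:
  // the ">" comparison lets numServersTried reach numServersToTry + 1.
  static int serversTried(int numServersToTry, int totalServers) {
    int numServersTried = 0;
    for (int i = 0; i < totalServers; i++) {
      if (numServersTried > numServersToTry) { // guard as quoted above
        break;
      }
      ++numServersTried; // stands in for makeSolrClient/doRequest
    }
    return numServersTried;
  }

  // Same loop with ">=": stops once the limit is reached.
  static int serversTriedFixed(int numServersToTry, int totalServers) {
    int numServersTried = 0;
    for (int i = 0; i < totalServers; i++) {
      if (numServersTried >= numServersToTry) {
        break;
      }
      ++numServersTried;
    }
    return numServersTried;
  }

  public static void main(String[] args) {
    System.out.println(serversTried(2, 5));      // 3: one server too many
    System.out.println(serversTriedFixed(2, 5)); // 2: exactly the limit
  }
}
```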
> > I think the correct way of solving this issue is to identify the correct
> > base-url for the respective core we need to trigger REQUESTRECOVERY on,
> > and create a local HttpSolrClient instead of using CloudSolrClient from
> > CdcrReplicatorState. This will avoid unnecessary retries, which would be
> > redundant in our case.
> >
> > I baked a small patch a few weeks back and will upload it to SOLR-11724
> > <https://issues.apache.org/jira/browse/SOLR-11724>.
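[Editorial note: a rough sketch of the direction described above, under my assumption (not the actual SOLR-11724 patch) that the key step is addressing the replica's own base URL directly. `requestRecoveryUrl` is a hypothetical helper that builds the core-admin REQUESTRECOVERY URL for one specific core, making explicit which node the call would target.]

```java
// Hypothetical helper sketching the approach above: build the core-admin
// REQUESTRECOVERY URL for one specific core, so the call can go through a
// plain HttpSolrClient pointed at that node rather than CloudSolrClient.
public class RequestRecoverySketch {

  static String requestRecoveryUrl(String baseUrl, String coreName) {
    // baseUrl is a node-level URL such as http://host:8983/solr
    String base = baseUrl.endsWith("/")
        ? baseUrl.substring(0, baseUrl.length() - 1)
        : baseUrl;
    return base + "/admin/cores?action=REQUESTRECOVERY&core=" + coreName;
  }

  public static void main(String[] args) {
    System.out.println(requestRecoveryUrl(
        "http://node1:8983/solr/", "collection1_shard1_replica_n1"));
  }
}
```

In SolrJ one would typically wrap this with a client built for that base URL (e.g. `new HttpSolrClient.Builder(baseUrl).build()`) and a `CoreAdminRequest`; the string form above just makes the target node explicit.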