Re: [PR] Fix Solr 10 migration [stormcrawler]

via GitHub Wed, 25 Mar 2026 10:10:21 -0700


mvolikas commented on code in PR #1833:
URL: https://github.com/apache/stormcrawler/pull/1833#discussion_r2989694164



##########
external/solr/src/main/java/org/apache/stormcrawler/solr/SolrConnection.java:
##########
@@ -194,25 +197,42 @@ private void flushUpdates(
             }
         }
 
+        if (docs.isEmpty() && deletionIds.isEmpty()) {
+            return;
+        }
+
         UpdateRequest updateRequest = new UpdateRequest();
-        updateRequest.add(docs);
-        updateRequest.deleteById(deletionIds);
+        if (!docs.isEmpty()) {
+            updateRequest.add(docs);
+        }
+        if (!deletionIds.isEmpty()) {
+            updateRequest.deleteById(deletionIds);
+        }
 
         List<Update> batch = new ArrayList<>(waitingUpdates);
         waitingUpdates.clear();
 
-        // Get the async client
-        LBHttp2SolrClient lbHttp2SolrClient = 
cloudHttp2SolrClient.getLbClient();
+        // Building the endpoint for the current leader
+        LBSolrClient.Endpoint endpoint =
+                new LBSolrClient.Endpoint(leader.getBaseUrl(), 
leader.getCoreName());
+        List<LBSolrClient.Endpoint> endpoints = 
Collections.singletonList(endpoint);
         LBSolrClient.Req req = new LBSolrClient.Req(updateRequest, endpoints);
 
-        lbHttp2SolrClient
+        /*
+         * Retrieve the async LB client from the CloudSolrClient.
+         * NOTE: This relies on the current internal implementation of 
CloudSolrClient.Builder,
+         * which returns a CloudHttp2SolrClient.getLbClient() is protected 
there and is
+         * not part of the public SolrJ 10 API surface, so this may require 
adjustments in future Solr versions.

Review Comment:
   From what I see, `CloudHttp2SolrClient.getLbClient()` is part of the public 
API. Maybe we just quote and link what is mentioned under [Major Changes in 
Solr 
10#SolrJ](https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-10.html#solrj)
 so that it is explicit why we have references to `CloudHttp2SolrClient`?



##########
external/solr/src/main/java/org/apache/stormcrawler/solr/SolrConnection.java:
##########
@@ -194,25 +197,42 @@ private void flushUpdates(
             }
         }
 
+        if (docs.isEmpty() && deletionIds.isEmpty()) {
+            return;
+        }
+
         UpdateRequest updateRequest = new UpdateRequest();
-        updateRequest.add(docs);
-        updateRequest.deleteById(deletionIds);
+        if (!docs.isEmpty()) {
+            updateRequest.add(docs);
+        }
+        if (!deletionIds.isEmpty()) {
+            updateRequest.deleteById(deletionIds);
+        }
 
         List<Update> batch = new ArrayList<>(waitingUpdates);
         waitingUpdates.clear();
 
-        // Get the async client
-        LBHttp2SolrClient lbHttp2SolrClient = 
cloudHttp2SolrClient.getLbClient();
+        // Building the endpoint for the current leader
+        LBSolrClient.Endpoint endpoint =
+                new LBSolrClient.Endpoint(leader.getBaseUrl(), 
leader.getCoreName());
+        List<LBSolrClient.Endpoint> endpoints = 
Collections.singletonList(endpoint);
         LBSolrClient.Req req = new LBSolrClient.Req(updateRequest, endpoints);
 
-        lbHttp2SolrClient
+        /*
+         * Retrieve the async LB client from the CloudSolrClient.
+         * NOTE: This relies on the current internal implementation of 
CloudSolrClient.Builder,
+         * which returns a CloudHttp2SolrClient.getLbClient() is protected 
there and is
+         * not part of the public SolrJ 10 API surface, so this may require 
adjustments in future Solr versions.
+         */
+        LBAsyncSolrClient lbAsyncSolrClient =
+                (LBAsyncSolrClient) ((CloudHttp2SolrClient) 
cloudClient).getLbClient();
+
+        lbAsyncSolrClient
                 .requestAsync(req)
                 .whenComplete(
-                        (futureResponse, throwable) -> {
+                        (response, throwable) -> {
                             if (throwable != null) {
                                 LOG.error("Exception caught while updating", 
throwable);
-
-                                // The request failed => add the batch back to 
the pending updates

Review Comment:
   Can we keep the comment here? 



##########
external/solr/src/main/java/org/apache/stormcrawler/solr/SolrConnection.java:
##########
@@ -250,44 +270,27 @@ public void deleteByIdAsync(String id) {
 
     public CompletableFuture<QueryResponse> requestAsync(QueryRequest request) 
{
         if (cloud) {
-            CloudHttp2SolrClient cloudHttp2SolrClient = (CloudHttp2SolrClient) 
client;
-
-            // Find the shard to route the request to
-            String shardId = request.getParams().get("shards");
-            if (shardId == null) {
-                shardId = "shard1";
-            }
-
-            Slice slice =
-                    cloudHttp2SolrClient
-                            .getClusterState()
-                            .getCollection(collection)
-                            .getSlice(shardId);
-
-            // Will get results from the first successful replica of this shard
-            List<LBSolrClient.Endpoint> endpoints = new ArrayList<>();
-
-            for (Replica replica : slice.getReplicas()) {
-                if (replica.getState() == Replica.State.ACTIVE) {
-                    endpoints.add(
-                            new LBSolrClient.Endpoint(replica.getBaseUrl(), 
replica.getCoreName()));
-                }
-            }
-
-            // Shuffle the endpoints for basic load balancing
-            Collections.shuffle(endpoints);
-
-            // Get the async client
-            LBHttp2SolrClient lbHttp2SolrClient = 
cloudHttp2SolrClient.getLbClient();

Review Comment:
   We can keep the old async logic here by following the same approach with 
`flushUpdates()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Fix Solr 10 migration [stormcrawler]

Reply via email to