On 11/30/22 00:33, Nagarajan Muthupandian wrote:
> POC would be to add a function in the plugin which would query all the
> documents locally (say 100+ million documents) and update 1 or 2 fields
> with a particular value.
>
> As the plugin would be local to this core, I wanted to avoid HTTP calls.

HTTP is likely not a bottleneck for doing that many updates, especially if all the traffic is on a LAN with gigabit or higher speed.

There are at least three likely bottlenecks I can think of.

The first is definitely happening: Lucene has no way to update a column the way a database can, so in most situations the entire document must be reindexed.

The second applies if any commits that open new searchers happen before the whole process is finished. That means a commit from ANY source, not just the program doing the work of updating every document. Solr and Lucene have no concept of transactions, so any commit will incorporate all changes made since the last searcher was opened, from any source.

The third applies if commits are extremely frequent, whether sent manually or triggered by features like autoSoftCommit or commitWithin.
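To make the first point concrete, here is a minimal sketch (the field names and IDs are made up for illustration) of building an atomic-update request for Solr's /update handler. Even though the request names only two fields, Solr must fetch the stored document and reindex the whole thing internally, because Lucene cannot change a field in place:

```python
import json

def atomic_update(doc_id, field_values):
    """Build one atomic-update document for Solr's /update handler.

    Each changed field is wrapped in a modifier map; "set" replaces the
    field's value (other modifiers include "add", "remove", and "inc").
    """
    doc = {"id": doc_id}
    for field, value in field_values.items():
        doc[field] = {"set": value}
    return doc

# Batch of three documents, each changing only two fields.
batch = [atomic_update(str(i), {"status_s": "archived", "batch_i": 42})
         for i in range(3)]

# This JSON is what would be POSTed to /solr/<core>/update.
payload = json.dumps(batch)
```

The payload looks small, but on the server side each of those documents is reread and fully reindexed, which is why the per-document cost dominates regardless of how the request arrives.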

All of these bottlenecks will also occur with the embedded server; cutting HTTP out of the picture will not help.
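One way to reduce the second and third bottlenecks during a bulk rewrite is to relax commit frequency in solrconfig.xml. The values below are illustrative, not recommendations:

```xml
<!-- Illustrative solrconfig.xml fragment. A long autoSoftCommit interval
     avoids opening a new searcher for every small batch of updates. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every 60s -->
    <openSearcher>false</openSearcher>  <!-- no new searcher on hard commit -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>600000</maxTime>           <!-- new searcher at most every 10 min -->
  </autoSoftCommit>
</updateHandler>
```

Remember that this only governs automatic commits; an explicit commit or a commitWithin parameter sent by any client still opens a searcher that incorporates every pending change.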

Thanks,
Shawn
