On 11/30/22 00:33, Nagarajan Muthupandian wrote:
> POC would be to add a function in the plugin.. which would query all the
> documents locally (Say 100+ Million Documents) and update 1 or 2 fields
> with a particular value.
> As the plugin would be local to this core.. wanted to avoid HTTP calls.
HTTP is likely not a bottleneck for doing that many updates, especially
if all the traffic is on a LAN with gigabit or higher speed.
There are at least three likely bottlenecks I can think of. The first is
definitely happening ... Lucene has no way to update a column like you
would in a database, so the entire document must be reindexed in most
situations. The second bottleneck applies if there are any periodic
commits happening before the whole process is finished, opening new
searchers. That's a commit from ANY source, not just the program doing
the work of updating every document. Solr and Lucene do not have any
concept of transactions, so any commit will incorporate all changes made
since the last searcher was opened, from any source. The third
bottleneck applies if you have extremely frequent commits either sent
manually or with features like autoSoftCommit or commitWithin.
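To illustrate the first bottleneck: Solr's atomic update syntax lets you send only the changed field, but Lucene still rewrites the entire document internally (and atomic updates require every other field to be stored or have docValues, or data will be lost). A minimal request body, with hypothetical field names, would look like:

```json
[
  { "id": "doc1", "status": { "set": "archived" } }
]
```

So even though the payload is small, the per-document reindexing cost on the server is roughly the same as sending the full document.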
All of these bottlenecks will also happen with the embedded server, so
cutting HTTP out of the picture will not help.
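If you do go the HTTP route, batching the atomic updates and issuing a single
explicit commit at the end (with autoSoftCommit and commitWithin disabled for
the duration) avoids the commit-related bottlenecks above. Here is a minimal
sketch, assuming the uniqueKey field is named "id" and the field names are
hypothetical; each yielded body would be POSTed to the core's /update handler:

```python
import json

def atomic_update_batches(doc_ids, field, value, batch_size=1000):
    """Yield JSON bodies for Solr atomic updates, batch_size docs per body.

    Assumes the collection's uniqueKey is "id" (hypothetical here).
    Each document carries only the changed field, using Solr's
    atomic-update "set" modifier.
    """
    batch = []
    for doc_id in doc_ids:
        batch.append({"id": doc_id, field: {"set": value}})
        if len(batch) >= batch_size:
            yield json.dumps(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield json.dumps(batch)
```

The key point is that no commit parameter is sent with any batch; after all
batches are indexed, send one commit so only a single new searcher is opened.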
Thanks,
Shawn