Hi all,
I know there are many tips on how to make Solr indexing faster. Probably
the most important client-side ones are batch indexing and multi-threaded
indexing. There are other important server-side factors that I do not want
to go into here. My question is: is there any best practice for the number
of client threads and the batch size when indexing over a WAN?
Since the client and the servers are connected over a WAN, some of the
performance conditions, such as network latency and bandwidth, probably
differ from those on a LAN. Another thing that matters to me is that
document sizes may vary widely between scenarios. For example, when
indexing web pages, a document might be anywhere from 1KB to 200KB. In
such a case, choosing the batch size by number of documents is probably
not the best way to optimize indexing performance. Choosing it by the
total size of the batch in KB/MB would probably be better from the network
point of view. However, from the Solr side the number of documents
matters.
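
To make the size-based idea concrete, here is a minimal sketch of what I
have in mind on the client side. It only groups documents into batches and
does not talk to Solr at all. The function name batch_by_size and the
max_bytes/max_docs limits are my own placeholders, not part of any Solr
client API, and the JSON length is only a rough estimate of the size on the
wire.

```python
import json
from typing import Dict, Iterable, Iterator, List


def batch_by_size(docs: Iterable[Dict], max_bytes: int = 1_000_000,
                  max_docs: int = 1000) -> Iterator[List[Dict]]:
    """Yield batches that respect both a byte budget and a document cap.

    Hypothetical helper: the limits are illustrative, not recommended values.
    """
    batch: List[Dict] = []
    batch_bytes = 0
    for doc in docs:
        # Approximate the payload size by the UTF-8 length of the JSON form.
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch if adding this document would break a limit.
        # A single oversized document still goes out alone in its own batch.
        if batch and (batch_bytes + doc_bytes > max_bytes
                      or len(batch) >= max_docs):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


# Example: web pages between roughly 1KB and 200KB.
pages = [{"id": str(i), "content": "x" * size}
         for i, size in enumerate([1_000, 200_000, 150_000, 1_000, 1_000])]
batches = list(batch_by_size(pages, max_bytes=256_000, max_docs=3))
```

Each resulting batch would then be sent as one update request, possibly
from several client threads. Whether the byte limit or the document limit
should dominate is exactly what I am unsure about.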
So, to summarize, my questions are:
1- Is there any best practice for Solr client-side performance tuning
over a WAN for indexing/reindexing/updating? Does it differ from a LAN?
2- Which one matters more: the number of documents or the total size of
the documents in a batch?

Best regards.

-- 
A.Nazemian
