On 1/11/2018 11:50 AM, Shashank Pedamallu wrote:
> Thank you for the reply Kevin. I was using 6 vms from our private cloud. 5
> among them, I was using as clients to ingest data on 5 independent cores. One
> vm is hosting the Solr which is where all ingest requests are received for
> all cores. Since they are all on same network, I think they should not be
> limited by the network bandwidth for the amount of requests I’m sending.
How large are the documents that you are indexing?  If they are 1K (which
would be a pretty small document), then 90K of them per second is about 88
megabytes per second of raw data, which is near the practical upper-end
bandwidth limit of a gigabit ethernet connection.  The theoretical maximum
for gigabit ethernet is 125 megabytes per second, but protocol overhead (at
the ethernet, IP, and TCP layers) typically limits the real-world achievable
throughput of TCP-based communication over gigabit to something lower,
perhaps 100 megabytes per second.  Additional overhead from the HTTP layer
and the request format (javabin, xml, json, csv, etc) would reduce it a
little bit more.  If the documents are bigger than 1K, then it would require
even more network bandwidth.

If the VMs are on different physical hosts, then you are likely to need an
actual network connection for traffic between them.  Having them all on the
same physical host might actually increase the amount of available network
bandwidth, because the traffic might never need to leave the machine and
travel over a real physical network.

But if they are all on the same physical host, then there would be less disk
I/O bandwidth available.  Running all VMs on the same physical host is not
recommended for production, because it means that the entire installation
goes down if the physical host dies.

Thanks,
Shawn
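The arithmetic above can be sketched as a quick back-of-the-envelope check.  The
1 KB document size is the assumed figure from the paragraph above, and the
gigabit numbers are the standard theoretical/practical limits mentioned there:

```python
# Back-of-the-envelope ingest bandwidth check (assumed figures from the thread)
docs_per_sec = 90_000          # reported ingest rate across all clients
doc_size_bytes = 1024          # assumed ~1 KB per document

# Raw data rate, expressed in binary megabytes (MiB/s)
raw_mib_per_sec = docs_per_sec * doc_size_bytes / 2**20
print(f"Raw ingest rate: {raw_mib_per_sec:.1f} MiB/s")   # ~87.9, i.e. "about 88"

# Gigabit ethernet limits, in decimal megabytes (MB/s)
gigabit_theoretical_mb = 1_000_000_000 / 8 / 1_000_000   # 125 MB/s on the wire
gigabit_practical_mb = 100.0                              # rough TCP/IP real-world ceiling

print(f"Theoretical gigabit limit: {gigabit_theoretical_mb:.0f} MB/s")
print(f"Rough practical gigabit limit: {gigabit_practical_mb:.0f} MB/s")
```

Note the raw rate sits at almost 90% of the practical limit before HTTP and
request-format overhead is even counted, which is why a 1 Gbit link could
plausibly be the bottleneck here.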