On 1/11/2018 11:50 AM, Shashank Pedamallu wrote:
> Thank you for the reply Kevin. I was using 6 vms from our private cloud. 5
> among them, I was using as clients to ingest data on 5 independent cores. One
> vm is hosting the Solr which is where all ingest requests are received for
> all cores. Since they are all on same network, I think they should not be
> limited by the network bandwidth for the amount of requests I’m sending.
How large are the documents that you are indexing?  If they are 1K (which
would be a pretty small document), then 90K of them per second is about 88
megabytes per second of raw data, which is near the practical upper-end
bandwidth limit of a gigabit ethernet connection.  The theoretical maximum
for gigabit ethernet is 125 megabytes per second, but protocol overhead (at
the ethernet, IP, and TCP layers) typically limits the real-world achievable
throughput of TCP-based communication over gigabit to something lower,
perhaps 100 megabytes per second.  Additional overhead from the HTTP layer
and the request format (javabin, xml, json, csv, etc) would reduce it a
little bit more.  If the documents are bigger than 1K, then it would require
even more network bandwidth.

If the VMs are on different physical hosts, then you are likely to need an
actual network connection for traffic between them.  Having them all on the
same physical host might actually increase the amount of available network
bandwidth, because the traffic might never need to leave the machine and
travel over a real physical network.

But if they are all on the same physical host, then there would be less disk
I/O bandwidth available.  Running all VMs on the same physical host is not
recommended for production, because it means that the entire installation
goes down if the physical host dies.

Thanks,
Shawn
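The arithmetic above can be sketched as a quick back-of-the-envelope check.  The
1 KB document size is the assumed figure from the paragraph above, and the
gigabit numbers are the standard theoretical/practical limits mentioned there:

```python
# Back-of-the-envelope ingest bandwidth check (assumed figures from the thread)
docs_per_sec = 90_000          # reported ingest rate across all clients
doc_size_bytes = 1024          # assumed ~1 KB per document

# Raw data rate, expressed in binary megabytes (MiB/s)
raw_mib_per_sec = docs_per_sec * doc_size_bytes / 2**20
print(f"Raw ingest rate: {raw_mib_per_sec:.1f} MiB/s")   # ~87.9, i.e. "about 88"

# Gigabit ethernet limits, in decimal megabytes (MB/s)
gigabit_theoretical_mb = 1_000_000_000 / 8 / 1_000_000   # 125 MB/s on the wire
gigabit_practical_mb = 100.0                              # rough TCP/IP real-world ceiling

print(f"Theoretical gigabit limit: {gigabit_theoretical_mb:.0f} MB/s")
print(f"Rough practical gigabit limit: {gigabit_practical_mb:.0f} MB/s")
```

Note the raw rate sits at almost 90% of the practical limit before HTTP and
request-format overhead is even counted, which is why a 1 Gbit link could
plausibly be the bottleneck here.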