Arturas:
" it is becoming incredibly difficult to find working code"
Yeah, I sympathize totally. What I usually do is go into the test code
of whatever version of Solr I'm using and find examples there. _That_
code _must_ be kept up to date ;).

About batching docs: what you gain is basically efficiency. Each request
carries many documents, so the per-request overhead is paid far less often.
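To make the batching idea concrete, here is a minimal sketch in plain Java. It only models the buffering logic: the `flush` callback stands in for a real `SolrClient.add(batch)` call, and the batch size of 1000 in `main` is an illustrative assumption, not a recommendation from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal batching sketch: documents are buffered and handed off in groups,
// so the expensive per-request work happens once per batch, not once per doc.
public class DocBatcher {
    private final int batchSize;
    private final Consumer<List<String>> flush; // in real code: batch -> solrClient.add(batch)
    private final List<String> buffer = new ArrayList<>();

    public DocBatcher(int batchSize, Consumer<List<String>> flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    public void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flushNow();
    }

    // Call once more after the last document so a partial batch is not lost.
    public void flushNow() {
        if (!buffer.isEmpty()) {
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Simulate indexing nDocs documents and report the size of each "request".
    static List<Integer> simulate(int nDocs, int batchSize) {
        List<Integer> sent = new ArrayList<>();
        DocBatcher b = new DocBatcher(batchSize, batch -> sent.add(batch.size()));
        for (int i = 0; i < nDocs; i++) b.add("doc-" + i);
        b.flushNow();
        return sent;
    }

    public static void main(String[] args) {
        // 2500 docs with batch size 1000 become 3 requests instead of 2500.
        System.out.println(simulate(2500, 1000)); // [1000, 1000, 500]
    }
}
```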
On 7/4/2018 3:32 AM, Arturas Mazeika wrote:
Details:
I am benchmarking a SolrCloud setup on a single machine (an Intel i7 with
8 "CPU cores", an SSD as well as an HDD) using the German Wikipedia
collection. I created a 4-node, 4-shard cluster with replication factor 2
on that machine, and managed to push it to its limits.
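For reference, a collection with that layout (4 shards, 2 replicas per shard) can be created through Solr's Collections API; the collection name `dewiki`, host, and port below are placeholders for illustration, and `maxShardsPerNode=2` is needed so 8 replicas fit on 4 nodes.

```shell
# Create a 4-shard collection with replication factor 2.
# Collection name and host are assumptions; adjust to your own cluster.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=dewiki&numShards=4&replicationFactor=2&maxShardsPerNode=2"
```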
Indexing the files from the SSD (I am able to scan the collection at an
actual rate of 400-500 MB/s) with 16 threads, I tried to send those to the
Solr cluster with all indexes on the HDD.

Clearly, Solr needs to deal with a very slow hard drive (10-20 MB/s actual
rate). If the cluster is not touched, SolrJ may start losing connections
after a few hours. If one checks the status of the cluster, it may happen
sooner. After the connection is lost, the cluster calms down with writing
after half a dozen minutes.
What would be a reasonable way to push to the limit without going over?
The exact parameters are:
- 4 cores running 2 GB RAM
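One common way to push to the limit without going over is to put a bounded queue between the reader threads and the sender threads: when the cluster (and the HDD behind it) falls behind, producers block on the full queue instead of piling up requests until connections drop. The sketch below shows that backpressure idea in plain Java; the queue capacity, thread count, and `ThrottledIndexer` name are illustrative assumptions, and the comment marks where real code would call SolrJ.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Backpressure sketch: producers block on put() when the queue is full,
// so the submission rate can never exceed what the consumers drain.
public class ThrottledIndexer {
    private static final String POISON = "POISON"; // shutdown marker

    public static int run(int nDocs, int capacity, int nConsumers) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger indexed = new AtomicInteger();

        Thread[] consumers = new Thread[nConsumers];
        for (int i = 0; i < nConsumers; i++) {
            consumers[i] = new Thread(() -> {
                try {
                    String doc;
                    while (!(doc = queue.take()).equals(POISON)) {
                        // Real code would batch docs here and call SolrClient.add(...).
                        indexed.incrementAndGet();
                    }
                } catch (InterruptedException ignored) {
                }
            });
            consumers[i].start();
        }

        for (int i = 0; i < nDocs; i++) {
            queue.put("doc-" + i); // blocks when the consumers fall behind
        }
        for (int i = 0; i < nConsumers; i++) {
            queue.put(POISON);     // one shutdown marker per consumer
        }
        for (Thread t : consumers) {
            t.join();
        }
        return indexed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10_000, 256, 4)); // all 10000 docs get indexed
    }
}
```

With a bounded queue, the effective indexing rate is set by how fast the consumers (and ultimately the HDD) can drain it, which is exactly the self-limiting behavior the question asks for.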