Thanks John,
I have 2 shards, 1 replica in each.
The issue is the external processing job(s) I run to convert external
data into JSON and then upload it via cURL.
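For anyone following along, the upload step could be sketched in Python rather than cURL. This is only an illustration: the host, the collection name "mycollection" and the sample document are made up, and it assumes the standard JSON update handler at /update that accepts an array of documents. No commit is requested, so visibility is left to the server-side commit settings.

```python
import json
import urllib.request


def build_update_request(solr_base_url, collection, docs):
    """Build (but do not send) a POST to the collection's JSON update handler."""
    url = f"{solr_base_url.rstrip('/')}/{collection}/update"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_update(request, timeout_seconds=60):
    """Send the request and return the HTTP status.

    Raises an exception on connection failure, HTTP error, or timeout.
    """
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical host, collection name, and document.
    req = build_update_request(
        "http://localhost:8983/solr",
        "mycollection",
        [{"id": "1", "title": "example"}],
    )
    print(req.full_url)
```

Splitting "build" from "send" makes the request easy to inspect without a running Solr instance; send_update is what would actually hit the server.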
Will a single Solr server accept only one update at a time and queue
any others (which might then time out)?
I like the idea of having my leaders only deal with indexing, and the
replicas only deal with searching - how can I actually configure this?
And is it actually required with my shard setup?
I'm doing hard commits every minute but not opening a new searcher (so I
know the data is safe), with soft commits happening every 10 minutes to
make the data visible.
Cheers,
Rob
On 04/04/16 22:40, John Bickerstaff wrote:
Will the processes be Solr processes? Or do you mean multiple threads
hitting the same Solr server(s)?
There will be a natural bottleneck at one Solr server if you are hitting it
with a lot of threads - since that one server will have to do all the
indexing.
I don't know if this idea is helpful, but if your underlying challenge is
protecting the user experience and preventing slowdown during the indexing,
you can have a separate Solr server that just accepts incoming documents
(and bears the cost of the indexing) while serving documents from other
Solr servers...
There will be a slight cost for those "serving servers" to get updates from
the "indexing server" but that will be much less than the cost of indexing
directly.
If processing power were really important you could have two or more
"indexing" servers and fire multiple threads at each one...
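A rough sketch of how the client side might spread work over two or more indexing servers, assuming a simple round-robin split. The function name and the server URLs are hypothetical; nothing here is Solr-specific.

```python
from itertools import cycle


def assign_round_robin(files, servers):
    """Map each server URL to the list of files it should receive."""
    assignment = {server: [] for server in servers}
    # cycle() repeats the server list for as long as there are files left.
    for path, server in zip(files, cycle(servers)):
        assignment[server].append(path)
    return assignment


if __name__ == "__main__":
    # Hypothetical file names and indexing-server URLs.
    plan = assign_round_robin(
        ["a.json", "b.json", "c.json"],
        ["http://index1:8983/solr", "http://index2:8983/solr"],
    )
    for server, paths in plan.items():
        print(server, paths)
```

Each server's list could then be handed to its own pool of upload threads.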
You probably already know this, but the key is how often you "commit" and
force the indexing to occur...
On Mon, Apr 4, 2016 at 3:33 PM, Robert Brown <r...@intelcompute.com> wrote:
Hi,
Does Solr have any sort of limit when attempting multiple updates, from
separate clients?
Are there any safe thresholds one should try to stay within?
I have an index of around 60m documents that gets updated at key points
during the day from ~200 downloaded files - I'd like to fork off multiple
processes to deal with the incoming data to get it into Solr quicker.
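One way the fan-out could look, as a minimal sketch: a bounded worker pool so the number of simultaneous uploads stays under whatever threshold turns out to be safe. The names index_files and upload_one are hypothetical, max_workers=4 is an arbitrary starting point rather than a recommendation, and upload_one stands in for whatever converts one downloaded file and posts it to Solr.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def index_files(paths, upload_one, max_workers=4):
    """Run upload_one(path) for every path, at most max_workers at a time.

    Returns (succeeded, failed), where failed maps path -> exception.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()  # re-raises any exception from upload_one
                succeeded.append(path)
            except Exception as exc:
                failed[path] = exc
    return succeeded, failed


if __name__ == "__main__":
    # Stand-in for the real convert-and-upload step.
    def pretend_upload(path):
        print("uploading", path)

    done, errors = index_files(["f1.json", "f2.json"], pretend_upload)
    print(len(done), "ok,", len(errors), "failed")
```

Threads suit the upload part because it mostly waits on the network. If the conversion to JSON is CPU-heavy, ProcessPoolExecutor from the same module offers the same submit interface, provided upload_one is a picklable top-level function. Collecting failures per file makes it possible to retry only the files that timed out.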
Thanks,
Rob