Toke Eskildsen <t...@statsbiblioteket.dk> wrote
> Use more than one cloud. Make them fully independent.
> As I suggested when you asked 4 days ago. That would
> also make it easy to scale: Just measure how much a
> single setup can take and do the math.

The goal is 250K documents/second.

I tried modifying the books.csv example that comes with Solr to use lines with 
400 characters and inflated it to 4 * 1 million entries. I then started Solr 
with the techproducts example and ingested the 4*1M entries using curl from 4 
terminals at the same time. The longest running took 138 seconds. 4M/138 
seconds ≈ 29K documents/second.
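
For reference, the setup can be approximated with something like the bash
sketch below. I am writing it from memory, so the core name, port, column
names and sizes are assumptions for a stock techproducts example rather than
the exact commands I used (I modified books.csv; here I just generate a
synthetic CSV of roughly the same shape):

  # Generate ~1M synthetic rows of roughly 400 characters each.
  # id is required by the techproducts schema; name is a text field.
  PAD=$(printf 'x%.0s' {1..380})
  {
    echo 'id,name'
    for i in $(seq 1 1000000); do
      echo "doc_$i,title_$i $PAD"
    done
  } > docs-1M.csv

  # Post the file to the running techproducts example. Repeat from 4
  # terminals, each with its own file, to approximate the parallel ingest.
  curl 'http://localhost:8983/solr/techproducts/update?commit=true' \
       --data-binary @docs-1M.csv -H 'Content-type: application/csv'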

My machine is a 4 core (8 with HyperThreading) i7 laptop with an SSD. On a 
modern server and with a custom schema & config, the speed should of course be 
better. On the other hand, the rate might slow down as the shards grow.

Give or take, something like 10 machines could conceivably be enough to handle 
the Solr load, if the analysis chain is near the books example in complexity. 
Of course real data tests are needed and the CSV data must be constructed 
somehow.
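
For completeness, the back-of-envelope math behind the 10-machine figure is
just the numbers above divided out:

  awk 'BEGIN { rate = 4000000 / 138;   # measured rate on the laptop
               printf "%.0f docs/s -> %.1f machines\n", rate, 250000 / rate }'
  # -> about 29K docs/second and ~8.6 machines, so 10 leaves a little headroom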

- Toke Eskildsen
