I have a small Solr setup. It does not even run on a physical machine, but on a VMware virtual machine with a single CPU, and it reads its data from a database using DIH. The machine has no physical disks attached; it stores its data on a NetApp NAS.
Currently this machine indexes 320 documents/sec. That is not bad, but we plan to double the size of the index and would like to keep the total indexing time nearly the same.

Doing some basic checks during indexing, I found with iostat that disk utilization is only about 8%, and the source database is running fine. The virtual CPU, on the other hand, is at about 95%, nearly all of it used by Solr.

I can quite easily add another virtual CPU to the Solr box, but as far as I know this won't help because DIH doesn't work in parallel. Am I wrong?

What would you do? Would you rewrite the feeding process, dropping DIH and using SolrJ to feed the data in parallel? Or would you keep DIH and switch to a sharded configuration?

Thank you for any hints,
Giovanni
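
P.S. To make the SolrJ option more concrete, this is roughly the shape I have in mind. It is only a sketch under several assumptions: `ParallelFeeder` and `BatchSink` are names I made up, the documents are plain strings instead of real Solr documents, and the sink is a stand-in for whatever SolrJ call would actually send a batch to Solr. The batch size and thread count are guesses, and more than one thread would only make sense after adding the second virtual CPU.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelFeeder {

    /** Hypothetical stand-in for the SolrJ call that sends one batch to Solr. */
    public interface BatchSink {
        void send(List<String> batch) throws Exception;
    }

    /**
     * Splits docs into fixed-size batches and sends them from a pool of
     * worker threads. Returns the number of documents handed to the sink.
     */
    public static long feed(List<String> docs, int batchSize, int threads, BatchSink sink)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong sent = new AtomicLong();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                List<String> batch = docs.subList(start, end);
                // Each batch is sent by whichever worker thread is free.
                pending.add(pool.submit(() -> {
                    sink.send(batch);
                    sent.addAndGet(batch.size());
                    return null;
                }));
            }
            // Wait for every batch; get() rethrows a failure from any worker.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        // The no-op sink is where the real code would talk to Solr.
        long n = feed(docs, 100, 2, batch -> { });
        System.out.println("sent " + n + " documents");
    }
}
```

In the real thing the rows would come from the database query that DIH runs today, and the sink would add each batch through a SolrJ client. Whether this is worth the rewrite compared to sharding is exactly what I am unsure about.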