This is a huge red flag to me: "(but I could only test for the first few thousand documents)"
You’re probably right that that would speed things up, but once you’re indexing your entire corpus there are lots of other considerations. The indexing rate you’re seeing is abysmal unless these are _huge_ documents, but you indicate that at the start you’re getting 1,400 docs/second, so I don’t think the complexity of the docs is the issue here.

Do note that when we’re throwing RAM figures around, we need to draw a sharp distinction between Java heap and total RAM. Some data is held on the heap and some in OS RAM due to MMapDirectory; see Uwe’s excellent article: https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Uwe recommends allocating about 25% of your available physical RAM to the Java heap as a starting point. Your particular Solr installation may need a larger percentage, IDK.

But basically I’d go back to all default settings and change one thing at a time.

First, I’d look at GC performance. Is it taking all your CPU? If so, you probably need to increase your heap. I pick this first because it’s very common for this to be a root cause.

Next, I’d put a profiler on it to see exactly where the time is going. Otherwise you wind up making random changes and hoping one of them works.

Best,
Erick

> On Dec 4, 2019, at 3:21 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:
>
> (but I could only test for the first few
> thousand documents
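P.S. A quick back-of-the-envelope sketch of that 25% starting point, assuming a Linux box with /proc/meminfo (adjust for your OS; passing the result to `bin/solr start -m <size>` is one way to set the heap):

```shell
# Hedged sketch: compute 25% of physical RAM as a starting Java heap,
# per Uwe's guideline. Assumes Linux's /proc/meminfo is available.
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
heap_mb=$(( total_kb / 4 / 1024 ))
echo "Suggested starting heap: ${heap_mb}m"
# e.g. then start Solr with: bin/solr start -m ${heap_mb}m
```

Treat the number as a first guess, not a tuning target; measure GC behavior before raising it further.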