This is a huge red flag to me: “(but I could only test for the first few
thousand documents”

You’re probably right that that would speed things up, but pretty soon when 
you’re indexing
your entire corpus there are lots of other considerations.

The indexing rate you’re seeing is abysmal unless these are _huge_ documents,
but you indicate that at the start you’re getting 1,400 docs/second, so I don’t
think the complexity of the docs is the issue here.

Do note that when we’re throwing RAM figures out, we need to draw a sharp 
distinction
between Java heap and total RAM. Some data is held on the heap and some in the 
OS
RAM due to MMapDirectory, see Uwe’s excellent article:
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Uwe recommends about 25% of your available physical RAM be allocated to Java as
a starting point. Your particular Solr installation may need a larger percent, 
IDK.
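As a concrete sketch of that starting point (the machine size and heap figure
here are assumptions, not from your setup): on a box with 64G of physical RAM
you might give Solr a 16G heap and leave the rest to the OS for MMapDirectory’s
page cache:

```shell
# Sketch only -- assumes a 64G machine and the standard bin/solr script.
# ~25% of physical RAM goes to the Java heap as a starting point; the
# remaining ~48G stays with the OS to cache the index files via mmap.
bin/solr stop -all
bin/solr start -m 16g
```

Adjust up or down from there based on what the GC logs tell you.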

But basically I’d go back to all default settings and change one thing at a 
time.
First, I’d look at GC performance: is it eating all your CPU? If so, you
probably need to increase your heap. I pick this first because it’s very
common that this is a root cause.
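One cheap way to answer the GC question is to turn on GC logging. A sketch,
assuming a standard solr.in.sh and a JDK 9+ JVM (the log path and rotation
settings are placeholders, adjust for your install):

```shell
# Sketch: enable GC logging via solr.in.sh (JDK 9+ unified logging syntax).
# Log path and rotation values here are assumptions, not recommendations.
GC_LOG_OPTS="-Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M"
```

If that log shows long or back-to-back pauses while indexing slows down,
heap pressure is your prime suspect.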

Next, I’d put a profiler on it to see exactly where I’m spending time. 
Otherwise you wind
up making random changes and hoping one of them works.
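Even before reaching for a full profiler, a few spaced-out thread dumps taken
during the slowdown often tell you a lot. A sketch using jstack, which ships
with the JDK (SOLR_PID is a placeholder for your Solr JVM’s process id):

```shell
# Sketch: grab three thread dumps ten seconds apart. If the same stacks
# show up in every dump, that's where the time is going.
SOLR_PID=12345   # placeholder -- substitute your actual Solr process id
for i in 1 2 3; do
  jstack "$SOLR_PID" > "threaddump_$i.txt"
  sleep 10
done
```

It’s crude compared to a real profiler, but it’s zero-setup and frequently
enough to spot a hot spot or a thread stuck waiting.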

Best,
Erick

> On Dec 4, 2019, at 3:21 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:
> 
> (but I could only test for the first few
> thousand documents
