I vaguely recall some thread blocking issue with trying to parse too many PDF files at one time in the same JVM.

Occasionally Tika (actually PDFBox) has been known to hang for some PDF docs.

Do you have enough memory in the JVM? When the CPU is busy, is there much memory available in the JVM? Maybe garbage collection is taking too much of the CPU.
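One quick way to check heap headroom is to log it from the indexing code itself. This is a generic sketch, not something from the original thread; the class name and output format are illustrative:

```java
// Minimal sketch: log how much heap headroom the JVM has. If "used" sits
// near "max" while the CPU is pegged, garbage collection is a likely culprit.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();                    // the -Xmx ceiling
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.printf("heap used: %d MB of %d MB max%n",
                used / (1024 * 1024), max / (1024 * 1024));
    }
}
```

If used heap is consistently close to the maximum, raising -Xmx or indexing in smaller batches would be the first things to try.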

-- Jack Krupansky

-----Original Message----- From: chris.a.mattm...@jpl.nasa.gov
Sent: Thursday, May 24, 2012 9:55 AM
To: solr-user@lucene.apache.org
Subject: Solr Performance

Hi Chris

First of all, thanks a lot; your earlier inputs on my document indexing
failures helped me a lot!

Now I am facing a few performance issues with the indexing.
This is what I am doing:

- Read data from an Excel sheet, which essentially contains the path of the PDF
file to be indexed plus a few literals that I add to the Solr update
request and can later use as a filter query when
searching. [Category$Subcategory$pathTotheFile]

- My input sheet data may vary from a few thousand lines up to 6 million lines.

- I am making a Set from these lines, dividing it into 4 chunks, and spawning
4 threads, each of which prepares a Solr ContentStreamUpdateRequest and
posts it to Solr.

- In this process I have these issues:

1. My system's CPU usage climbs very high and the indexing is aborted.

2. If I use "setAutoCommitWithin", it doesn't work: initially I can find a
few documents committed, but after that nothing happens.

3. I have used StreamingUpdateSolrServer with a queue size of 20 and a thread count of 4.

4. My main aim is to boost the indexing rate (speed).
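For reference, the chunk-and-thread step I described above looks roughly like this. The worker body is a placeholder (the real routine builds a ContentStreamUpdateRequest per line and posts it to Solr); class and variable names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of "divide the input lines into 4 chunks and spawn 4 threads".
public class ChunkedIndexer {

    // Split a list into at most `parts` contiguous chunks of near-equal size.
    static <T> List<List<T>> chunk(List<T> items, int parts) {
        List<List<T>> chunks = new ArrayList<>();
        int size = (items.size() + parts - 1) / parts;  // ceiling division
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            lines.add("Category$Subcategory$file" + i + ".pdf");
        }

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (List<String> part : chunk(lines, 4)) {
            pool.submit(() -> {
                for (String line : part) {
                    // Placeholder for the real work: parse the line, build a
                    // ContentStreamUpdateRequest for the PDF path, and post it.
                    System.out.println(Thread.currentThread().getName() + " -> " + line);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

An ExecutorService bounds the thread count the same way manual thread spawning does, but makes it easy to experiment with pool sizes when tuning indexing throughput.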

Can you suggest where and what I can tweak in my routine?

Thanks in advance...

Surendra.
