How are you sending documents to Solr?
If you push Solr input documents via HTTP (which is what SolrJ does),
you could increase CPU consumption (and thereby reduce indexing time)
by sending your update requests asynchronously, using multiple updating
threads, to your single Solr core.
Somebody more familiar with the update chain than I am could probably
tell you more, but I believe each update request is handled in a single
thread on the server side.
If that's correct, then you can increase CPU utilization on your
indexing host by adding more updating threads to the client pushing
documents to your Solr core.
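A rough sketch of that client-side pattern follows. Note that `sendBatch`
is a made-up stub standing in for a real update call (e.g. SolrJ's
add-documents request), and the thread count and batch size are
placeholder values to tune for your own setup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {
    // Stub standing in for a real HTTP update request to Solr;
    // it just counts documents so the sketch is self-contained.
    static final AtomicInteger sent = new AtomicInteger();
    static void sendBatch(List<String> docs) {
        sent.addAndGet(docs.size());
    }

    public static void main(String[] args) throws InterruptedException {
        int nThreads = 4;      // number of concurrent updating threads
        int batchSize = 1000;  // documents per update request
        int totalDocs = 20000;

        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < totalDocs; i++) {
            batch.add("doc-" + i);
            if (batch.size() == batchSize) {
                final List<String> toSend = batch;
                // each batch is pushed from one of the pool's threads,
                // so several update requests are in flight at once
                pool.submit(() -> sendBatch(toSend));
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            final List<String> toSend = batch;
            pool.submit(() -> sendBatch(toSend));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("sent " + sent.get() + " documents");
    }
}
```

With a real client you would replace the stub with actual update
requests; the point is only that the server can then use one core per
in-flight request instead of idling behind a single updating thread.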
Also make sure you don't ask Solr to commit your pending changes to the
index too frequently (e.g. on each add), but only when you want the
changes to become visible on the searching side.
I personally like to let Solr do autoCommits, using a combination of
max-added-documents and elapsed-time conditions for the auto-commit policy.
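For reference, such a policy goes in the updateHandler section of
solrconfig.xml; the thresholds below are just placeholders to adapt to
your own load:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many added docs... -->
    <maxTime>60000</maxTime>  <!-- ...or after 60 s, whichever comes first -->
  </autoCommit>
</updateHandler>
```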
Considering indexing bottlenecks more generally, my experience in that
field is that indexing speed is usually bound by, in order of frequency:
- source enumeration speed (especially if Solr input documents are built
out of complex joins on a remote DB)
- network I/O, if performing remote indexing and the network link isn't
sized for the amount of data running through it
- disk I/O, if you commit very often and rely on commodity SATA HDDs, or
if another process is stressing the poor little device (keep the ~150
IOPS limit in mind for SATA devices)
- CPU, if you were able to get rid of the previous bottlenecks
- memory doesn't play the same role in indexing speed as the other
factors: in my view it only becomes a limit if you perform complex
analysis on many, many fields, and if that becomes a problem it is easy
to spot with JMX and JConsole, because the JVM will then be performing
many GCs and the process's resident RAM usage will be close to whatever
was set with -Xmx.
I don't know if I was really clear; all I can say is that increasing the
number of clients pushing updates to Solr in parallel was the easiest
way for me to reduce the indexing time for large update batches.
Hope this helps,
--
Tanguy
On 08/03/2012 11:48, gabriel shen wrote:
Our indexing process is to add a bundle of Solr documents (for example
5000) to Solr each time, and we observed that before committing (which
might be I/O bound) it constantly uses less than half the CPU capacity.
It seems strange to us that it doesn't use full CPU power. As for RAM, I
don't know how much it affects CPU utilization; we have assigned 14 GB
to the Solr Tomcat server on a 32 GB Linux machine.
best regards,
shen
On Thu, Mar 8, 2012 at 11:27 AM, Gora Mohanty<g...@mimirtech.com> wrote:
On 8 March 2012 15:39, gabriel shen<xshco...@gmail.com> wrote:
Hi,
I noticed that sequential indexing on 1 Solr core is only using 40% of
our 8-virtual-core CPU power. Why isn't it using 100% of the power? Is
there a way to increase the CPU utilization rate?
[...]
This is an open-ended question: the answer could depend on a
variety of things, and also on how you are indexing.
Your indexing process might be I/O bound (quite possible),
or memory bound, rather than CPU bound.
Regards,
Gora