How are you sending documents to Solr?
If you push Solr input documents via HTTP (which is what SolrJ does),
you could increase CPU consumption (and thereby reduce indexing time)
by sending your update requests asynchronously, using multiple updating
threads, to your single Solr core.
Somebody more familiar with the update chain than I am could probably
tell you more, but I believe each update request is handled in a single
thread on the server side.
If that's correct, then you can increase CPU utilization on your
indexing host by adding more updating threads to the client pushing
documents to your Solr core.
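A rough sketch of that client-side pattern follows. Note that `sendBatch`
is a made-up stub standing in for a real update call (e.g. SolrJ's
add-documents request), and the thread count and batch size are
placeholder values to tune for your own setup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {
    // Stub standing in for a real HTTP update request to Solr;
    // it just counts documents so the sketch is self-contained.
    static final AtomicInteger sent = new AtomicInteger();
    static void sendBatch(List<String> docs) {
        sent.addAndGet(docs.size());
    }

    public static void main(String[] args) throws InterruptedException {
        int nThreads = 4;      // number of concurrent updating threads
        int batchSize = 1000;  // documents per update request
        int totalDocs = 20000;

        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < totalDocs; i++) {
            batch.add("doc-" + i);
            if (batch.size() == batchSize) {
                final List<String> toSend = batch;
                // each batch is pushed from one of the pool's threads,
                // so several update requests are in flight at once
                pool.submit(() -> sendBatch(toSend));
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            final List<String> toSend = batch;
            pool.submit(() -> sendBatch(toSend));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("sent " + sent.get() + " documents");
    }
}
```

With a real client you would replace the stub with actual update
requests; the point is only that the server can then use one core per
in-flight request instead of idling behind a single updating thread.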
Also make sure you don't ask Solr to commit your pending changes to the
index too frequently (e.g. on each add), but only when you want the
changes to become visible on the searching side.
I personally like to let Solr do autoCommits, using a combination of
max-added-documents and elapsed-time conditions for the auto-commit policy.
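For reference, such a policy goes in the updateHandler section of
solrconfig.xml; the thresholds below are just placeholders to adapt to
your own load:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many added docs... -->
    <maxTime>60000</maxTime>  <!-- ...or after 60 s, whichever comes first -->
  </autoCommit>
</updateHandler>
```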
Considering indexing bottlenecks more generally, my experience in that
field is that indexing speed is usually bound by, in order of frequency:
- source enumeration speed (especially if Solr input documents are built
out of complex joins on a remote DB)
- network I/O, if performing remote indexing and the network link isn't
sized for the amount of data running through it
- disk I/O, if you commit very often and rely on commodity SATA HDDs, or
if another process is stressing the poor little device (keep the ~150
IOPS limit in mind for SATA devices)
- CPU, if you were able to get rid of the previous bottlenecks
- memory doesn't play the same role in indexing speed as the other
factors: in my view it only becomes a limit if you perform complex
analysis on many, many fields, and if that becomes a problem it is easy
to spot with JMX and JConsole, because the JVM will then be performing
many GCs and the process's resident RAM usage will be close to whatever
was set with -Xmx.
I don't know if I was really clear; all I can say is that increasing the
number of clients pushing updates to Solr in parallel was the easiest
way for me to reduce the indexing time for large update batches.
Hope this helps,
--
Tanguy
On 08/03/2012 11:48, gabriel shen wrote:
Our indexing process is to add a bundle of Solr documents (for example
5000) to Solr each time, and we observed that before committing (which
might be I/O bound) it constantly uses less than half the CPU capacity.
It seems strange to us that it doesn't use full CPU power. As for RAM, I
don't know how much it affects CPU utilization; we have assigned 14 GB
to the Solr Tomcat server on a 32 GB Linux machine.
best regards,
shen
On Thu, Mar 8, 2012 at 11:27 AM, Gora Mohanty<g...@mimirtech.com> wrote:
On 8 March 2012 15:39, gabriel shen<xshco...@gmail.com> wrote:
Hi,
I noticed that sequential indexing on 1 Solr core is only using 40% of
our 8-virtual-core CPU power. Why isn't it using 100% of the power? Is
there a way to increase the CPU utilization rate?
[...]
This is an open-ended question: the answer could depend on a
variety of things, and also on how you are indexing.
Your indexing process might be I/O bound (quite possible),
or memory bound, rather than CPU bound.
Regards,
Gora