Hi,
With auto-committing disabled, we can now index many millions of
documents in our test environment on a 5-node cluster with 5 shards and
a replication factor of 2. The documents are uploaded from a map/reduce
job. No significant changes were made to solrconfig and no update
processors are enabled. We are using a trunk revision from this weekend.
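
In SolrJ terms, the upload amounts to something like the simplified
sketch below (class name, URL and field names are just placeholders,
SolrJ 4.x style): batched adds to the single receiving node, with no
commits until the job finishes.

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  import java.util.ArrayList;
  import java.util.List;

  public class BulkUpload {
    public static void main(String[] args) throws Exception {
      // All updates currently go to one receiving node (placeholder URL).
      SolrServer server = new HttpSolrServer("http://solr-node1/solr/collection1");

      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (int i = 0; i < 100000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-" + i);
        doc.addField("text", "example body " + i);
        batch.add(doc);

        // Flush in batches of 1000; no commit per batch because
        // auto-commit is disabled in solrconfig.
        if (batch.size() == 1000) {
          server.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        server.add(batch);
      }
      // Single explicit commit at the very end of the job.
      server.commit();
    }
  }
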
The indexing speed is well below what we are used to seeing; we can
easily index 5 million documents on a non-cloud Solr 3.x instance
within an hour. What could be going on? There aren't many open TCP
connections, the number of file descriptors is stable, and I/O is low,
but CPU time is high! Each node has two Solr cores, each writing to its
own dedicated disk.
The indexing speed is stable: it was slow at the start and still is. It
has now been running for well over 6 hours and only 3.5 million
documents have been indexed. Another strange detail is that the node
receiving all incoming documents (we're not yet using a client-side
Solr server pool) uses much more disk space than all the other nodes.
This is peculiar, as we expected all replicas to be about the same size.
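
For what it's worth, a client-side pool would look roughly like the
SolrJ sketch below (hostnames and the collection name are placeholders);
once we add it, updates would be spread over all nodes instead of all
going through one.

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class LoadBalancedUpload {
    public static void main(String[] args) throws Exception {
      // Round-robins requests over all five nodes (placeholder URLs).
      SolrServer server = new LBHttpSolrServer(
          "http://solr-node1/solr/collection1",
          "http://solr-node2/solr/collection1",
          "http://solr-node3/solr/collection1",
          "http://solr-node4/solr/collection1",
          "http://solr-node5/solr/collection1");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "example-1");
      doc.addField("text", "example body");
      server.add(doc);   // each request goes to the next server in the rotation
      server.commit();
    }
  }
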
The receiving node has slightly higher CPU usage than the other nodes,
but the thread dump shows a very large number of threads of type
cmdDistribExecutor-8-thread-292260 (295090) with 0-100ms of CPU time.
At the top of the list these threads all have < 20ms, but near the
bottom it rises to just over 100ms. All nodes have a couple of
http-80-30 (121994) threads with very high CPU time each.
Is this a known issue? Did I miss something? Any ideas?
Thanks