Hey guys,

I'm using the MapReduceIndexerTool to import data into a SolrCloud
cluster made up of 3 reasonably powerful machines.
Looking in the JobTracker, I can see that the map tasks finish quite
fast. The reduce tasks also reach ~80% quite fast, but that is where
they get stuck for a long time (picture + log attached).
I'm only trying to insert ~80k documents with 10-50 different fields
each. Why is this happening? Am I missing some setting? Could the
problem be that most of the documents have different field names, or
simply too many fields?
Any tips are greatly appreciated.

Thanks,
Costi

From the reduce logs:
60208 [main] INFO  org.apache.solr.update.UpdateHandler  - start
commit{,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false,prepareCommit=false}
60208 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
[IW][main]: commit: start
60208 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
[IW][main]: commit: enter lock
60208 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
[IW][main]: commit: now prepare
60208 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
[IW][main]: prepareCommit: flush
60208 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
[IW][main]:   index before flush
60208 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
[DW][main]: main startFullFlush
60208 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
[DW][main]: anyChanges? numDocsInRam=25603 deletes=true
hasTickets:false pendingChangesInFullFlush: false
60209 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
[DWFC][main]: addFlushableState DocumentsWriterPerThread
[pendingDeletes=gen=0 25602 deleted terms (unique count=25602)
bytesUsed=5171604, segment=_0, aborting=false, numDocsInRAM=25603,
deleteQueue=DWDQ: [ generation: 0 ]]
61542 [main] INFO  org.apache.solr.update.LoggingInfoStream  -
[DWPT][main]: flush postings as segment _0 numDocs=25603
61664 [Thread-32] INFO  org.apache.solr.hadoop.HeartBeater  - Issuing
heart beat for 1 threads
125115 [Thread-32] INFO  org.apache.solr.hadoop.HeartBeater  - Issuing
heart beat for 1 threads
199408 [Thread-32] INFO  org.apache.solr.hadoop.HeartBeater  - Issuing
heart beat for 1 threads
271088 [Thread-32] INFO  org.apache.solr.hadoop.HeartBeater  - Issuing
heart beat for 1 threads
336754 [Thread-32] INFO  org.apache.solr.hadoop.HeartBeater  - Issuing
heart beat for 1 threads
417810 [Thread-32] INFO  org.apache.solr.hadoop.HeartBeater  - Issuing
heart beat for 1 threads
479495 [Thread-32] INFO  org.apache.solr.hadoop.HeartBeater  - Issuing
heart beat for 1 threads
552357 [Thread-32] INFO  org.apache.solr.hadoop.HeartBeater  - Issuing
heart beat for 1 threads
621450 [Thread-32] INFO  org.apache.solr.hadoop.HeartBeater  - Issuing
heart beat for 1 threads
683173 [Thread-32] INFO  org.apache.solr.hadoop.HeartBeater  - Issuing
heart beat for 1 threads

This is the run command I'm using:
hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar \
  org.apache.solr.hadoop.MapReduceIndexerTool \
  --log4j /home/cmuraru/solr/log4j.properties \
  --morphline-file morphline.conf \
  --output-dir hdfs://nameservice1:8020/tmp/outdir \
  --verbose --go-live --zk-host localhost:2181/solr \
  --collection collection1 \
  hdfs://nameservice1:8020/tmp/indir
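
In case it's relevant, the morphline itself is nothing fancy; it's
essentially the usual read-then-loadSolr pipeline. A simplified sketch
of that kind of config (the SOLR_LOCATOR values, reader command, and
field paths below are placeholders, not my exact file) looks like this:

SOLR_LOCATOR : {
  collection : collection1
  zkHost : "localhost:2181/solr"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      # read each input file as an Avro container
      # (swap in readLine/readCSV etc. for other input formats)
      { readAvroContainer {} }

      # copy Avro fields into record fields; paths are examples only
      { extractAvroPaths { paths : { id : /id, text : /text } } }

      # drop any fields that are not declared in the Solr schema
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }

      # hand the record to the indexer
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]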
