Hey guys, I'm using the MapReduceIndexerTool to import data into a SolrCloud cluster consisting of 3 decent machines. Looking at the JobTracker, I can see that the map tasks finish quite fast. The reduce tasks reach ~80% quite fast as well, but that is where they get stuck for a long period of time (picture + log attached). I'm only trying to insert ~80k documents with 10-50 different fields each. Why is this happening? Am I not setting something correctly? Could it be that most of the documents have different field names, or too many fields for that matter? Any tips are gladly appreciated.
Thanks,
Costi

From the reduce logs:

60208 [main] INFO org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false,prepareCommit=false}
60208 [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: commit: start
60208 [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: commit: enter lock
60208 [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: commit: now prepare
60208 [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: prepareCommit: flush
60208 [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: index before flush
60208 [main] INFO org.apache.solr.update.LoggingInfoStream - [DW][main]: main startFullFlush
60208 [main] INFO org.apache.solr.update.LoggingInfoStream - [DW][main]: anyChanges? numDocsInRam=25603 deletes=true hasTickets:false pendingChangesInFullFlush: false
60209 [main] INFO org.apache.solr.update.LoggingInfoStream - [DWFC][main]: addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0 25602 deleted terms (unique count=25602) bytesUsed=5171604, segment=_0, aborting=false, numDocsInRAM=25603, deleteQueue=DWDQ: [ generation: 0 ]]
61542 [main] INFO org.apache.solr.update.LoggingInfoStream - [DWPT][main]: flush postings as segment _0 numDocs=25603
61664 [Thread-32] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads
125115 [Thread-32] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads
199408 [Thread-32] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads
271088 [Thread-32] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads
336754 [Thread-32] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads
417810 [Thread-32] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads
479495 [Thread-32] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads
552357 [Thread-32] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads
621450 [Thread-32] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads
683173 [Thread-32] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads

This is the run command I'm using:

hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  --log4j /home/cmuraru/solr/log4j.properties \
  --morphline-file morphline.conf \
  --output-dir hdfs://nameservice1:8020/tmp/outdir \
  --verbose --go-live --zk-host localhost:2181/solr \
  --collection collection1 \
  hdfs://nameservice1:8020/tmp/indir
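In case I'm missing something obvious: the only tuning knobs I've found in the tool's help output that look related to the reduce phase are --reducers and --max-segments. Below is a variant I was thinking of trying; the values are guesses on my part, not something I've verified:

# Guess: spread the reduce-side indexing across more reducers and let the
# final index keep more segments, so less merging happens at the end.
hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  --log4j /home/cmuraru/solr/log4j.properties \
  --morphline-file morphline.conf \
  --output-dir hdfs://nameservice1:8020/tmp/outdir \
  --verbose --go-live --zk-host localhost:2181/solr \
  --collection collection1 \
  --reducers 8 \
  --max-segments 4 \
  hdfs://nameservice1:8020/tmp/indir

Would that even address what the log shows, or is the time simply going into the segment flush itself?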