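DataImportHandler does not parallelize indexing at all. It is a single-threaded indexer that runs on a single node, so the node where you start the import does all the fetching and transforming. However, the documents themselves are routed to the correct shard by SolrCloud, which is why your second instance only logs incoming updates. Therefore, what you are observing on your servers is normal.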

If you want to parallelize indexing, you can either:

a) Use SolrJ or an external client and write the indexing code yourself, or
b) Set up DIH so that each shard indexes a disjoint subset of the data. That way, you can fire a DIH full-import on multiple shards/nodes simultaneously.

One way of achieving (b) is to use request parameters to substitute placeholders in your DIH configuration. See http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters

On Tue, Sep 3, 2013 at 3:25 PM, <jerome.dup...@bnf.fr> wrote:
>
> Hello again,
>
> I am still trying to index with SolrCloud and DIH. I can index, but it seems
> that indexing is done on only 1 shard. (My goal was to parallelize it to
> go faster.)
> This is my configuration:
> I have 2 Tomcat instances:
> one with ZooKeeper embedded in Solr 4.4.0 started and 1 shard (port 8080),
> the other with the second shard (port 9180).
> In my admin interface, I see 2 shards, each one is leader.
>
>
> When I launch the DIH, documents are indexed. But only shard1 is
> working.
> http://localhost:8080/solr-0.4.0-pfd/noticesBIBcollection/dataimportMNb?command=full-import&entity=noticebib&optimize=true&indent=true&clean=true&commit=true&verbose=false&debug=false&wt=json&rows=1000
>
>
> In my first shard, I see messages coming from my indexing process:
> DEBUG 2013-09-03 11:48:57,801 Thread-12
> org.apache.solr.handler.dataimport.URLDataSource (92) - Accessing URL:
> file:/X:/3/7/002/37002118.xml
> DEBUG 2013-09-03 11:48:57,832 Thread-12
> org.apache.solr.handler.dataimport.URLDataSource (92) - Accessing URL:
> file:/X:/3/7/002/37002120.xml
> DEBUG 2013-09-03 11:48:57,966 Thread-12
> org.apache.solr.handler.dataimport.LogTransformer (58) - Notice fichier:
> 3/7/002/37002120.xml
> DEBUG 2013-09-03 11:48:57,966 Thread-12 fr.bnf.solr.BnfDateTransformer
> (696) - NN=37002120
>
> In the second instance, I just have this kind of log, as if it were receiving
> notifications from ZooKeeper of new updates:
> INFO 2013-09-03 11:48:57,323 http-9180-7
> org.apache.solr.update.processor.LogUpdateProcessor (198) - [noticesBIB]
> webapp=/solr-0.4.0-pfd path=/update params=
> {distrib.from=http://172.20.48.237:8080/solr-0.4.0-pfd/noticesBIB/&update.distrib=TOLEADER&wt=javabin&version=2}
> {add=[37001748 (1445149264874307584), 37001757 (1445149264879550464),
> 37001764 (1445149264883744768), 37001786 (1445149264887939072), 37001817
> (1445149264891084800), 37001819 (1445149264896327680), 37001837
> (1445149264900521984), 37001861 (1445149264903667712), 37001869
> (1445149264907862016), 37001963 (1445149264912056320)]} 0 41
>
> I supposed there was a confusion between core names and the collection name,
> and I tried to change the name of the collection, but it solved nothing.
> When I go to the DIH interfaces, on shard1 I see indexing in progress, and
> on shard2 "no information available".
>
> Is there something special to do to distribute the indexing process?
> Should I run ZooKeeper on both instances (even if it's not mandatory)?
> ...
> Regards
> Jerome
>
>
>
> Annual closure of the François-Mitterrand and Richelieu sites from 2 to 15
> September 2013. Before printing, think of the environment.

--
Regards,
Shalin Shekhar Mangar.
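
P.S. A rough sketch of option (b), in case it helps. It only builds one full-import URL per node and tags each with a slice number; it does not start anything. Several things here are assumptions rather than facts about your setup: that the same DIH handler path exists on both nodes, and the parameter names `slice` and `slices`, which are made up for illustration. DIH exposes request parameters to the configuration as `${dataimporter.request.<name>}`, so the config could read them as `${dataimporter.request.slice}` and `${dataimporter.request.slices}`. How the config turns those two values into a disjoint subset depends on your data source and is not shown.

```python
from urllib.parse import urlencode

# Ports 8080 and 9180 come from the question; assuming the same handler
# path is registered on both nodes is a guess about the setup.
NODES = [
    "http://localhost:8080/solr-0.4.0-pfd/noticesBIBcollection",
    "http://localhost:9180/solr-0.4.0-pfd/noticesBIBcollection",
]
HANDLER = "dataimportMNb"


def build_import_urls(nodes, handler, entity):
    """Return one full-import URL per node, each tagged with its slice number."""
    urls = []
    for index, node in enumerate(nodes):
        params = {
            "command": "full-import",
            "entity": entity,
            # Assumption: clean is switched off so that one import does not
            # delete documents the other import has already sent.
            "clean": "false",
            "commit": "true",
            # Hypothetical custom parameters, readable in the DIH config as
            # ${dataimporter.request.slice} and ${dataimporter.request.slices}.
            "slice": index,
            "slices": len(nodes),
        }
        urls.append(f"{node}/{handler}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in build_import_urls(NODES, HANDLER, "noticebib"):
        print(url)
```

Each printed URL can then be requested (from a browser, curl, or `urllib.request.urlopen`) to start the import on that node, so both nodes read and transform their own slice at the same time.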