I will run the index update once a day.

Regards,
Abhishek
------Original Message------
From: Jan Høydahl / Cominvent
To: solr-user@lucene.apache.org
ReplyTo: solr-user@lucene.apache.org
Subject: Re: Posting Concurrently to Solr
Sent: Feb 11, 2010 22:17

You did not say how frequently you need to update the index, whether this is a batch-type operation, or whether you also have real-time requirements after the initial load.

Your ETL could use SolrJ and the StreamingUpdateSolrServer for high throughput. You could try multiple threads pushing in parallel if your bottleneck is on the client side. If that's not enough, you can split your index into multiple cores/shards to get more parallel indexing power. You don't need to merge them at the end; you can query them together using the shards parameter.

For maximum batch-indexing throughput, you can look at a map-reduce strategy: http://wiki.apache.org/solr/HadoopIndexing

--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com

On 11. feb. 2010, at 11.33, abhishes wrote:

> Hello Everyone,
>
> If I have a large data set which needs to be indexed, what strategy can I
> take to build the index fast?
>
> 1. Split the input into multiple XML files, then open different shells
> and post each of the split XML files? Will this work and help me build the
> index faster than one large XML file?
>
> 2. What if I don't want to build the XML files at all? I want to write the
> extraction logic in an ETL tool and then let the ETL tool send the commands
> to Solr. Then I run my ETL tool in a multi-threaded manner where each thread
> extracts the data from the backend and sends it to Solr for indexing.
>
> 3. Use the core feature, populate each core separately, then merge
> the cores.
>
> Any other approach?
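As a rough illustration of combining Jan's two suggestions (several client threads pushing in parallel, and an index split across cores), here is a minimal stdlib-only Java sketch. ParallelIndexer, DocSink, shardFor and the "id" field are hypothetical names chosen for this example; DocSink stands in for whatever client actually posts to one core (for example one SolrJ server object per core) and is not a Solr API. Routing by a hash of the document id keeps each document on exactly one core, so nothing has to be merged afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: route documents to N cores by hashing their id, then push
// each core's documents in fixed-size batches from a thread pool.
class ParallelIndexer {

    // Hypothetical destination for one core/shard; NOT a Solr or SolrJ type.
    // Implementations must be thread-safe: batches for the same core may run concurrently.
    public interface DocSink {
        void addBatch(List<Map<String, String>> batch) throws Exception;
    }

    // The same id always maps to the same core, so no merge step is needed later.
    static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    // Returns the number of documents handed to the sinks.
    static int index(List<Map<String, String>> docs, List<DocSink> shards,
                     int threads, int batchSize) throws Exception {
        // Group documents per core so every batch targets exactly one core.
        List<List<Map<String, String>>> perShard = new ArrayList<>();
        for (int i = 0; i < shards.size(); i++) {
            perShard.add(new ArrayList<>());
        }
        for (Map<String, String> doc : docs) {
            perShard.get(shardFor(doc.get("id"), shards.size())).add(doc);
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int s = 0; s < shards.size(); s++) {
                DocSink sink = shards.get(s);
                List<Map<String, String>> shardDocs = perShard.get(s);
                for (int from = 0; from < shardDocs.size(); from += batchSize) {
                    // Read-only views; the backing lists are not modified after this point.
                    List<Map<String, String>> batch =
                        shardDocs.subList(from, Math.min(from + batchSize, shardDocs.size()));
                    pending.add(pool.submit(() -> {
                        sink.addBatch(batch);
                        return batch.size();
                    }));
                }
            }
            int sent = 0;
            for (Future<Integer> f : pending) {
                sent += f.get(); // rethrows any failure from a sink
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }
}
```

At query time, the cores can then be searched together by listing them in the shards parameter, for example shards=localhost:8983/solr/core0,localhost:8983/solr/core1. Whether client-side threads or extra cores help more depends on where the bottleneck actually is, as Jan notes.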