I will run the index update once a day.

Regards,
Abhishek

------Original Message------
From: Jan Høydahl / Cominvent
To: solr-user@lucene.apache.org
ReplyTo: solr-user@lucene.apache.org
Subject: Re: Posting Concurrently to Solr
Sent: Feb 11, 2010 22:17

You did not say how frequently you need to update the index, whether this is a
batch-type operation, or whether you also have some real-time requirements after
the initial load.

Your ETL could use SolrJ and the StreamingUpdateSolrServer for high throughput.
You could try multiple threads pushing in parallel if your bottleneck is on 
the client side.
If that's not enough, you can split your index into multiple cores/shards to get 
more parallel indexing power.
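A minimal sketch of the multi-threaded approach, using only the JDK: documents are cut into fixed-size batches and pushed from a thread pool. The DocumentSink interface, the batch size and the thread count are hypothetical stand-ins; in a real ETL a SolrJ client might take the sink's place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {

    /** Hypothetical stand-in for a Solr client; not part of SolrJ. */
    public interface DocumentSink {
        void add(List<String> batch) throws Exception;
    }

    /** Splits docs into batches, pushes them from nThreads workers, returns the batch count. */
    public static int indexInParallel(List<String> docs, int batchSize, int nThreads, DocumentSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int start = 0; start < docs.size(); start += batchSize) {
                // Copy the slice so each task owns an independent batch.
                List<String> batch = new ArrayList<>(
                        docs.subList(start, Math.min(start + batchSize, docs.size())));
                futures.add(pool.submit(() -> { sink.add(batch); return null; }));
            }
            // Wait for every batch; a failure in any worker is rethrown here.
            for (Future<?> f : futures) {
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("batch failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return futures.size();
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docs.add("doc-" + i);
        AtomicInteger received = new AtomicInteger();
        int batches = indexInParallel(docs, 100, 4, batch -> received.addAndGet(batch.size()));
        System.out.println(batches + " batches, " + received.get() + " docs");
    }
}
```

Sending batches rather than single documents, and committing once at the end rather than after every batch, is usually what keeps client-side throughput up.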
You don't need to merge them at the end, you can query using the shards 
parameter.
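To illustrate splitting across cores and querying them together, here is a sketch that routes each document to a core by hashing its unique key, then builds a distributed query with the shards parameter. The hash-based routing rule, host names and core names are assumptions; shard entries are written as host:port/path without the http:// prefix.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ShardRouting {

    /** Picks a shard for a document by hashing its unique key (a hypothetical routing rule). */
    public static int shardFor(String docId, int numShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    /** Builds a query URL that fans out to every shard listed in the shards parameter. */
    public static String distributedQueryUrl(String baseUrl, String query, List<String> shards) {
        String shardList = String.join(",", shards);
        return baseUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&shards=" + URLEncoder.encode(shardList, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical cores on one host; any host:port/path entries work the same way.
        List<String> shards = List.of("localhost:8983/solr/core0", "localhost:8983/solr/core1");
        System.out.println("doc-42 -> shard " + shardFor("doc-42", shards.size()));
        System.out.println(distributedQueryUrl("http://localhost:8983/solr/core0", "title:solr", shards));
    }
}
```

Routing by a stable hash means a document that is re-indexed lands on the same core, so updates replace the old copy instead of duplicating it on another shard.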

For maximum batch-indexing throughput, you can look at a map-reduce strategy: 
http://wiki.apache.org/solr/HadoopIndexing

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 11. feb. 2010, at 11.33, abhishes wrote:

> 
> Hello Everyone,
> 
> If I have a large data set that needs to be indexed, what strategy can I
> take to build the index quickly?
> 
> 1. Split the input into multiple XML files, then open different shells
> and post each split XML file? Will this work, and will it build the index
> faster than one large XML file?
> 
> 2. What if I don't want to build the XML files at all? I want to write the
> extraction logic in an ETL tool and let the ETL tool send the commands
> to Solr. Then I run my ETL tool in a multi-threaded manner, where each thread
> extracts data from the backend and sends it to Solr for indexing.
> 
> 3. Use the multi-core feature, populate each core separately, then merge
> the cores.
> 
> Any other approach?
> 
> 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/Posting-Concurrently-to-Solr-tp27544311p27544311.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

