scott chu <scott....@udngroup.com> wrote:
 
> I keep forgetting to mention one thing during the discussion.
> Our data is Chinese news articles, and we currently use the CJK
> tokenizer (i.e. 2-gram). Indexing is quite slow compared to
> indexing English articles. That's why I am so worried about
> indexing performance on 10M Chinese docs and have turned to
> studying SolrCloud.

So the performance problem is with indexing, not searching? Solr supports
concurrent indexing, so if you are able to send the data in parallel, just
start as many indexing threads as you have CPU cores. Of course, that does not
help if you are already doing that.

Also, as a sanity check, make sure you are not committing all the time, e.g.
after every document or small batch. Frequent commits slow indexing down.
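
To make the two points above concrete, here is a minimal sketch in Python of
sending batches from several threads and committing only once at the end. It
is an illustration under assumptions, not a tuned indexer: the URL, the
collection name "news", the batch size and the helper names (chunked,
post_json, index_parallel, commit) are all made up for this example, and the
worker count defaults to the CPU count of the machine running the script,
which may differ from the Solr server's.

```python
import json
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a local Solr with a collection named "news"; adjust to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/news/update"


def chunked(docs, size):
    """Yield consecutive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_json(payload, url=SOLR_UPDATE_URL):
    """POST a JSON payload to Solr's update handler, without a commit parameter."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_parallel(docs, send_batch=post_json, workers=None, batch_size=500):
    """Send batches concurrently and return the number of batches sent."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() submits every batch up front; list() waits for all of them
        # and re-raises the first worker error, if any.
        results = list(pool.map(send_batch, chunked(docs, batch_size)))
    return len(results)


def commit(send=post_json):
    """Issue one explicit commit after all batches have been sent."""
    return send({"commit": {}})
```

Typical use would be index_parallel(docs) followed by a single commit().
Because map() submits all batches up front, a very large input would need to
be fed in slices rather than in one call.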

- Toke Eskildsen
