scott chu <scott....@udngroup.com> wrote:
> I keep forgetting to mention one thing during this discussion.
> Our data is Chinese news articles and we currently use the CJK tokenizer
> (i.e. 2-gram). Indexing is quite slow compared to indexing English
> articles. That's why I am so worried about indexing performance on
> 10M Chinese docs and have turned to studying SolrCloud.
So the performance problem is indexing and not searching?

Solr supports concurrent indexing, so if you are able to send the data in parallel, just start as many indexing threads as you have cores. Of course, that does not help if you are already doing that.

Also, as a sanity check, make sure that you are not committing all the time.

- Toke Eskildsen
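The advice above (one indexing thread per core, and no commit per batch) can be sketched on the client side roughly as follows. This is a minimal sketch under stated assumptions, not a Solr client: `send_batch` and `commit` are hypothetical callables you would wire to your own Solr client (for example, a POST of the batch to the collection's update handler, and a single explicit commit), and the batch size of 1000 is an arbitrary illustration value.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

Doc = dict  # a document is just a field-name -> value mapping here


def batched(docs: Sequence[Doc], size: int) -> List[Sequence[Doc]]:
    """Split the document list into consecutive batches of at most `size` docs."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def index_in_parallel(
    docs: Sequence[Doc],
    send_batch: Callable[[Sequence[Doc]], None],  # hypothetical: sends one batch to Solr
    commit: Callable[[], None],                   # hypothetical: issues one commit
    batch_size: int = 1000,                       # arbitrary illustration value
    threads: Optional[int] = None,
) -> int:
    """Send batches concurrently, then commit exactly once. Returns the batch count."""
    # Default to one worker per CPU core, as suggested in the reply.
    workers = threads or os.cpu_count() or 1
    batches = batched(docs, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the map iterator waits for every batch and
        # re-raises the first exception raised by send_batch, if any.
        list(pool.map(send_batch, batches))
    # A single commit at the end instead of one commit per batch.
    commit()
    return len(batches)
```

Threads (rather than processes) are a reasonable choice on the client here because the workers mostly wait on network I/O; the CJK analysis itself runs inside Solr, so it is the server's cores that the parallel requests keep busy.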