re: optimize after every import....

This is not recommended in 4.x unless and until you have evidence that it really does help. Reviews are very mixed, and it's been renamed "force merge" in 4.x just so people don't think "Of course I want to do this, who wouldn't?".

bq: Doing a commit instead of optimize is usually bringing the master and slave nodes down

This isn't expected unless you're committing far too frequently. I'd recommend against doing any commits except, possibly, a single commit after all my clients had finished indexing. But even that isn't necessary.

In batch modes in SolrCloud, reasonable setups are
autocommit: 15 seconds WITH openSearcher="false"
autosoftcommit: the interval it takes you to run all your indexing.

Seems odd, but here's the background:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Thu, May 1, 2014 at 11:12 PM, Alexander Kanarsky <kanarsky2...@gmail.com> wrote:

> If you build your index in Hadoop, read this (it is about the Cloudera
> Search but in my understanding also should work with Solr Hadoop contrib
> since 4.7)
> http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_batch_index_to_solr_servers_using_golive.html
>
>
> On Thu, May 1, 2014 at 1:47 PM, Costi Muraru <costimur...@gmail.com> wrote:
>
>> Hi guys,
>>
>> What would you say it's the fastest way to import data in SolrCloud?
>> Our use case: each day do a single import of a big number of documents.
>>
>> Should we use SolrJ/DataImportHandler/other? Or perhaps is there a bulk
>> import feature in SOLR? I came upon this promising link:
>> http://wiki.apache.org/solr/UpdateCSV
>> Any idea on how UpdateCSV is performance-wise compared with
>> SolrJ/DataImportHandler?
>>
>> If SolrJ, should we split the data in chunks and start multiple clients at
>> once? In this way we could perhaps take advantage of the multitude number
>> of servers in the SolrCloud configuration?
>>
>> Either way, after the import is finished, should we do an optimize or a
>> commit or none (
>> http://wiki.solarium-project.org/index.php/V1:Optimize_command)?
>>
>> Any tips and tricks to perform this process the right way are gladly
>> appreciated.
>>
>> Thanks,
>> Costi
>>
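
P.S. For reference, the autocommit settings suggested in the reply above might look roughly like this in solrconfig.xml. Treat it as a sketch rather than a tested config: the 15 second hard commit with openSearcher="false" comes from the reply, but the one-hour autoSoftCommit value is only a placeholder assumption standing in for "however long your indexing run takes", and the updateLog directory property is the stock default rather than anything specific to this setup.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log; SolrCloud relies on it for recovery. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit every 15 seconds: flushes segments to disk and rolls
       the transaction log, but does NOT open a new searcher. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit makes documents visible to searches.
       3600000 ms (one hour) is a placeholder assumption; set it to
       roughly the duration of your batch indexing run. -->
  <autoSoftCommit>
    <maxTime>3600000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With something like this in place, the indexing clients would not need to send explicit commits at all, which is the point made above about commits being unnecessary.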