Auto soft commit is great for real-time access, but you still need to do hard commits periodically, or else the transaction log (which is what ensures that soft commits are durable) gets too big; it has to be replayed on startup and is also used for real-time get.

So, set the auto soft commit interval to the update freshness you need in search results. Then set hard commit to something like every 10, 15, or 30 minutes, or every 1, 2, 4, or 8 hours, or whatever makes sense for your application.

Hard auto commit should, of course, run at a longer interval than auto soft commit.
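For example, a sketch of what that looks like in solrconfig.xml (the intervals here are placeholders, not recommendations for any particular workload):

<autoCommit>
  <maxTime>900000</maxTime> <!-- hard commit every 15 minutes (value is in ms) -->
  <openSearcher>false</openSearcher> <!-- keep hard commits cheap; no new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime> <!-- new documents visible to search within 5 seconds -->
</autoSoftCommit>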

-- Jack Krupansky

-----Original Message-----
From: Radu Ghita
Sent: Thursday, July 25, 2013 8:20 AM
To: solr-user@lucene.apache.org
Subject: SolrCloud commit process is too time consuming, even if documents are light

Hi,

We have a client whose business model requires indexing a billion rows each
month from MySQL into Solr, in a small time frame. The documents are very
light, but the number is very high, and we need to achieve speeds of around
80-100k docs/s. The built-in Solr indexer tops out at 40-50k/s, but after
some hours (~12h) it crashes, and the speed slows down as the hours go by.

Therefore we have developed a custom Java importer that connects directly
to MySQL and to SolrCloud via ZooKeeper, grabs data from MySQL, creates
documents, and then imports them into Solr. This helps because we open ~50
threads, which speeds up the indexing process. We have optimized the MySQL
queries (MySQL was the initial bottleneck), and the speeds we get now are
over 100k/s, but as the index grows, Solr takes longer and longer on
document adds. I assume something in solrconfig makes Solr stall, and even
block, after 100 million documents have been indexed.
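(For context, the SolrJ connection described above would look roughly like the sketch below; this is not the actual importer code, and the ZooKeeper addresses and collection name are placeholders:)

import org.apache.solr.client.solrj.impl.CloudSolrServer;

// A single CloudSolrServer is thread-safe and can be shared by all ~50
// threads; it talks to the ZooKeeper ensemble, not to a single Solr node.
CloudSolrServer solrServer = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
solrServer.setDefaultCollection("collection1");
solrServer.connect();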

Here is the Java code that creates documents and then adds them to the Solr
server:
public void createDocuments() throws SQLException, SolrServerException, IOException
{
    App.logger.write("Creating documents..");
    this.docs = new ArrayList<SolrInputDocument>();
    App.logger.incrementNumberOfRows(this.size);
    while (this.results.next()) {
        this.docs.add(this.getDocumentFromResultSet(this.results));
    }
    // Close the ResultSet before the Statement that produced it.
    this.results.close();
    this.statement.close();
}

public void commitDocuments() throws SolrServerException, IOException
{
    App.logger.write("Committing..");
    App.solrServer.add(this.docs); // here it stays very long and then blocks
    App.logger.incrementNumberOfRows(this.docs.size());
    this.docs.clear();
}
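Roughly, each worker thread drives these two methods in a loop like the sketch below (Chunk, Importer, and chunksForThisThread are hypothetical stand-ins for how we split the MySQL data, not code from the importer):

// Each of the ~50 worker threads processes its share of the MySQL data.
for (Chunk chunk : chunksForThisThread) {
    Importer importer = new Importer(chunk); // runs the MySQL query for this chunk
    importer.createDocuments();  // build one batch of SolrInputDocuments
    importer.commitDocuments();  // add the batch; despite the name, no explicit
                                 // commit is issued, so Solr's autoCommit applies
}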

I am also pasting the solrconfig.xml parameters that are relevant to this
discussion:
<maxIndexingThreads>128</maxIndexingThreads>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>10000</ramBufferSizeMB>
<maxBufferedDocs>1000000</maxBufferedDocs>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">20000</int>
  <int name="segmentsPerTier">1000000</int>
  <int name="maxMergeAtOnceExplicit">10000</int>
</mergePolicy>
<mergeFactor>100</mergeFactor>
<termIndexInterval>1024</termIndexInterval>
<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>1000000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>2000000</maxTime>
</autoSoftCommit>

The big problem lies in Solr: I've run the MySQL queries on their own and
the speed is great, but as time passes the Solr add call takes far too long
and then blocks, even though the server is high-end and has plenty of
resources.

I'm new to this, so please assist. Thanks,
--

Radu Ghita
Tel: +40 721 18 18 68
Fax: +40 351 81 85 52
