Forgot to attach server and solr configurations:

SolrCloud 4.1, internal ZooKeeper, 16 shards, custom Java importer.
Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192 GB RAM, 10 TB
SSD and 50 TB SAS storage


On Thu, Jul 25, 2013 at 3:20 PM, Radu Ghita <r...@wmds.ro> wrote:

>
> Hi,
>
> We have a client whose business model requires indexing a billion rows per
> month from MySQL into Solr, within a small time-frame. The documents are
> very light, but their number is very high and we need to reach speeds of
> around 80-100k docs/s. The built-in Solr indexer tops out at 40-50k/s, and
> after some hours (~12h) it crashes, with the speed dropping as the hours go
> by.
>
> Therefore we have developed a custom Java importer that connects directly
> to MySQL and to SolrCloud via ZooKeeper, grabs data from MySQL, creates
> documents and then imports them into Solr. This helps because we open ~50
> threads and the indexing process speeds up. We have optimized the MySQL
> queries (MySQL was the initial bottleneck) and the speeds we get now are
> over 100k/s, but as the index grows, Solr spends longer and longer on
> adding documents. I assume something in solrconfig is making Solr stall and
> eventually block after about 100 million documents have been indexed.
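>
> (For reference, a minimal sketch of that kind of setup, assuming SolrJ 4.x;
> the ZooKeeper host string, collection name and fetchBatch() helper are
> placeholders for illustration, not the actual importer code:)
>
> import java.util.List;
> import java.util.concurrent.Callable;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import org.apache.solr.client.solrj.impl.CloudSolrServer;
> import org.apache.solr.common.SolrInputDocument;
>
> public class ImporterSketch
> {
>     public static void main(String[] args) throws Exception
>     {
>         // Connect to the SolrCloud cluster through ZooKeeper, not a single node.
>         final CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
>         solr.setDefaultCollection("collection1");
>
>         // ~50 worker threads, each pulling rows from MySQL and pushing batches to Solr.
>         ExecutorService pool = Executors.newFixedThreadPool(50);
>         for (int i = 0; i < 50; i++)
>         {
>             pool.submit(new Callable<Void>()
>             {
>                 public Void call() throws Exception
>                 {
>                     List<SolrInputDocument> batch;
>                     // fetchBatch() stands in for the MySQL query + row-to-document mapping.
>                     while ((batch = fetchBatch()) != null)
>                     {
>                         solr.add(batch);
>                     }
>                     return null;
>                 }
>             });
>         }
>         pool.shutdown();
>     }
>
>     // Placeholder for the MySQL side; returns null when there is no more data.
>     private static List<SolrInputDocument> fetchBatch() { return null; }
> }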
>
> Here is the java code that creates documents and then adds to solr server:
>
> public void createDocuments() throws SQLException, SolrServerException,
> IOException
> {
>     App.logger.write("Creating documents..");
>     this.docs = new ArrayList<SolrInputDocument>();
>     App.logger.incrementNumberOfRows(this.size);
>     while (this.results.next())
>     {
>         this.docs.add(this.getDocumentFromResultSet(this.results));
>     }
>     // close the ResultSet before the Statement that produced it
>     this.results.close();
>     this.statement.close();
> }
>
> public void commitDocuments() throws SolrServerException, IOException
> {
>     App.logger.write("Committing..");
>     App.solrServer.add(this.docs); // here it stays very long and then blocks
>     App.logger.incrementNumberOfRows(this.docs.size());
>     this.docs.clear();
> }
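>
> (getDocumentFromResultSet() is not shown above; a minimal sketch of such a
> helper, where the column names "id", "name" and "value" are purely
> illustrative and not the real schema:)
>
> // Uses java.sql.ResultSet and org.apache.solr.common.SolrInputDocument.
> // Maps one MySQL row to one lightweight Solr document (assumed fields).
> private SolrInputDocument getDocumentFromResultSet(ResultSet rs) throws SQLException
> {
>     SolrInputDocument doc = new SolrInputDocument();
>     doc.addField("id", rs.getLong("id"));
>     doc.addField("name", rs.getString("name"));
>     doc.addField("value", rs.getLong("value"));
>     return doc;
> }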
>
> I am also pasting the solrconfig.xml parameters relevant to this
> discussion:
> <maxIndexingThreads>128</maxIndexingThreads>
> <useCompoundFile>false</useCompoundFile>
> <ramBufferSizeMB>10000</ramBufferSizeMB>
> <maxBufferedDocs>1000000</maxBufferedDocs>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">20000</int>
>   <int name="segmentsPerTier">1000000</int>
>   <int name="maxMergeAtOnceExplicit">10000</int>
> </mergePolicy>
> <mergeFactor>100</mergeFactor>
> <termIndexInterval>1024</termIndexInterval>
> <autoCommit>
>   <maxTime>15000</maxTime>
>   <maxDocs>1000000</maxDocs>
>   <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>   <maxTime>2000000</maxTime>
> </autoSoftCommit>
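>
> (For context on the commit settings above: with openSearcher=false the hard
> autoCommit only flushes segments to disk every 15 s / 1M docs; new documents
> become searchable when the soft commit fires, here every 2,000,000 ms, i.e.
> roughly every 33 minutes, or when the client commits explicitly. An
> illustrative SolrJ call, not part of the importer:)
>
> // Explicit hard commit that also opens a new searcher; typically issued
> // once at the end of the whole import run rather than per batch.
> App.solrServer.commit();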
>
> The big problem lies in Solr: I've run the MySQL queries on their own and
> the speed is great, but as time passes the Solr add call takes longer and
> longer and then blocks, even though the server is high-end and has plenty
> of resources.
>
> I'm new to this so please assist. Thanks,
> --
>
>   Radu Ghita
>
>   Tel:   +40 721 18 18 68
>
>   Fax:   +40 351 81 85 52
>



-- 

  Radu Ghita

  Tel:   +40 721 18 18 68

  Fax:   +40 351 81 85 52
