I don't seem to be seeing a significant slowdown over time when I use the old 
defaults for merge threads and max merges.
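
A rough sketch of restoring those old values in solrconfig.xml, assuming the
Solr 4.x <indexConfig> section and taking the numbers from the description in
the quoted message below (3 merge threads, max merges of that + 2). Treat the
values as a starting point to experiment with, not a recommendation:

```xml
<indexConfig>
  <!-- Assumed values: the pre-4.1 defaults as described in this thread. -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- Number of merges allowed to run concurrently. -->
    <int name="maxThreadCount">3</int>
    <!-- Pending merges allowed before indexing threads are paused. -->
    <int name="maxMergeCount">5</int>
  </mergeScheduler>
</indexConfig>
```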

- Mark

On Jul 25, 2013, at 10:17 AM, Mark Miller <markrmil...@gmail.com> wrote:

> I'll be looking into some possible slowdown issues after long indexing runs 
> when I get back from vacation. This could be related. Very early guess though.
> 
> Another thing you might try - Lucene recently changed the merge scheduler 
> policy defaults (in 4.1) - it used to use up to 3 threads to merge and have a 
> max merge setting of that + 2. It now defaults to 1 and 2, and that can 
> significantly impact how fast documents are added. It also 
> causes indexing threads to pause and wait for merges *way* more, especially 
> when your index gets large and the merges start taking a long time. The 
> tradeoff was supposedly that merges are faster, but honestly, I think it's a 
> poor default, especially if you are measuring indexing speed and not really 
> paying attention to how long merges go on after you finish indexing, and 
> especially if you have beefy hardware. You might play with those settings.
> 
> - Mark
> 
> On Jul 25, 2013, at 8:36 AM, Radu Ghita <r...@wmds.ro> wrote:
> 
>> Forgot to attach server and solr configurations:
>> 
>> SolrCloud 4.1, internal ZooKeeper, 16 shards, custom Java importer.
>> Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192 GB RAM,
>> 10 TB SSD and 50 TB SAS storage
>> 
>> 
>> On Thu, Jul 25, 2013 at 3:20 PM, Radu Ghita <r...@wmds.ro> wrote:
>> 
>>> 
>>> Hi,
>>> 
>>> We have a client whose business model requires indexing a billion rows
>>> each month from MySQL into Solr in a small time frame. The
>>> documents are very light, but the number is very high and we need to
>>> achieve speeds of around 80-100k/s. The built-in Solr indexer tops out at
>>> 40-50k/s, slows down as the hours go by, and crashes after some hours
>>> (~12h).
>>> 
>>> Therefore we have developed a custom Java importer that connects directly
>>> to MySQL and to SolrCloud via ZooKeeper, grabs data from MySQL, creates
>>> documents and then imports them into Solr. This helps because we open ~50
>>> threads and the indexing process speeds up. We have optimized the MySQL
>>> queries (MySQL was the initial bottleneck) and the speeds we get now are
>>> over 100k/s, but as the index gets bigger, Solr takes a very long time to
>>> add documents. I assume something in solrconfig makes Solr stall and even
>>> block once about 100 million documents are indexed.
>>> 
>>> Here is the java code that creates documents and then adds to solr server:
>>> 
>>> public void createDocuments() throws SQLException, SolrServerException,
>>>         IOException
>>> {
>>>     App.logger.write("Creating documents..");
>>>     this.docs = new ArrayList<SolrInputDocument>();
>>>     App.logger.incrementNumberOfRows(this.size);
>>>     // Build one SolrInputDocument per row of the MySQL result set.
>>>     while (this.results.next()) {
>>>         this.docs.add(this.getDocumentFromResultSet(this.results));
>>>     }
>>> 
>>>     // Close the result set before the statement that produced it.
>>>     this.results.close();
>>>     this.statement.close();
>>> }
>>> 
>>> public void commitDocuments() throws SolrServerException, IOException
>>> {
>>>     App.logger.write("Committing..");
>>>     // here it stays very long and then blocks
>>>     App.solrServer.add(this.docs);
>>>     App.logger.incrementNumberOfRows(this.docs.size());
>>>     this.docs.clear();
>>> }
>>> 
>>> I am also pasting the solrconfig.xml parameters that are relevant to this
>>> discussion:
>>> <maxIndexingThreads>128</maxIndexingThreads>
>>> <useCompoundFile>false</useCompoundFile>
>>> <ramBufferSizeMB>10000</ramBufferSizeMB>
>>> <maxBufferedDocs>1000000</maxBufferedDocs>
>>> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>>>   <int name="maxMergeAtOnce">20000</int>
>>>   <int name="segmentsPerTier">1000000</int>
>>>   <int name="maxMergeAtOnceExplicit">10000</int>
>>> </mergePolicy>
>>> <mergeFactor>100</mergeFactor>
>>> <termIndexInterval>1024</termIndexInterval>
>>> <autoCommit>
>>>   <maxTime>15000</maxTime>
>>>   <maxDocs>1000000</maxDocs>
>>>   <openSearcher>false</openSearcher>
>>> </autoCommit>
>>> <autoSoftCommit>
>>>   <maxTime>2000000</maxTime>
>>> </autoSoftCommit>
>>> 
>>> The big problem lies in Solr: I've run the MySQL queries on their own
>>> and the speed is great, but as time passes the Solr add call takes way
>>> too long and then blocks, even though the server is top-end and has lots
>>> of resources.
>>> 
>>> I'm new to this so please assist. Thanks,
>>> --
>>> *Radu Ghita *--------------------------------
>>> 
>>> Tel:   +40 721 18 18 68
>>> 
>>> Fax:  +40 351 81 85 52
>>> 
>> 
> 
