Re: Speeding up indexing

2012-02-28 Thread Erik Hatcher
30 million - that's feasible on a single (beefy) Solr server but whether it's advisable to go distributed or not depends on other factors, like query speed issues you may have with that many docs in a single server, expected collection growth, and so on. As for your questions further below

Re: Speeding up indexing

2012-02-27 Thread Memory Makers
A quick add-on to this -- we have over 30 million documents. I take it that we should be looking at Distributed Solr, as in http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e344 ? Thanks.
On Mon, Feb 27, 2012 at 2:33 PM, Memory Makers wrote:
> Many thanks for the response.
>
> H

Re: Speeding up indexing

2012-02-27 Thread Memory Makers
Many thanks for the response. Here are the revised questions. For example, if I have N processes that are producing documents to index:
1. Should I have them simultaneously submit documents to Solr (will this improve the indexing throughput)?
2. Is there anything I can do Solr configuration-wise th

Re: Speeding up indexing

2012-02-27 Thread Mikhail Khludnev
My two cents:
- pulling is better than pushing
- http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update
- DIH is not thread safe https://issues.apache.org/jira/browse/SOLR-3011 But there are a few patches for trunk which fix it.
Regards
On Mon, Feb 27, 2012 at 10:46 PM, Erik Hatcher

Re: Speeding up indexing

2012-02-27 Thread Erik Hatcher
Yes, absolutely. Parallelizing indexing can make a huge difference. How you do so will depend on your indexing environment. Most crudely, running multiple indexing scripts on different subsets of data up to the limitations of your operating system and hardware is how many do it. SolrJ h
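A minimal sketch of the crude approach described above: split the documents into batches and let several workers submit them concurrently. It is written in Python with only the standard library rather than SolrJ. The update URL, batch size, worker count, and the `post_batch` helper are assumptions for illustration and do not come from this thread.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Assumed endpoint; adjust host, port and core for your own setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def post_batch(batch, url=SOLR_UPDATE_URL):
    """Hypothetical sender: POST one batch as a JSON array of documents."""
    req = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return len(batch)


def index_in_parallel(docs, workers=4, batch_size=500, send=post_batch):
    """Send batches from `workers` threads; return the number of docs sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Note: map() queues every batch up front, so all batches are
        # held in memory while the workers drain them.
        return sum(pool.map(send, batches(docs, batch_size)))
```

The sender is passed in as `send`, so the fan-out can be tried against a stub before it is pointed at a real server. Whether more workers actually help still depends on the hardware and operating system limits mentioned above.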

Re: speeding up indexing with a LOT of indexed fields

2009-03-25 Thread Britske
Thanks for the quick reply. The box has 8 real CPUs. Perhaps it is a good idea, then, to reduce the number of cores to 8 as well. I'm testing out a different scenario with multiple boxes as well, where clients persist docs to multiple cores on multiple boxes. (which is what multicore was invented for after

Re: speeding up indexing with a LOT of indexed fields

2009-03-25 Thread Otis Gospodnetic
Britske, Here are a few quick ones:
- Does that machine really have 10 CPU cores? If it has significantly fewer, you may be beyond the "indexing sweet spot" in terms of indexer threads vs. CPU cores.
- Your maxBufferedDocs is super small. Comment that out anyway. Use ramBufferSizeMB and s
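To make the threads-versus-cores point concrete, here is a small Python sketch that caps the number of indexer threads at the number of CPUs the machine reports. The function name and the one-thread-per-core cap are assumptions for illustration; the real sweet spot depends on the workload and should be measured.

```python
import os


def capped_indexer_threads(requested, cpu_count=None):
    """Return `requested`, limited to the number of available CPUs."""
    if cpu_count is None:
        # os.cpu_count() reports logical CPUs and may return None
        # when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    return max(1, min(requested, cpu_count))
```

For example, `capped_indexer_threads(10, cpu_count=8)` would give 8 on a box with 8 CPUs. Because `os.cpu_count()` counts logical CPUs, pass the physical count explicitly if that is the limit you care about.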