30 million: that's feasible on a single (beefy) Solr server, but whether it's
advisable to go distributed depends on other factors, such as query speed
issues you may have with that many docs on a single server, expected
collection growth, and so on.
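With Solr's pre-SolrCloud distributed search, the indexing client is responsible for deciding which shard each document lands on. A minimal sketch in Python, assuming a unique key field named `id` and three hypothetical shard URLs, hashes the key and takes it modulo the number of shards so that every indexing process routes a given document to the same shard:

```python
import hashlib

# Hypothetical shard base URLs; replace with your own Solr servers/cores.
SHARDS = [
    "http://solr1:8983/solr",
    "http://solr2:8983/solr",
    "http://solr3:8983/solr",
]


def shard_for(doc_id, shards=SHARDS):
    """Return the shard URL a document with this unique key belongs to.

    MD5 is used as a stable hash: Python's built-in hash() is randomized
    per process for strings, which would send the same id to different
    shards from different indexing processes.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def partition(docs, shards=SHARDS):
    """Group documents (dicts with an "id" key) by their target shard."""
    groups = {shard: [] for shard in shards}
    for doc in docs:
        groups[shard_for(doc["id"], shards)].append(doc)
    return groups


if __name__ == "__main__":
    docs = [{"id": "doc-%d" % i} for i in range(10)]
    for shard, batch in partition(docs).items():
        print(shard, [d["id"] for d in batch])
```

Each group would then be posted to its own shard, and queries would pass the `shards` request parameter listing every shard so that results are merged. With plain modulo hashing, changing the number of shards remaps most documents, which is one reason expected collection growth matters when picking the shard count up front.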
As for your questions further below
A quick add-on to this: we have over 30 million documents.
I take it that we should be looking at Distributed Solr?
As in
http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e344
Thanks.
On Mon, Feb 27, 2012 at 2:33 PM, Memory Makers wrote:
Many thanks for the response.
Here are the revised questions:
For example, if I have N processes that are producing documents to index:
1. Should I have them submit documents to Solr simultaneously (will this
improve indexing throughput)?
2. Is there anything I can do Solr configuration-wise th
My two cents:
- Pulling is better than pushing:
http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update
- DIH is not thread-safe (https://issues.apache.org/jira/browse/SOLR-3011), but
there are a few patches for trunk that fix it.
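One way to read "pulling is better than pushing": rather than producers pushing documents at the indexer, the indexer pulls them lazily from a source, which is the idea behind handing SolrJ an iterator of documents. The Python sketch below mimics that pattern generically; `read_documents` and the `send_batch` callable are hypothetical stand-ins for your data source and whatever actually posts a batch to Solr.

```python
from itertools import islice


def read_documents(n):
    """Hypothetical document source: yields docs lazily, one at a time,
    so the full collection is never held in memory."""
    for i in range(n):
        yield {"id": "doc-%d" % i, "title": "Document %d" % i}


def batches(iterable, size):
    """Pull fixed-size batches from any iterable until it is exhausted."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_all(source, send_batch, batch_size=500):
    """Drive indexing by pulling from the source; send_batch is a
    hypothetical callable that posts one batch to Solr."""
    sent = 0
    for batch in batches(source, batch_size):
        send_batch(batch)
        sent += len(batch)
    return sent


if __name__ == "__main__":
    sizes = []
    total = index_all(read_documents(1234), lambda b: sizes.append(len(b)))
    print(total, sizes)  # 1234 [500, 500, 234]
```

Because the source is a generator, memory use stays bounded by the batch size no matter how many documents the source produces.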
Regards
On Mon, Feb 27, 2012 at 10:46 PM, Erik Hatcher
Yes, absolutely. Parallelizing indexing can make a huge difference. How you
do so will depend on your indexing environment. Most crudely, running multiple
indexing scripts on different subsets of data, up to the limitations of your
operating system and hardware, is how many do it. SolrJ h
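The "multiple indexing scripts on different subsets of data" approach can be sketched in a single process with a thread pool: split the documents into disjoint subsets and let each worker index its own subset in batches. This is an illustrative Python sketch, not SolrJ; `send_batch` is a hypothetical, thread-safe callable that posts a batch to Solr. Defaulting the worker count to the local CPU count is only a starting point, and only meaningful if the indexer runs on the Solr box itself; otherwise pass `workers` explicitly.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split(docs, n):
    """Split docs into n disjoint, roughly equal subsets (round-robin)."""
    return [docs[i::n] for i in range(n)]


def index_subset(subset, send_batch, batch_size):
    """Index one subset in batches; send_batch is a hypothetical callable
    that posts a list of docs to Solr (e.g. over HTTP)."""
    for start in range(0, len(subset), batch_size):
        send_batch(subset[start:start + batch_size])
    return len(subset)


def parallel_index(docs, send_batch, workers=None, batch_size=500):
    """Run one worker per subset. send_batch is called concurrently from
    several threads, so it must be thread-safe. workers defaults to the
    local CPU count (assumes the indexer runs on the Solr machine)."""
    workers = workers or os.cpu_count() or 1
    subsets = split(docs, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(index_subset, s, send_batch, batch_size)
                   for s in subsets]
        # result() re-raises any exception raised inside a worker.
        return sum(f.result() for f in futures)
```

Splitting into disjoint subsets means no two workers ever send the same document, which mirrors running separate scripts over separate slices of the data.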
Thanks for the quick reply.
The box has 8 real CPUs, so perhaps it's a good idea to reduce the number of
cores to 8 as well. I'm also testing a different scenario with multiple boxes,
where clients persist docs to multiple cores on multiple boxes (which
is what multicore was invented for after
Britske,
Here are a few quick ones:
- Does that machine really have 10 CPU cores? If it has significantly fewer,
you may be beyond the "indexing sweet spot" in terms of indexer threads vs. CPU
cores.
- Your maxBufferedDocs is super small. Comment that out anyway. Use
ramBufferSizeMB and s
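To find the "indexing sweet spot" empirically, one option is to time the same set of documents at increasing thread counts and stop adding threads once docs/second levels off. A sketch in Python, assuming a hypothetical thread-safe `send_batch` callable and a test index: Python threads are adequate here because the client mostly waits on HTTP while analysis and indexing run on the Solr server, which is why the server's CPU count is what bounds the useful thread count.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(docs, send_batch, workers, batch_size=500):
    """Index docs with a given number of threads; return docs/second.
    send_batch is a hypothetical, thread-safe callable that posts one
    batch to Solr."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every batch and re-raises any worker exception.
        list(pool.map(send_batch, batches))
    elapsed = time.perf_counter() - start
    return len(docs) / elapsed if elapsed > 0 else float("inf")


def sweep(docs, send_batch, thread_counts, batch_size=500):
    """Measure throughput for each thread count; the sweet spot is where
    adding threads stops increasing docs/second."""
    return {n: measure_throughput(docs, send_batch, n, batch_size)
            for n in thread_counts}
```

For example, `sweep(docs, send_batch, [1, 2, 4, 8, 10, 16])` over a few thousand representative documents. Re-sending the same documents overwrites them by unique key, so repeated runs do not create duplicates, though each run still costs indexing and merge work on the server.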