I am no expert, but here is my take and our situation.

Firstly, are you asking what the minimum number of documents is before it makes 
*any* sense at all to use a distributed search, or are you asking what the 
maximum number of documents is before a distributed search is essentially 
required?  The answers would be different.  I get the feeling you are asking 
the second question, so I'll proceed under that assumption.

I suspect that the answer is partly "it depends".  It is mostly a function of 
the size of the index (and how that size interacts with available memory and 
search performance), which in turn depends on both the number of documents and 
how much is stored per document.  It would also depend on your update load.

If the documents are small and/or the amount of data you store per document is 
small, then a single machine will probably be fine until the number of 
documents and/or updates gets truly enormous.

But if your documents are very large (that is, the amount stored per document), 
then at some point the index files get so large that performance on a single 
machine isn't adequate.  Alternatively, if your update load is very, very 
large, you might need to spread that load across multiple servers so you can 
handle the updates without crippling your ability to respond to queries.

As for a specific instance, we have a single index of 7 million documents 
(going on 28 million), with maybe 512 bytes of data stored per document and 
roughly 26 indexed fields (we have a *lot* of copyField operations in order to 
index the data the way we want it, yet preserve the original data to return), 
and we have not needed distributed search.
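For a sense of scale, here is a back-of-the-envelope sketch of the stored-field volume implied by those numbers, assuming exactly 512 bytes per document. It deliberately ignores the inverted index built for the ~26 indexed fields, so the real on-disk index will be larger; the function names are hypothetical.

```python
# Rough estimate of *stored* field data only, assuming ~512 bytes per
# document. The inverted index for the indexed fields is NOT included,
# so actual index size on disk will be larger than this.
BYTES_PER_DOC = 512


def stored_bytes(num_docs: int, bytes_per_doc: int = BYTES_PER_DOC) -> int:
    """Return the approximate stored-field size in bytes."""
    return num_docs * bytes_per_doc


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Current size and projected size of the index described above.
    for docs in (7_000_000, 28_000_000):
        size = stored_bytes(docs)
        print(f"{docs:>11,} docs -> {size:,} bytes (~{to_gib(size):.2f} GiB)")
```

That works out to roughly 3.3 GiB of stored data today and about 13.4 GiB at 28 million documents.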

JRJ

-----Original Message-----
From: Pengkai Qin [mailto:qin19890...@163.com] 
Sent: Thursday, September 29, 2011 5:15 AM
To: solr-user@lucene.apache.org; d...@lucene.apache.org
Subject: About solr distributed search

Hi all,

I'm currently doing research on Solr distributed search, and I've read that 
distributed search becomes reasonable once you have more than one million 
documents. So I'd like to know: does anyone have test results (such as query 
time) comparing a single index with distributed search on more than one 
million documents? I need the results urgently; thanks in advance!

Best Regards,
Pengkai

