I am no expert, but here is my take and our situation. Firstly, are you asking what the minimum number of documents is before it makes *any* sense at all to use a distributed search, or are you asking what the maximum number of documents is before a distributed search is essentially required? The answers would be different. I get the feeling you are asking the second question, so I'll proceed under that assumption.
I expect that the answer is, in part, "it depends". It is mostly a function of the size of the index (and how that interacts with memory and search performance), which depends on both the number of documents and how much is stored for each document. It also depends on your update load.

If the documents are small and/or the amount of data you store per document is small, then a single machine will probably be fine until the number of documents and/or updates gets truly enormous. But if the amount stored per document is very large, then at some point the index files get so large that performance on a single machine isn't adequate. Alternatively, if your update load is very large, you might need to spread it across multiple servers so that updates don't cripple your ability to respond to queries.

As for a specific instance, we have a single index of 7 million documents (going on 28 million), with maybe 512 bytes of data stored for each document and maybe 26 or so indexed fields (we have a *lot* of copyField operations in order to index the data the way we want it, yet preserve the original data to return), and we did not need to use distributed search.

JRJ

-----Original Message-----
From: Pengkai Qin [mailto:qin19890...@163.com]
Sent: Thursday, September 29, 2011 5:15 AM
To: solr-user@lucene.apache.org; d...@lucene.apache.org
Subject: About solr distributed search

Hi all,

Now I'm doing research on solr distributed search, and it is said documents more than one million is reasonable to use distributed search. So I want to know, does anyone have the test result(Such as time cost) of using single index and distributed search of more than one million data? I need the test result very urgent, thanks in advance!

Best Regards,
Pengkai