On Wed, 2014-03-19 at 11:55 +0100, Colin R wrote: > We run a central database of 14M (and growing) photos with dates, captions, > keywords, etc. > > We currently upgrading from old Lucene Servers to latest Solr running with a > couple of dedicated servers (6 core, 36GB, 500SSD). Planning on using Solr > Cloud.
What hardware are your past experiences based on? If they have less cores, lower memory and spinning drives, I foresee that your question can be reduced to which architecture you prefer from a logistic point of view, rather than performance. > We take in thousands of changes each day (big and small) so indexing may be > a bigger problem than searching. Thousands of updates in a day is a very low number. Do you have hard requirements for update time, perform heavy faceting or do anything special for this to be a cause of concern? > Is it quicker/better to just have one big 14M index and filter the > complexities for each website or is it better to still maintain hundreds of > indexes so we are searching smaller one. All else being equal, a search in a specific small index will be faster than filtering on the large one. But as we know, all else is never equal. A 14M document index in itself is not really a challenge for Lucene/Solr, but this depends a lot on your specific setup. How large is the 14M index in terms of bytes? > Bear in mind, we get thousands of changes a day PLUS very busy search servers. How many queries/second are we talking about here? What is a typical query (faceting, grouping, special processing...)? Regards, Toke Eskildsen, State and University Library, Denmark