On 3/19/2014 4:55 AM, Colin R wrote:
My question is an architecture one.
These photos are currently indexed and searched in three ways.
1: The 14M pictures from above are split into a few hundred indexes that
feed a single website. This means index sizes of between 100 and 500,000
entries each.
2: 95% of these same photos are also wanted for searching on a global site.
Index size of 12M plus.
3: 80% of these same photos are also required for smaller group sites. Index
sizes of between 400K and 4M.
We currently make changes to the single indexes and then merge into groups and
global. Given the size of these numbers, is it worth changing this approach?
Is it quicker/better to just have one big 14M index and filter the
complexities for each website, or is it better to still maintain hundreds of
indexes so we are searching smaller ones? Bear in mind, we get thousands of
changes a day PLUS very busy search servers.
My primary use for Solr is an archive of 92 million documents, most of
which are photos. We have thousands of new photos every day. I haven't
been cleared to mention what company it's for.
This screenshot of my status servlet page answers tons of questions
about my index, but if you have additional questions, ask:
https://www.dropbox.com/s/6p1puq1gq3j8nln/solr-status-servlet.png
Here are some details about each host that you cannot see in the
screenshot: 6 SATA disks in RAID10 with 3TB of usable space. 64GB of
RAM. Dual quad-core Intel E54xx series CPUs. Chain A is running Solr
4.2.1 on Java 6, chain B is running Solr 4.6.1 on Java 7, with some
additional plugin software that increases the index size. There is one
Solr process per host, with a 6GB heap.
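To put rough numbers on that: whatever RAM the Solr heap does not claim is
what the OS can use for its disk cache, so on these hosts that is at most
about 58GB. A quick sketch of the arithmetic follows. The function names and
the 116GB index size in the example are made up for illustration and are not
taken from my setup.

```python
def page_cache_gb(total_ram_gb, heap_gb, other_gb=0.0):
    """Upper bound on RAM the OS could use for disk cache, in GB.

    other_gb is a hypothetical allowance for the OS itself and any
    other processes on the host; it defaults to zero.
    """
    return max(total_ram_gb - heap_gb - other_gb, 0.0)


def cached_fraction(index_size_gb, cache_gb):
    """Fraction of the on-disk index that could fit in the disk cache."""
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    return min(cache_gb / index_size_gb, 1.0)


# Figures from the hosts described above: 64GB of RAM, 6GB heap.
cache = page_cache_gb(64, 6)

# Hypothetical index size, only to show how the ratio is computed.
fraction = cached_fraction(116, cache)
```

This is an upper bound only; the OS and anything else running on the box
will take their share too.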
As long as you index fields that can be used to filter searches
according to what a user is allowed to see, I don't see any problem with
putting all of your data into one index. The main thing you'll want to be
sure of is that you have enough RAM to effectively cache your index.
Because you have SSD, you probably don't need to have enough RAM to
cache ALL of the index data, but it wouldn't hurt. With 36GB of RAM per
machine, you will probably have enough.
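As a rough illustration of the filtering idea, here is a minimal sketch that
builds a Solr select URL with fq (filter query) parameters. The q, fq, rows
and wt parameters are standard Solr request parameters, but the field names
site_id and group_id, the core name "photos" and the host are assumptions made
up for the example. You would substitute whatever fields you actually index.

```python
from urllib.parse import urlencode


def build_select_url(base_url, user_query, site_id=None, group_id=None, rows=10):
    """Build a Solr /select URL that restricts results with filter queries.

    site_id and group_id are hypothetical indexed fields; each one that is
    supplied becomes its own fq parameter.
    """
    params = [("q", user_query), ("rows", str(rows)), ("wt", "json")]
    if site_id is not None:
        params.append(("fq", f"site_id:{site_id}"))
    if group_id is not None:
        params.append(("fq", f"group_id:{group_id}"))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name.
BASE = "http://localhost:8983/solr/photos"

# Single website: only photos tagged with that site.
single_site_url = build_select_url(BASE, "sunset", site_id=42)

# Group site: only photos tagged with that group.
group_url = build_select_url(BASE, "sunset", group_id=7)

# Global site: no filter here. Since only 95% of the photos belong on the
# global site, a real setup would presumably need a filter for that as well.
global_url = build_select_url(BASE, "sunset")
```

Each distinct fq is cached separately in Solr's filterCache, so a filter that
is reused across many searches (one per site, for example) tends to be cheap
after the first use.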
Thanks,
Shawn