Any thoughts? Can Solr Cloud support such a use case with acceptable performance?
On Thursday, March 20, 2014 7:51 PM, shushuai zhu <ss...@yahoo.com> wrote:

Hi,

I am looking for some advice on handling a large volume of documents with a very high incoming rate. Each document is about 0.5 KB, the incoming rate could be more than 20K documents per second, and we want to store about one year's worth of documents in Solr for near-real-time searching. The goal is to achieve acceptable indexing and querying performance. We will use techniques like soft commits, dedicated indexing servers, etc.

My main question is how to structure the collections/shards/cores to achieve these goals. Since the incoming rate is very high, we do not want the incoming documents to affect the existing, older indexes. One thought is to create a "latest" index to hold the incoming documents (say, the latest half hour's data, about 36M docs), so that queries on older data can be faster because the old indexes are not affected.

There seem to be three ways to grow the time dimension, by adding/splitting/creating one of the following objects every half hour:

- collection
- shard
- core

Which is the best way to grow the time dimension? Are there any limitations in that direction? Or is there a better approach?

As an example, I am thinking about having 4 nodes with the following configuration to set up a Solr Cloud:

Memory: 128 GB
Storage: 4 TB

How should the collections/shards/cores be set up to deal with this use case?

Thanks in advance.

Shushuai
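
For scale, the figures in the message above can be turned into a quick back-of-envelope calculation. This is only a rough sketch: it assumes 1 KB = 1000 bytes, takes 20K docs/second as a lower bound, assumes the quoted 4 TB of storage is per node, and counts raw document bytes only (actual Solr index size depends on the schema, stored fields, and replication, none of which are given in the message).

```python
# Back-of-envelope sizing from the figures quoted in the message.
DOC_SIZE_BYTES = 500            # "about 0.5 KB", assuming 1 KB = 1000 bytes
DOCS_PER_SECOND = 20_000        # "more than 20K per second", used as a lower bound
SECONDS_PER_HALF_HOUR = 30 * 60
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
NODES = 4
STORAGE_PER_NODE_TB = 4         # assumption: the 4 TB figure is per node

# Documents landing in one half-hour "latest" index.
docs_per_half_hour = DOCS_PER_SECOND * SECONDS_PER_HALF_HOUR

# Documents and raw bytes accumulated over one year of retention.
docs_per_year = DOCS_PER_SECOND * SECONDS_PER_YEAR
raw_tb_per_year = docs_per_year * DOC_SIZE_BYTES / 1e12

# Total disk across the proposed cluster.
cluster_storage_tb = NODES * STORAGE_PER_NODE_TB

# Number of collections/shards/cores created per year if one is added every half hour.
half_hour_slices_per_year = SECONDS_PER_YEAR // SECONDS_PER_HALF_HOUR

print(f"docs per half hour:        {docs_per_half_hour:,}")
print(f"docs per year:             {docs_per_year:,}")
print(f"raw data per year (TB):    {raw_tb_per_year:.2f}")
print(f"cluster storage (TB):      {cluster_storage_tb}")
print(f"half-hour slices per year: {half_hour_slices_per_year:,}")
```

Under these assumptions the half-hour figure matches the 36M docs mentioned in the message, one year comes to roughly 630 billion documents and about 315 TB of raw data against 16 TB of cluster storage, and a half-hour slicing scheme would create 17,520 collections, shards, or cores per year.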