RAUNAK AGRAWAL <agrawal.rau...@gmail.com> wrote:
> curl http://localhost:8983/solr/collection_name/stream -d
> 'expr=facet(collection_name,q="id:953",bucketSorts="week desc",
> buckets="week",bucketSizeLimit=200,sum(sales),sum(amount),sum(days))'
Stats on numeric fields then.

> Also in my collection, I have almost 10 Billion documents
> with many deletions (close to 40%).

Quite a lot of documents, and in this case the deletions count, as the
internal structures for the deleted documents still need to be iterated.

In scale this looks somewhat like our 18 billion document setup, with the
addendum that we use quite large segments (900GB). The performance
regressions we encountered with Solr 7 led to
https://issues.apache.org/jira/browse/LUCENE-8374
which helped a lot (performance testing has not finished yet).

If you have or can easily create a test server where your shard(s) are the
same size as your production shards, I'd be happy to port the patch to
Solr 7.2.1 to see if it helps. I am looking for independent verification,
so it is no bother.

> I was planning to run optimise to merge the segments but
> spoke to admin team and lucidworks guys and they were
> against it saying that it will make very large segment file.

If your bottleneck is the same as ours, the large segment would mean worse
performance (with Solr 7).

> Is it true that optimise in solr should not be used, as it comes with
> other issues?

No simple answer there. If you have an index that you update very rarely,
optimizing can save memory and processing power. If you have a live index
where you add and delete documents, it will probably be a bad idea. One
strategy used with time series data is to have old and immutable data in
dedicated collections, which can then be optimized.
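For such immutable collections, the forced merge is a plain call to the
update handler. A minimal sketch, assuming the default handler path and a
hypothetical collection name old_collection:

curl 'http://localhost:8983/solr/old_collection/update?optimize=true&maxSegments=1'

maxSegments=1 merges everything down to a single segment, which also
purges the deleted documents. Expect heavy I/O while the merge runs, as
the segments are rewritten.

- Toke Eskildsen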