Hi,

If you only need to sum over the "displayed" results, go with post-processing of the hits; that's fast and easy. If you need to sum over the whole data set (i.e. the sum is not query-dependent), have it computed at indexing time, depending on your indexing workflow.
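The post-processing approach is just a client-side loop over the returned docs. A minimal sketch, assuming you've parsed Solr's wt=json output into a dict, and using a hypothetical "amount" field name:

```python
import json

def sum_displayed_hits(response, field="amount"):
    """Sum a numeric field over the docs in a parsed Solr JSON response."""
    docs = response.get("response", {}).get("docs", [])
    return sum(doc.get(field, 0) for doc in docs)

# Example with a hand-built response in Solr's JSON shape:
sample = {"response": {"docs": [{"amount": 10}, {"amount": 25}, {"amount": 7}]}}
print(sum_displayed_hits(sample))  # 42
```

This only sees the page of results you actually fetched (rows worth of docs), which is why it's only suitable when the sum is over displayed results.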
Otherwise (a sum over the whole result set, query-dependent but independent of the displayed results), you should give sharding a try. You generally want that when your index is too large to be searched quickly (see http://wiki.apache.org/solr/DistributedSearch); here, the sum operation is part of a search query.
Basically what you need is:
- On the master host: n master instances (each being a shard)
- On the slave host: n slave instances (each a replica of its master-side counterpart)
Only the slave instances will need a comfortable amount of RAM in order 
to serve queries rapidly. Slave instances can be deployed over several 
hosts if the total amount of RAM required is high.
Your main effort here might be in finding the 'n' value.
You have 45M documents in a single shard and that may be the cause of your issue, especially for queries returning a high number of results.
You may need to split it into more shards to achieve your goal.

This should reduce the time needed to perform the sum operation at search time, but it adds complexity at indexing time: you need to define a way to send documents to shard #1, #2, ..., or #n. If the number of documents keeps growing over time, you may want a fixed maximum shard size (say 5M docs, if performing the sum on 5M docs is fast enough) and simply add shards as needed when more documents are to be indexed and searched. This also keeps importing simple, because you only need to change the target shard every 5M documents.
The last shard is always the smallest.

Such sharding can involve a little overhead at search time: make sure you don't allow retrieval of deep results (start=k, where k is high -- see http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations). When using the stats component, set the start and rows parameters to 0 if you don't need the documents themselves.
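For illustration, here is how such a distributed stats query could be assembled (host names, the date-range query, and the "amount" field are placeholders; rows=0 skips document retrieval entirely):

```python
from urllib.parse import urlencode

# Build a distributed StatsComponent query over two hypothetical shards.
params = {
    "q": "date:[2011-11-06T00:00:00Z TO 2011-11-07T00:00:00Z]",
    "shards": "host1:8983/solr,host2:8983/solr",
    "stats": "true",
    "stats.field": "amount",
    "start": 0,
    "rows": 0,  # stats only, no documents returned
}
url = "http://host1:8983/solr/select?" + urlencode(params)
print(url)
```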

After that, if you face high search load, you can still duplicate the slave host to match your load requirements and load-balance your search traffic across the slaves as needed.
Hope this helps,

Tanguy

On 07/11/2011 09:49, stockii wrote:
Sorry.

I need the sum of values over the found documents, e.g. the total amount for one day. Each doc in the index has its own amount.

I tried something with the StatsComponent, but with 48 million docs in the index it's too slow.

-----
------------------------------- System ----------------------------------------

One server, 12 GB RAM, 2 Solr instances, 8 cores,
1 core with 45 million documents, the other cores < 200,000

- Solr1 for search requests - commit every minute - 5 GB Xmx
- Solr2 for update requests - delta import every minute - 4 GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/best-way-for-sum-of-fields-tp3477517p3486406.html
Sent from the Solr - User mailing list archive at Nabble.com.

