Hi,

If you only need to sum over the "displayed" results, go with post-processing of the hits; that's fast and easy. If you need to sum over the whole data set (i.e. the sum is not query-dependent), have it computed at indexing time, depending on your indexing workflow.
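The post-processing approach is just a client-side loop over the returned docs. A minimal sketch, assuming you've parsed Solr's wt=json output into a dict, and using a hypothetical "amount" field name:

```python
import json

def sum_displayed_hits(response, field="amount"):
    """Sum a numeric field over the docs in a parsed Solr JSON response."""
    docs = response.get("response", {}).get("docs", [])
    return sum(doc.get(field, 0) for doc in docs)

# Example with a hand-built response in Solr's JSON shape:
sample = {"response": {"docs": [{"amount": 10}, {"amount": 25}, {"amount": 7}]}}
print(sum_displayed_hits(sample))  # 42
```

This only sees the page of results you actually fetched (rows worth of docs), which is why it's only suitable when the sum is over displayed results.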
Otherwise (a sum over the whole result set, query-dependent but independent of the displayed results), you should give sharding a try. You generally want that when your index is too large to be searched quickly (see http://wiki.apache.org/solr/DistributedSearch); here, the sum operation is part of a search query.
Basically what you need is:
- On the master host: n master instances (each being a shard)
- On the slave host: n slave instances (each a replica of its master-side counterpart)
Only the slave instances will need a comfortable amount of RAM in order 
to serve queries rapidly. Slave instances can be deployed over several 
hosts if the total amount of RAM required is high.
Your main effort here might be in finding the 'n' value.
You have 45M documents in a single shard and that may be the cause of your issue, especially for queries returning a high number of results.
You may need to split it into more shards to achieve your goal.

This should reduce the time needed to perform the sum operation at search time, but it adds complexity at indexing time: you need to define a way to send documents to shard #1, #2, ..., or #n. If the number of documents keeps growing over time, you may want a fixed maximum shard size (say 5M docs, if performing the sum on 5M docs is fast enough) and simply add shards as needed when more documents are to be indexed and searched. This also keeps importing simple, because you only need to change the target shard every 5M documents.
The last shard is always the smallest.

Such sharding can involve a little overhead at search time: make sure you don't allow retrieval of deep results (start=k, where k is high -- see http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations). When using the stats component, set the start and rows parameters to 0 if you don't need the documents themselves.
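For illustration, here is how such a distributed stats query could be assembled (host names, the date-range query, and the "amount" field are placeholders; rows=0 skips document retrieval entirely):

```python
from urllib.parse import urlencode

# Build a distributed StatsComponent query over two hypothetical shards.
params = {
    "q": "date:[2011-11-06T00:00:00Z TO 2011-11-07T00:00:00Z]",
    "shards": "host1:8983/solr,host2:8983/solr",
    "stats": "true",
    "stats.field": "amount",
    "start": 0,
    "rows": 0,  # stats only, no documents returned
}
url = "http://host1:8983/solr/select?" + urlencode(params)
print(url)
```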

After that, if you face high search load, you can still duplicate the slave host to match your load requirements and load-balance your search traffic across the slaves as needed.
Hope this helps,

Tanguy

On 07/11/2011 09:49, stockii wrote:
Sorry.

I need the sum of values over the found documents, e.g. the total amount for one day. Each doc in the index has its own amount.

I tried something with the StatsComponent, but with 48 million docs in the index it's too slow.

-----
------------------------------- System ----------------------------------------

One server, 12 GB RAM, 2 Solr instances, 8 cores,
1 core with 45 million documents, the other cores < 200,000

- Solr1 for search requests - commit every minute - 5 GB Xmx
- Solr2 for update requests - delta import every minute - 4 GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/best-way-for-sum-of-fields-tp3477517p3486406.html
Sent from the Solr - User mailing list archive at Nabble.com.

