Hi Kjetil, is this custom component (which performs group-by and calculates stats) available somewhere? I would like to do something similar. Would you mind sharing it if it isn't already available?
The grouping stuff sounds similar to https://issues.apache.org/jira/browse/SOLR-236, where you can run into memory problems too ;-) Or see: https://issues.apache.org/jira/browse/SOLR-1682

> Any tips or similar experiences?

You want to decrease memory usage?

Regards,
Peter.

> Hi all,
>
> We're currently using Solr 1.4.0 in a project for statistical data, where we
> group and sum a number of "double" values. Probably not what most people use
> Solr for, but it seems to be working fine for us :-)
>
> We do have some challenges, especially with memory use, so I thought I'd
> check here if anybody has done something similar.
>
> Some details:
>
> - The index is currently around 30 GB and growing. The data is indexed
> directly from a database; each row ends up as a document. I think we have
> around 100 million documents now, and the largest core has about 40 million.
> The data is split into different cores for different statistics data.
>
> - Heap size is currently 4 GB. We're running all the cores in a single JVM
> on WebSphere (WAS) 6.1, with a couple of GB left for the OS disk cache.
> Initially we used a 1 GB heap, so we had to split cores into different
> shards in order to avoid OutOfMemoryErrors caused by the FieldCache (I
> think).
>
> - The grouping is done by a custom Solr component which takes parameters
> that specify which fields to group by (like in SQL) and sums up values for
> each group. It uses the FieldCache for speedy retrieval. We did a PoC using
> Documents instead, but that seemed to be a lot slower. I've done a memory
> dump, and the combined FieldCache looks to be about 3 GB (take that with a
> grain of salt, since I'm not sure all the data was cached).
>
> I guess this is different from normal Solr searches, since we have to
> process all the documents in a core in order to calculate results; we can't
> just return the first 10 (or whatever) documents.
>
> Any tips or similar experiences?
>
> ---Kjetil
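For anyone wondering what the core of such a component might look like, here is a minimal, hypothetical Java sketch of the loop Kjetil describes: for every document matched by the query, build a group key from the group-by fields and add that document's value to the group's running sum. It deliberately does not use the Solr or Lucene API. Plain arrays stand in for the FieldCache (an int ordinal per document plus a term lookup table for each group-by field, and a double per document for the summed field), and `GroupSumSketch`, `groupSum` and `Stats` are illustrative names, not Kjetil's actual code.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a "group by + sum" pass over FieldCache-style
 * per-document arrays. Names are illustrative; nothing here is Solr/Lucene API.
 */
public final class GroupSumSketch {

    /** Running aggregate for one group: number of documents and their sum. */
    public static final class Stats {
        public long count;
        public double sum;
    }

    /**
     * @param groupOrds one int[maxDoc] per group-by field, holding each
     *                  document's term ordinal for that field
     * @param lookups   one String[] per group-by field, mapping ordinal to term
     * @param values    double[maxDoc] holding the value to sum for each document
     * @param matching  documents matched by the query; only these are aggregated
     * @return map from composite group key ("term1|term2|...") to its stats
     */
    public static Map<String, Stats> groupSum(int[][] groupOrds, String[][] lookups,
                                              double[] values, BitSet matching) {
        Map<String, Stats> groups = new HashMap<>();
        StringBuilder key = new StringBuilder();
        // Every matching document is visited; there is no "top 10" shortcut.
        for (int doc = matching.nextSetBit(0); doc >= 0; doc = matching.nextSetBit(doc + 1)) {
            key.setLength(0);
            for (int f = 0; f < groupOrds.length; f++) {
                if (f > 0) {
                    key.append('|');
                }
                key.append(lookups[f][groupOrds[f][doc]]);
            }
            Stats s = groups.computeIfAbsent(key.toString(), k -> new Stats());
            s.count++;
            s.sum += values[doc];
        }
        return groups;
    }
}
```

Because every matching document is visited, the cost grows with the size of the result set rather than with the number of rows returned, which is the difference Kjetil points out. The per-document arrays are also where the memory goes: a double array costs 8 bytes per document, so a 40-million-document core needs about 320 MB for each summed field, before the group-by fields are counted. Building a String key per document is the simplest choice; packing the ordinals into a single long would avoid that allocation when the per-field cardinalities are small enough.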