Re: Solr for statistical data

Peter Karich Thu, 16 Sep 2010 02:48:36 -0700

Hi Kjetil,

Is this custom component (which performs the group-by and calculates the
stats) available somewhere?
I would like to do something similar. Would you mind sharing it if it
isn't already available?


The grouping stuff sounds similar to
https://issues.apache.org/jira/browse/SOLR-236

where you can run into memory problems too ;-) Or see:
https://issues.apache.org/jira/browse/SOLR-1682

> Any tips or similar experiences?

Do you want to decrease memory usage?

Regards,
Peter.


> Hi all,
>
>
> we're currently using Solr 1.4.0 in a project for statistical data, where we
> group and sum a number of "double" values. Probably not what most people use
> Solr for, but it seems to be working fine for us :-)
>
>
> We do have some challenges, especially with memory use, so I thought I'd
> check here if anybody has done something similar.
>
>
> Some details:
>
>
> - The index is currently around 30 GB and growing. The data is indexed
> directly from a database; each row ends up as a document. I think we have
> around 100 million documents now; the largest core holds about 40 million.
> The data is split into different cores for different statistics data.
>
>
> - Heap size is currently 4 GB. We're currently running all the cores in a
> single JVM on WebSphere (WAS) 6.1. We have a couple of GB left for OS disk
> cache. Initially we used a 1 GB heap, so we had to split cores into different
> shards in order to avoid OutOfMemoryErrors because of the FieldCache (I
> think).
>
>
> - The grouping is done by a custom Solr component which takes parameters
> that specify which fields to group by (like in SQL) and sums up values for
> the group. This uses the FieldCache for speedy retrieval. We did a PoC on
> using Documents instead, but this seemed to go a lot slower. I've done a
> memory dump and the combined FieldCache looks to be about 3 GB (taken with a
> grain of salt since I'm not sure all the data was cached).
>
>
> I guess this is different from normal Solr searches, since we have to process
> all the documents in a core in order to calculate results; we can't just
> return the first 10 (or whatever) documents.
>
>
> Any tips or similar experiences?
>
>
>
> ---Kjetil
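
For anyone following the thread, the group-and-sum step described in the
quoted message could be sketched roughly as below. This is not Kjetil's
component and it uses no Solr or Lucene API: plain arrays indexed by document
id stand in for the per-document arrays a field cache would hold, and the
names GroupSumSketch, groupAndSum and doubleColumnBytes are made up for
illustration. The second method is only back-of-the-envelope arithmetic,
assuming one 8-byte double per document per cached field and ignoring the
group-by fields and any object overhead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: arrays indexed by doc id stand in for cached field values.
public class GroupSumSketch {

    // Sums values[doc] for every distinct groupKeys[doc], like
    // SELECT key, SUM(value) ... GROUP BY key over all documents.
    static Map<String, Double> groupAndSum(String[] groupKeys, double[] values) {
        if (groupKeys.length != values.length) {
            throw new IllegalArgumentException("arrays must cover the same documents");
        }
        Map<String, Double> sums = new LinkedHashMap<>();
        for (int doc = 0; doc < values.length; doc++) {
            // Every document is visited once; there is no "top 10" shortcut.
            sums.merge(groupKeys[doc], values[doc], Double::sum);
        }
        return sums;
    }

    // Assumption: one 8-byte double per document for each cached numeric field.
    static long doubleColumnBytes(long numDocs, int numFields) {
        return numDocs * numFields * 8L;
    }

    public static void main(String[] args) {
        String[] region = {"north", "south", "north", "east"};
        double[] amount = {1.5, 2.0, 3.5, 4.0};
        System.out.println(groupAndSum(region, amount)); // {north=5.0, south=2.0, east=4.0}
        System.out.println(doubleColumnBytes(40_000_000L, 1)); // 320000000
    }
}
```

Under that assumption, a single double field on the 40 million document core
needs about 320 MB for the values alone, so a handful of summed fields plus
the group-by fields would add up quickly.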
