Hi, can someone please guide me to the right way to partition the Solr index?
On Mon, May 7, 2012 at 11:41 AM, Yuval Dotan <yuvaldo...@gmail.com> wrote:
> Hi All
> Jan, thanks for the reply - answers to your questions are below.
> Please update me if you have ideas that can solve my problems.
>
> First, some corrections to my previous mail:
>
> > Hi All
> > We have an index of ~2,000,000,000 documents and the query and facet times are too slow for us - our index will in fact be much larger.
> > Most of our queries will be limited by time, hence we want to partition the data by date/time - even when unlimited, which is what will mostly happen, we have results in the recent records and querying the whole dataset is redundant.
> > We want to partition the data because the index size is too big and doesn't fit into memory (80 GB) - our data actually grows continuously over time; it will never fit into memory, but it has to be available for queries in case results are found in older records or a full facet is required.
> >
> > 1. Is multi-core the best way to implement my requirement?
> > 2. I noticed there are LOAD / UNLOAD actions on a core - should I use these actions when managing my cores? If so, how can I LOAD a core that I have unloaded?
> > For example: I have 7 partitions / cores - one for each day of the week - we might have 2000 per day.
> > In most cases I will search for documents only on the last day's core.
> > Once every 10000 queries I need documents from all cores.
> > Question: Do I need to unload all of the old cores and then load them on demand (when I see I need data from these cores)?
> > 3. If the answer to the last question is no, how do I ensure that only the cores I want are loaded into memory?
> >
> > Thanks
> > Yuval
>
> Answers to Jan:
>
> Hi,
>
> First you need to investigate WHY faceting and querying is too slow.
> What exactly do you mean by slow? Can you please tell us more about your setup?
>
> * How large documents and how many fields?
> Small records, ~200 bytes, ~20 fields on average, most of them not stored - schema and config file attached.
>
> * What kind of queries? How many hits? How many facets? Have you studied &debugQuery=true output?
> The problem is not that queries are slow per se, it is with getting 50 matches out of billions of matching docs.
>
> * Do you use filter queries (fq) extensively?
> User-generated queries; fq would not reduce the dataset for some of our use cases.
>
> * What data do you facet on? Many unique values per field? Text or ranges? What facet.method?
> The problem is not just faceting, it's with queries - let's start there.
>
> * What kind of hardware? RAM/CPU?
> HP DL180 G6, 2x E5645 (12 cores), 48 GB RAM
>
> * How have you configured your JVM? How much memory? GC?
> java -Xms512M -Xmx40960M -jar start.jar
>
> As you see, you will have to provide a lot more information on your use case and setup in order for us to judge the correct action to take. You might need to adjust your config, optimize your queries or caches, slim your schema, buy more RAM, or an SSD :)
>
> Normally, going multi-core on one box will not necessarily help in itself, as there is overhead in sharding across multiple cores as well. However, it COULD be a solution since you say that most of the time you only need to consider 1/7 of your data. I would perhaps consider one "hot" core for the last 24h, and one "archive" core for older data. You could then tune these differently regarding caches etc.
>
> Can you get back with some more details?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
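
Regarding the LOAD / UNLOAD question above, here is a rough SolrJ sketch of managing day cores, not a recipe: the core names and URLs are made up, and the HttpSolrServer class is the newer SolrJ client (older releases call it CommonsHttpSolrServer). The point is that UNLOAD only removes the core from the running server, and one way to bring it back later is to CREATE it again against the existing instanceDir.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class DayCoreAdmin {
    public static void main(String[] args) throws Exception {
        // CoreAdmin requests go against the root Solr URL, not a specific core.
        HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");

        // UNLOAD drops the old day core from the running server;
        // its index files stay on disk in the core's instanceDir.
        CoreAdminRequest.unloadCore("core_20120430", admin);

        // To bring an unloaded core back, CREATE it again pointing at the
        // existing instanceDir (hypothetical path, relative to solr home).
        CoreAdminRequest.createCore("core_20120430", "core_20120430", admin);

        admin.shutdown();
    }
}

Whether unloading and re-creating cores on demand is worth the bookkeeping depends on how often the all-cores queries actually arrive.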
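For Jan's "hot" / "archive" suggestion, a minimal query-side sketch under the same assumptions (hypothetical core names, and both cores sharing a uniqueKey so distributed search works): hit only the hot core in the common case, and fan the same query out over both cores with the standard shards parameter for the occasional full search.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HotVsArchiveQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical layout: "hot" holds the last 24h, "archive" everything older.
        HttpSolrServer hot = new HttpSolrServer("http://localhost:8983/solr/hot");

        // Common case: query only the hot core.
        SolrQuery recent = new SolrQuery("some user query");
        recent.setRows(50);
        QueryResponse recentRsp = hot.query(recent);
        System.out.println("hot hits: " + recentRsp.getResults().getNumFound());

        // Rare full search: fan the same query out over both cores with the
        // standard distributed-search "shards" parameter.
        SolrQuery full = new SolrQuery("some user query");
        full.setRows(50);
        full.set("shards", "localhost:8983/solr/hot,localhost:8983/solr/archive");
        QueryResponse fullRsp = hot.query(full);
        System.out.println("total hits: " + fullRsp.getResults().getNumFound());

        hot.shutdown();
    }
}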