Re: Solr subset searching in 100-million document index

Aloke Ghoshal Fri, 25 Oct 2013 05:17:59 -0700

Hi Sandeep,

Your current set-up is quite likely under-provisioned for an index of this
size, mainly in the memory left over for the OS disk cache:
http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
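
As a rough illustration of why the disk cache is the likely bottleneck, the figures quoted below in this thread can be put side by side. This is only a back-of-the-envelope sketch: it assumes the raw text volume is a usable stand-in for the on-disk index size, which may be well off in either direction (hence the questions that follow), and it ignores memory used by the OS and other processes. The function names are made up for the example.

```python
# Back-of-the-envelope sizing using only the numbers quoted in this thread.
DOCS = 100_000_000                  # current document count
AVG_DOC_BYTES = 10 * 1024           # ~10KB of pure text per document
GROWTH_DOCS_PER_MONTH = 10_000_000  # stated growth rate
RAM_GB = 16                         # total memory on the VM
JAVA_HEAP_GB = 4                    # memory allocated to Java

BYTES_PER_GB = 1024 ** 3


def raw_text_gb(docs, avg_doc_bytes=AVG_DOC_BYTES):
    """Raw text volume in GB (an assumption-laden proxy for index size)."""
    return docs * avg_doc_bytes / BYTES_PER_GB


def os_cache_upper_bound_gb(ram_gb=RAM_GB, heap_gb=JAVA_HEAP_GB):
    """Memory left for the OS disk cache, ignoring all other consumers."""
    return ram_gb - heap_gb


text_now_gb = raw_text_gb(DOCS)
text_in_six_months_gb = raw_text_gb(DOCS + 6 * GROWTH_DOCS_PER_MONTH)
cache_gb = os_cache_upper_bound_gb()

if __name__ == "__main__":
    print("raw text now (GB):          %.0f" % text_now_gb)
    print("raw text in 6 months (GB):  %.0f" % text_in_six_months_gb)
    print("RAM left for disk cache:    %d GB at most" % cache_gb)
```

Even if the real index is only a fraction of the raw text volume, the memory left for caching it is a small share of the total, which is the situation the wiki page above describes.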


A few things for you to confirm:
1. Which version of Solr are you using?
2. What is the size of your index on disk?
- Are fields stored? How much do these stored fields contribute to the
overall index size? (File types:
http://lucene.apache.org/core/2_9_4/fileformats.html#file-names)
- Check that you are not bloating the index further with term vectors,
norms, ngrams, reversed wildcards, etc.
3. What are the response times (Solr and client side) for your typical
queries? Please also share the memory and CPU utilization numbers.

For your modelling, if possible, you could consider grouping the regions,
and searching via one regions-group-id in place of 250+ region ids (in an
OR query, not in an "IN param").
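
A minimal sketch of the two query shapes, to make the comparison concrete. The field names `region_id` and `region_group_id`, the region-to-group mapping, and the helper functions are all hypothetical; the sketch also assumes the requested region lists line up with whole groups, since otherwise a group filter would match more documents than the original list.

```python
def region_fq(region_ids, field="region_id"):
    """Build a single filter query that ORs the given ids together."""
    ids = sorted({int(r) for r in region_ids})  # de-duplicate, stable order
    if not ids:
        raise ValueError("need at least one id")
    return "{}:({})".format(field, " OR ".join(str(i) for i in ids))


def group_fq(region_ids, region_to_group, field="region_group_id"):
    """Collapse region ids to their (hypothetical) group ids, then OR those."""
    groups = {region_to_group[int(r)] for r in region_ids}
    return region_fq(groups, field=field)


def select_params(user_query, fq):
    """Request parameters for a Solr select call: main query plus filter."""
    return {"q": user_query, "fq": fq}


if __name__ == "__main__":
    # Hypothetical mapping: regions 1 and 2 belong to group 10, region 3 to 11.
    mapping = {1: 10, 2: 10, 3: 11}
    print(select_params("some text", region_fq([3, 1, 2])))
    print(select_params("some text", group_fq([3, 1, 2], mapping)))
```

With 250+ region ids the first form becomes a long boolean query, and at the 2000-id cap it is worth checking the `maxBooleanClauses` setting in solrconfig.xml; the grouped form keeps the clause count small.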

Regards,
Aloke



On Thu, Oct 24, 2013 at 8:25 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Sandeep,
>
> This type of operation can often be expressed very efficiently as a
> PostFilter. This is particularly true if the region ids are integer keys.
>
> Joel
>
> On Thu, Oct 24, 2013 at 7:46 AM, Sandeep Gupta <sandy....@gmail.com>
> wrote:
>
> > Hi,
> >
> > We have a Solr index of around 100 million documents, each document
> > being given a region id, growing at a rate of about 10 million documents
> > per month - the average document size being around 10KB of pure text. The
> > total number of region ids is itself in the range of 2.5 million.
> >
> > We want to search for a query with a given list of region ids. The number
> > of region ids in this list is usually around 250-300 (most of the time),
> > but can be up to 500, with a maximum cap of around 2000 ids in one
> > request.
> >
> >
> > What is the best way to model such queries besides using an IN param in
> > the query, or using a Filter FQ in the query? Are there any other faster
> > methods available?
> >
> >
> > If it may help, the index is on a VM with 4 virtual-cores and currently
> > has 4GB of Java memory allocated out of 16GB in the machine. The number of
> > queries does not exceed 1 per minute for now. If needed, we can
> > throw more hardware at the index - but the index will still be only on a
> > single machine for at least 6 months.
> >
> > Regards,
> > Sandeep Gupta
> >
