Thanks, Walter, for your response.

Each shard holds around 90GB of index (around 8 million documents), and
there are 12 such shards, so the whole collection is roughly 96 million
documents and a bit over 1TB of index. As I understand it, sharding is
required at this scale. Please help me understand if it is not.

We have a requirement to provide full wildcard support to our users.
I will try the EdgeNGramFilter. Can you please help me understand whether
it can be a replacement for wildcards?
There are also cases where the matched word is extended with special
characters, e.g. for se* a match such as secondary-school also needs to be
considered.
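
For reference, below is roughly the field type I was planning to experiment
with. The type name and the gram sizes are only placeholders I picked for
illustration, not something already in our schema:

  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <!-- index analyzer: store prefixes (2-20 chars) of each token as separate terms -->
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20"/>
    </analyzer>
    <!-- query analyzer: no n-gramming, so a query term like "se" is matched
         directly against the indexed prefixes -->
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

If I understand it correctly, with such a field a plain term query for se
would match secondary-school without any trailing wildcard, because the
whitespace tokenizer keeps the hyphenated word as a single token and its
prefixes are indexed. Please correct me if that is not how it is meant to
be used.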

Regards,
Modassar



On Mon, Nov 2, 2015 at 10:17 PM, Walter Underwood <wun...@wunderwood.org>
wrote:

> To back up a bit, how many documents are in this 90GB index? You might not
> need to shard at all.
>
> Why are you sending a query with a trailing wildcard? Are you matching the
> prefix of words, for query completion? If so, look at the suggester, which
> is designed to solve exactly that. Or you can use the EdgeNgramFilter to
> index prefixes. That will make your index larger, but prefix searches will
> be very fast.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Nov 2, 2015, at 5:17 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
> wrote:
> >
> > On Mon, 2015-11-02 at 17:27 +0530, Modassar Ather wrote:
> >
> >> The query q=network se* is quick enough in our system too. It takes
> >> around 3-4 seconds for around 8 million records.
> >>
> >> The problem is with the same query as phrase. q="network se*".
> >
> > I misunderstood your query then. I tried replicating it with
> > q="der se*"
> >
> > http://rosalind:52300/solr/collection1/select?q=%22der+se*%
> > 22&wt=json&indent=true&facet=false&group=true&group.field=domain
> >
> > gets expanded to
> >
> > "parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
> > author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan
> > svane* | description:\"kan svane\")) ())/no_coord"
> >
> > The result was 1,043,258,271 hits in 15,211 ms
> >
> >
> > Interestingly enough, a search for
> > q="kan svane*"
> > resulted in 711 hits in 12,470 ms. Maybe because 'kan' alone matches 1
> > billion+ documents. On that note,
> > q=se*
> > resulted in -951812427 hits in 194,276 ms.
> >
> > Now this is interesting. The negative number seems to be caused by
> > grouping, but I finally got the response time up in the minutes. Still
> > no memory problems though. Hits without grouping were 3,343,154,869.
> >
> > For comparison,
> > q=http
> > resulted in -1527418054 hits in 87,464 ms. Without grouping the hit
> > count was 7,062,516,538. Twice the hits of 'se*' in half the time.
> >
> >> I changed my SolrCloud setup from 12 shards to 8 shards and gave each
> >> shard 30GB of RAM on the same machine with the same index size
> >> (re-indexed), but could not see a significant improvement for the
> >> given query.
> >
> > Strange. I would have expected the extra free memory available for the
> > disk cache to help performance.
> >
> >> Also, can you please share your experience with respect to RAM, GC,
> >> Solr cache setup, etc.? From your comments it seems the SolrCloud
> >> environment you have is quite similar to the one I work on.
> >>
> > There is a short write up at
> > https://sbdevel.wordpress.com/net-archive-search/
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
> >
>
>
