On Mon, 2015-11-02 at 17:27 +0530, Modassar Ather wrote:

> The query q=network se* is quick enough in our system too. It takes
> around 3-4 seconds for around 8 million records.
> 
> The problem is with the same query as phrase. q="network se*".

I misunderstood your query then. I tried replicating it with
q="der se*"

http://rosalind:52300/solr/collection1/select?q=%22der+se*%22&wt=json&indent=true&facet=false&group=true&group.field=domain
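
For anyone who wants to build the same request from a script, here is a minimal sketch using only the Python standard library. The host, port and collection are simply the ones from the URL above; the variable names are made up for illustration, and `safe="*"` is there only so the wildcard is left unescaped as in the original request.

```python
from urllib.parse import urlencode

# Host, port and collection are taken from the request quoted in this mail.
BASE = "http://rosalind:52300/solr/collection1/select"

params = {
    "q": '"der se*"',          # phrase query with a trailing wildcard
    "wt": "json",
    "indent": "true",
    "facet": "false",
    "group": "true",
    "group.field": "domain",
}

# safe="*" keeps the wildcard literal; quotes become %22, spaces become '+'.
url = BASE + "?" + urlencode(params, safe="*")
print(url)
```

Generating the query string this way avoids the hand-escaping mistakes that are easy to make with quoted phrases.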

gets expanded to

"parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
author:kan svane* | text:\"kan svane\" | title:\"kan svane\" | url:kan
svane* | description:\"kan svane\")) ())/no_coord"

The result was 1,043,258,271 hits in 15,211 ms


Interestingly enough, a search for 
q="kan svane*"
resulted in 711 hits in 12,470 ms. Maybe because 'kan' alone matches 1
billion+ documents. On that note,
q=se*
resulted in -951812427 hits in 194,276 ms.

Now this is interesting. The negative number seems to be caused by
grouping, but I finally got the response time up in the minutes. Still
no memory problems though. Hits without grouping were 3,343,154,869.

For comparison,
q=http
resulted in -1527418054 hits in 87,464 ms. Without grouping the hit
count was 7,062,516,538. Twice the hits of 'se*' in half the time.
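
Both negative counts are exactly what comes out when the ungrouped hit count is squeezed into a 32-bit signed integer. The sketch below only checks that arithmetic; that the grouped count actually passes through a 32-bit int somewhere is an assumption on my part, and `as_int32` is a helper made up for this illustration, not anything from Solr.

```python
def as_int32(n: int) -> int:
    """Interpret the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

# Ungrouped hit counts as reported in this mail.
for label, ungrouped in (("se*", 3_343_154_869), ("http", 7_062_516_538)):
    print(label, ungrouped, "->", as_int32(ungrouped))
```

The wrapped values come out as -951812427 and -1527418054, matching the grouped hit counts reported above.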

> I changed my SolrCloud setup from 12 shard to 8 shard and given each
> shard 30 GB of RAM on the same machine with same index size
> (re-indexed) but could not see the significant improvement for the
> query given.

Strange. I would have expected the extra free memory for disk cache to
help performance.

> Also can you please share your experiences with respect to RAM, GC,
> solr cache setup etc as it seems by your comment that the SolrCloud
> environment you have is kind of similar to the one I work on?
> 
There is a short write-up at
https://sbdevel.wordpress.com/net-archive-search/

- Toke Eskildsen, State and University Library, Denmark


