OK, so here is an interesting find.

As my setup requires frequent (soft) commits, caches bring little value.
I tested the following on Solr 5.5.0:

q={!cache=false}*:*&
fq={!cache=false}query1 /* not expensive */&
fq={!cache=false cost=200}query2 /* expensive! */&

Only with the above set-up (and forcing Solr post-filtering for the
expensive query, hence cost=200) was I able to return to Solr 4.10.3
performance.
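
A minimal sketch of how the three parameters above could be assembled
into a request query string. The helper name build_select_params is
hypothetical, and query1/query2 are the same placeholders as in the
parameters above, not real queries. It only builds and encodes the
string; it does not contact Solr.

```python
from urllib.parse import urlencode


def build_select_params(cheap_fq, expensive_fq, cost=200):
    """Build an encoded query string with caching disabled everywhere.

    cheap_fq / expensive_fq are placeholder filter queries (assumption);
    cost is attached only to the expensive filter, as in the test above.
    """
    params = [
        # Main query, not cached.
        ("q", "{!cache=false}*:*"),
        # Inexpensive filter, not cached, default cost.
        ("fq", "{!cache=false}" + cheap_fq),
        # Expensive filter, not cached, with an explicit high cost.
        ("fq", "{!cache=false cost=%d}%s" % (cost, expensive_fq)),
    ]
    # A list of tuples keeps the repeated fq key and the parameter order.
    return urlencode(params)


query_string = build_select_params("query1", "query2")
print(query_string)
```

Passing a list of tuples rather than a dict matters here, because a
dict would silently drop one of the two fq entries.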

By Solr 4 performance I mean:
- not only Solr 4 response times (roughly) for queries returning values,
but also
- very fast response for queries that have 0 results 

I wonder what could be the underlying cause.

Thanks,
Jarek

On Wed, 27 Apr 2016, at 09:13, Jaroslaw Rozanski wrote:
> Hi Erick,
> 
> I am measuring by running queries via JMeter. The values provided are
> the rounded medians of multiple samples. Medians are just slightly
> better than the 99th percentile for all samples.
> 
> The filter cache is useless, as you mentioned; it is effectively not
> used. There is auto-warming through cache autoWarm but no auto-warming
> queries.
> 
> A small experiment with passing &NOW=... seems not to make any
> difference, which would not be surprising given caches are barely
> involved.
> 
> Thanks for the suggestion on IO. After stopping indexing, the response
> time barely changed on Solr 5. On Solr 4, with indexing running, it is
> still fast. So effectively, Solr 4 under indexing load is faster than
> idle Solr 5. Both set-ups have the same heap size and available RAM on
> the machine (2x heap).
> 
> One other thing I am testing is issuing requests to a specific core,
> with distrib=false. No significant improvements there.
> 
> Now what is interesting is that the aforementioned query takes the same
> amount of time to execute regardless of the number of documents found.
> - Whether it is 0 or 10k, it takes a couple of seconds on Solr 5.5.0.
> - Meanwhile, on Solr 4.10.3, the response time depends on the result
> size. On Solr 4, a query with no results returns in a few ms, and one
> with a couple of thousand results takes a few seconds.
> (query used {!cache=false}q=...)
>   
> 
> Thanks,
> Jarek
> 
> On Wed, 27 Apr 2016, at 04:39, Erick Erickson wrote:
> > Well, the first question is always "how are you measuring this"?
> > Measuring a few queries is almost completely uninformative,
> > especially if the two systems have differing warmups. The only
> > meaningful measurements are when throwing away the first bunch
> > of queries then measuring a meaningful sample.
> > 
> > The setup you describe will be very sensitive to disk access
> > with the autowarm of 1 second, so if there's much at all in
> > the way of differences in I/O that would be a red flag.
> > 
> > From here on down doesn't really respond to the question, but
> > I thought I'd mention it.
> > 
> > And you don't have to worry about disabling your filterCache since
> > any filter query of the form fq=field:[mention NOW in here without
> > rounding]
> > will never be re-used. So you might as well use {!cache=false}. Here's
> > the background:
> > 
> > https://lucidworks.com/blog/2012/02/23/date-math-now-and-filter-queries/
> > 
> > And your soft commit is probably throwing out all the filter caches
> > anyway.
> > 
> > I doubt you're doing any autowarming at all given the autocommit interval
> > of 1 second and continuously updating documents and your reported
> > query times. So you can pretty much forget what I said about throwing
> > out your first N queries since you're (probably) not getting any benefit
> > out of caches anyway.
> > 
> > On Tue, Apr 26, 2016 at 10:34 AM, Jaroslaw Rozanski
> > <s...@jarekrozanski.com> wrote:
> > > Hi all,
> > >
> > > I am migrating a large Solr Cloud cluster from Solr 4.10 to Solr 5.5.0
> > > and I observed a big difference in query execution time.
> > >
> > > First a setup summary:
> > > - multiple collections - 6
> > > - each has multiple shards - 6
> > > - same/similar hardware
> > > - indexing tens of messages per second
> > > - autoSoftCommit with 1s; hard commit few tens of seconds
> > > - Java 8
> > >
> > > The query has following form: field1:[* TO NOW-14DAYS] OR (-field1:[* TO
> > > *] AND field2:[* TO NOW-14DAYS])
> > >
> > > The fields field1 & field2 are of date type:
> > > <fieldType name="date" class="solr.TrieDateField" precisionStep="0"
> > > positionIncrementGap="0"/>
> > >
> > > As query (q={!cache=false}...)
> > > Solr 4.10 -> 5s
> > > Solr 5.5.0 -> 12s
> > >
> > > As filter query (q={!cache=false}*:*&fq=...)
> > > Solr 4.10 -> 9s
> > > Solr 5.5.0 -> 11s
> > >
> > > The query itself is bad, but its optimization aside, I am wondering if
> > > there is anything in Lucene/Solr that would have such an impact on query
> > > execution time between versions.
> > >
> > > Originally I thought it might be related to
> > > https://issues.apache.org/jira/browse/SOLR-8251, and testing on a small
> > > scale proved that there is a difference in performance. However, the
> > > upgraded version is already 5.5.0.
> > >
> > >
> > >
> > > Thanks,
> > > Jarek
> > >
